Kubernetes/LLM Engineer
Job Title: AI Operations Platform Consultant
Location: Charlotte, NC or Jersey City, NJ (Hybrid – Onsite 3 days/week)
Start Date: January 5, 2026
Contract Length: 6+ months
Pay Rate: $65–$68/hr on W2
Overview
We are seeking an experienced AI Operations Platform Consultant with strong expertise in Kubernetes and Large Language Models (LLMs). This role supports the deployment, optimization, and day-to-day operation of AI/LLM workloads in production environments.
Key Responsibilities
- Deploy, manage, and troubleshoot containerized, mission-critical services at scale using Kubernetes/OpenShift.
- Deploy, configure, and optimize LLMs using TensorRT-LLM and Triton Inference Server.
- Operate and support MLOps/LLMOps pipelines, ensuring reliable and scalable inference services.
- Set up and maintain monitoring for AI inference services, focusing on performance and availability.
- Troubleshoot LLM deployments on containerized platforms, including monitoring, load balancing, and scaling issues.
- Manage scalable infrastructure used for hosting and operating LLM workloads.
- Deploy models into production using containerization, microservices, and API-based architectures.
- Utilize Triton Inference Server for configuration, deployment, and performance tuning.
- Apply model optimization techniques such as pruning, quantization, and knowledge distillation.
Must-Have Skills
- Hands-on experience deploying and operating LLMs in production
- Strong proficiency with Kubernetes (OpenShift preferred)
- Experience with TensorRT-LLM and Triton Inference Server
- Background in MLOps/LLMOps, model deployment, and production-grade AI operations