Kubernetes/LLM Engineer
Job Title: AI Operations Platform Consultant
Location: Charlotte, NC or Jersey City, NJ (Hybrid – Onsite 3 days/week)
Start Date: January 5, 2026
Contract Length: 6+ months
Pay Rate: $65–$68/hr on W2
Overview
We are seeking an experienced AI Operations Platform Consultant with strong expertise in Kubernetes and Large Language Models (LLMs). This role supports the deployment, optimization, and day-to-day operation of AI/LLM workloads in production environments.
Key Responsibilities
- Deploy, manage, and troubleshoot containerized, mission-critical services at scale using Kubernetes/OpenShift.
- Deploy, configure, and optimize LLMs using TensorRT-LLM and Triton Inference Server.
- Operate and support MLOps/LLMOps pipelines, ensuring reliable and scalable inference services.
- Set up and maintain monitoring for AI inference services, focusing on performance and availability.
- Troubleshoot LLM deployments on containerized platforms, including monitoring, load balancing, and scaling issues.
- Manage scalable infrastructure used for hosting and operating LLM workloads.
- Deploy models into production using containerization, microservices, and API-based architectures.
- Utilize Triton Inference Server for configuration, deployment, and performance tuning.
- Apply model optimization techniques such as pruning, quantization, and knowledge distillation.
Must-Have Skills
- Hands-on experience deploying and operating LLMs in production
- Strong proficiency with Kubernetes (OpenShift preferred)
- Experience with TensorRT-LLM and Triton Inference Server
- Background in MLOps/LLMOps, model deployment, and production-grade AI operations