Kubernetes/LLM Engineer


Job Title: AI Operations Platform Consultant

Location: Charlotte, NC or Jersey City, NJ (Hybrid – Onsite 3 days/week)

Start Date: January 5, 2026

Contract Length: 6+ months
Pay Rate: $65–$68/hr on W2

Overview

We are seeking an experienced AI Operations Platform Consultant with strong expertise in Kubernetes and Large Language Models (LLMs). This role supports the deployment, optimization, and operation of AI/LLM workloads in production environments.

Key Responsibilities

  • Deploy, manage, and troubleshoot containerized, mission-critical services at scale using Kubernetes/OpenShift.
  • Deploy, configure, and optimize LLMs using TensorRT-LLM and Triton Inference Server.
  • Operate and support MLOps/LLMOps pipelines, ensuring reliable and scalable inference services.
  • Set up and maintain monitoring for AI inference services, focusing on performance and availability.
  • Troubleshoot LLM deployments in containerized platforms, including monitoring, load balancing, and scaling.
  • Manage scalable infrastructure used for hosting and operating LLM workloads.
  • Deploy models into production using containerization, microservices, and API-based architectures.
  • Utilize Triton Inference Server for configuration, deployment, and performance tuning.
  • Apply model optimization techniques such as pruning, quantization, and knowledge distillation.

Must-Have Skills

  • Hands-on experience with LLMs
  • Strong proficiency with Kubernetes (OpenShift preferred)
  • Experience with TensorRT-LLM and Triton Inference Server
  • Background in MLOps/LLMOps, model deployment, and production-grade AI operations