Senior Site Reliability Engineer

VDart, Inc.
  • Healthcare
  • Applications have closed

Job Title: Senior Site Reliability Engineer / DevOps Engineer

Location: Bothell, WA

Duration: Contract

Term: 6 months

Job Description:

Experience Desired: 7 years

Key Responsibilities

Platform Reliability & Operations

  • Own reliability, availability, scalability, and performance of API Gateway services running on Kubernetes
  • Design and implement SRE best practices including SLIs, SLOs, SLAs, error budgets, and incident management
  • Lead production readiness reviews, root cause analysis (RCA), and post-incident improvements
  • Drive capacity planning, performance tuning, and resilience testing

Kubernetes & Cloud Engineering

  • Manage and optimize Kubernetes clusters (EKS / AKS / GKE / On-prem)
  • Develop and maintain Helm charts, manifests, and deployment strategies
  • Implement rollout strategies such as blue-green, canary, and rolling deployments
  • Collaborate with development teams to ensure services follow cloud-native design patterns

Observability & Monitoring (Strong Focus)

  • Build and maintain enterprise-grade observability (O11y) solutions:
      • Prometheus & Grafana for metrics and dashboards
      • Splunk for centralized logging and alerting
      • OpenTelemetry for distributed tracing
  • Define actionable alerts and dashboards for platform and application health
  • Improve MTTR through better visibility and automation

CI/CD & Automation

  • Design and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.)
  • Automate infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
  • Develop automation scripts using Python, Bash, or Groovy

Security & Compliance

  • Implement DevSecOps practices including secrets management, image scanning, and RBAC
  • Work closely with security teams on vulnerability remediation and compliance controls

Innovation & POCs

  • Actively contribute to POCs for AI Gateway / Intelligent API Gateway initiatives
  • Evaluate and prototype integrations with AI/ML-driven routing, observability, and security features
  • Stay current with emerging SRE, cloud, and AI gateway technologies

Soft Skills

  • Strong troubleshooting and problem-solving skills
  • Ability to work cross-functionally with developers, architects, and security teams
  • Proactive mindset with a passion for automation and reliability
  • Good documentation and communication skills

Key Skills:

SRE, DevOps, Java, Kubernetes, Observability