Site Reliability Engineer

Apolis
  • Environmental
  • Full-time
  • Applications have closed

Job Title: Site Reliability Engineer (SRE)

Location: Columbia, MD or Chicago, IL (Hybrid Preferred – 4 days onsite, with flexibility)

Type: Contract-to-Hire (6-12 months with potential for conversion)

About the Role
Join our dynamic Platform Engineering team as a Site Reliability Engineer (SRE). You'll be responsible for ensuring the reliability, scalability, and performance of our systems, working in a fast-paced and collaborative environment. The role is open to both senior engineers (5+ years of experience) and junior engineers (3+ years of experience) looking to grow their skill set.

Key Responsibilities

  • Design, build, and maintain scalable, reliable infrastructure and services.
  • Implement monitoring, alerting, and incident response systems to ensure high availability.
  • Automate repetitive tasks to reduce manual toil and improve system efficiency.
  • Collaborate with development and DevOps teams to enhance application reliability.
  • Conduct root cause analysis and post-mortems for production incidents.
  • Define and track SLOs, SLIs, and error budgets to measure system health.
  • Participate in on-call rotations and handle incidents promptly.
  • Continuously enhance system performance, reliability, and cost-efficiency.
  • Maintain code quality using SonarQube.
  • Support CI/CD pipelines, with a focus on Harness (training provided).

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 3+ years (junior) / 5+ years (senior) experience in SRE, DevOps, or Systems Engineering.
  • Strong expertise in SonarQube.
  • Proficiency in at least one programming language (Python, Go, Java, etc.).
  • Hands-on experience with cloud platforms (AWS, Google Cloud Platform, or Azure).
  • Solid Linux systems and networking knowledge.
  • Experience with containerization (Docker, Kubernetes).
  • Familiarity with CI/CD tools and infrastructure-as-code (Terraform, Ansible).
  • Experience with monitoring tools (Prometheus, Grafana, Datadog).