Site Reliability Engineer
Job Title: Site Reliability Engineer (SRE)
Location: Columbia, MD or Chicago, IL (Hybrid Preferred – 4 days onsite, with flexibility)
Type: Contract-to-Hire (6-12 months with potential for conversion)
About the Role
Join our dynamic Platform Engineering team as a Site Reliability Engineer (SRE). You'll be responsible for ensuring the reliability, scalability, and performance of our systems, working in a fast-paced and collaborative environment. The role is open to both senior engineers (5+ years of experience) and junior engineers (3+ years of experience) looking to grow their skill set.
Key Responsibilities
- Design, build, and maintain scalable, reliable infrastructure and services.
- Implement monitoring, alerting, and incident response systems to ensure high availability.
- Automate repetitive tasks to reduce manual toil and improve system efficiency.
- Collaborate with development and DevOps teams to enhance application reliability.
- Conduct root cause analysis and post-mortems for production incidents.
- Define and track SLOs, SLIs, and error budgets to measure system health.
- Participate in on-call rotations and handle incidents promptly.
- Continuously enhance system performance, reliability, and cost-efficiency.
- Maintain code quality using SonarQube.
- Support CI/CD pipelines, with a focus on Harness (training provided).
Required Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 3+ years (junior) or 5+ years (senior) of experience in SRE, DevOps, or Systems Engineering.
- Strong expertise in SonarQube.
- Proficiency in at least one programming language (Python, Go, Java, etc.).
- Hands-on experience with cloud platforms (AWS, Google Cloud Platform, or Azure).
- Solid Linux systems and networking knowledge.
- Experience with containerization (Docker, Kubernetes).
- Familiarity with CI/CD tools and infrastructure-as-code (Terraform, Ansible).
- Experience with monitoring tools (Prometheus, Grafana, Datadog).