Site Reliability Engineer

Role: Site Reliability Engineer

Duration: 12+ Months

Location: Raleigh, NC (Onsite, 5 days a week)

Primary Skill Required for the Role: Azure Infrastructure

Level Required for Primary Skill: Intermediate (3-5 years of experience)

Proven expertise in Site Reliability Engineering, with a background in software engineering, infrastructure, or operations.
Hands-on experience with cloud platforms (e.g., Azure), operating systems (e.g., RHEL 7+, Windows Server 2019+), and networking fundamentals.
Solid understanding of networking and storage technologies (e.g., NFS, SAN, NAS).
Strong working knowledge of authentication and naming services (e.g., DNS, LDAP, Kerberos, Centrify).
Proficiency in scripting and automation (e.g., Python, Go, Bash).
Practical experience with infrastructure as code tools (e.g., Terraform, Ansible).
Demonstrated ability to define and manage SLIs, SLOs, and SLAs, and to systematically reduce toil.
Ability to integrate systems with observability platforms to ensure end-to-end visibility.
A metrics- and automation-driven mindset, with a strong focus on measurable reliability.
Calm under pressure, especially during incidents and outages, with a structured approach to incident response and post-mortems.
Strong collaboration and communication skills, with the ability to work across engineering and business teams.
A proactive, ownership-driven attitude, always seeking opportunities to improve systems and processes.
