Engineering – SRE Platforms – SRE Logging Engineer – Associate – Dallas

Goldman Sachs & Co.
  • Engineering
  • Full-time
  • Applications have closed

Job Description

Your Impact

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. At Goldman Sachs, SRE is responsible for the availability and reliability of our firm’s most critical platform services, and ensures they meet the requirements of our internal and external users. We look for engineers who are motivated to collaborate with our businesses to build and run sustainable production systems, which can evolve and adapt to changes in our fast-paced, global business environment.

How will you fulfil your potential?

As an SRE Logging Engineer, you will work with customers, product owners, and SREs to design and develop a large-scale application that processes, stores and reads large volumes of log events. You will run a production environment spanning AWS, Google Cloud Platform and on-prem datacentres.

Basic Qualifications

  • 3+ years of relevant work experience

  • Proficiency in one or more of the following: Java, Go, Python, JavaScript

  • Excellent programming skills – developing, debugging, testing and optimizing code

  • Experience with algorithms, data structures and software design

  • Experience with distributed systems design, maintenance, and troubleshooting

Preferred Experience

  • Experience with a logging solution such as Datadog, AWS CloudWatch, Splunk or Elasticsearch

  • Experience with running workloads in Kubernetes

  • Systems experience in UNIX/Linux and networking, especially in scaling for performance and debugging complex distributed systems

  • Knowledge of cloud native solutions in AWS or Google Cloud Platform

  • Basic understanding of SRE concepts such as observability, SLOs/SLIs and metrics