Staff Site Reliability Engineer

  • Automotive
  • Full-time
  • Applications have closed

At Lucid, we set out to introduce the most captivating luxury electric vehicles, ones that elevate the human experience and transcend the perceived limitations of space, performance, and intelligence. Our vehicles are intuitive, liberating, and designed for the future of mobility. We plan to lead this new era of luxury electric vehicles by returning to the fundamentals of great design, where every decision we make is in service of the individual and the environment. Because when you are no longer bound by convention, you are free to define your own experience. Come work alongside some of the most accomplished minds in the industry. Beyond competitive salaries, we provide a community for innovators who want to make an immediate and significant impact. If you are driven to create a better, more sustainable future, this is the right place for you.

Responsibilities include:
  • Owning and enhancing the reliability of services deployed across multiple cloud regions, and proactively monitoring, automating, and scaling those services to ensure uptime and performance.
  • Leading the containerization and deployment of microservices and data pipelines on Kubernetes using Helm charts.
  • Advocating for a DevOps culture that emphasizes automation and engineering excellence.
  • Implementing autoscaling strategies and monitoring performance with tools such as Prometheus and Grafana.
  • Performing SRE tasks such as availability monitoring, incident response, post-mortem analysis, and reliability reporting.
  • Deploying and maintaining cloud services and tools, including Kafka, Spark, Presto, Airflow, and MQTT, among others.
  • Managing cloud infrastructure with Infrastructure-as-Code tools such as Terraform and Cluster API.
  • Enhancing automated alerting, incident detection, and recovery mechanisms.
  • Participating in an on-call rotation to meet business SLAs, and documenting runbooks.
  • Collaborating within Agile Scrum and Kanban workflows.
  • Performing impact analysis during incidents and implementing preventive measures.
Requirements include:
  • A B.S. or M.S. degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 8+ years of experience in cloud infrastructure, SRE, DevOps, or related fields.
  • 4+ years of hands-on experience with Kubernetes in public and private clouds.
  • 4+ years of experience with Infrastructure-as-Code using Terraform or similar tools.
  • Experience with Python, Go, and Bash/Shell.
  • A strong understanding of Prometheus, Grafana, and other observability tools.
  • The ability to diagnose and resolve AWS performance bottlenecks.
  • Preferred: experience with configuration management tools such as Ansible, Chef, or Puppet.

Lucid Motors is an equal opportunity employer committed to diversity and inclusion.