Principal Infrastructure Performance Engineer

Upgrade
  • Construction
  • FlexTime
  • Applications have closed

What You’ll Do:

  • Build a resilient, secure, and efficient cloud-based observability platform.

  • Monitor and troubleshoot platform issues, and develop fixes that reduce recurring known issues.

  • Build and scale the observability infrastructure to meet rapidly increasing demand.

  • Develop and improve operational practices and procedures.

  • Sample projects:

      • Improve database monitoring: develop custom Prometheus exporters in Go for use cases that go beyond what is possible with SQL exporter. Create Grafana dashboards and alerts for these new metrics.

      • MCP servers for observability: deploy an MCP server to integrate our observability stack with our LLM tools.

What We Look For:

  • 8+ years of relevant production-level experience.
  • Experience with VictoriaMetrics.
  • Experience with Sumo Logic.
  • Experience with tracing tools (e.g. OpenTelemetry, Honeycomb, Tempo).
  • Experience with profiling tools (e.g. Pyroscope).
  • Knowledge of cloud monitoring, logging and cost management tools.
  • Programming/scripting knowledge (Go, Java, or Python) and understanding of JVM concepts.
  • In-depth knowledge of AWS services and hands-on experience provisioning AWS resources with Terraform.
  • Experience with containerized applications and Kubernetes / EKS, including creating and maintaining Helm charts.
  • Understanding of microservices architecture and debugging/investigation techniques.
  • Strong understanding of systems, networking and troubleshooting techniques.
  • Experience with automated build pipelines, continuous integration, and continuous deployment.
  • Ability to operate in an agile, entrepreneurial start-up environment.
  • Experience with running Linux in production.

Our Tech Stack:

  • Monitoring: VictoriaMetrics, Grafana, Prometheus, OpenTelemetry, Honeycomb, Sumo Logic.
  • Infrastructure as code: Terraform.
  • CD: GitOps, Argo CD, Argo Rollouts.
  • CI: Tekton.
  • Scripting: Bash.
  • Programming: Go (preferred).
  • AWS: EKS, CloudWatch, S3, DynamoDB, RDS, SNS, SQS, Lambda.

What We Offer You:

  • Competitive salary and stock option plan.
  • 100% paid coverage of medical, dental and vision insurance.
  • Flexible PTO.
  • Learning stipend for personal growth and development.
  • Paid parental leave.
  • Health & wellness initiatives.

#LI-Remote #BI-Remote