Job summary

Senior Site Reliability Engineer

Requirements and responsibilities

Infrastructure & Platform Ownership

  • Design, implement, and maintain scalable infrastructure on Google Cloud Platform to support CodeRabbit's growing user base and processing demands
  • Own and operate critical platform services
  • Build and maintain Infrastructure as Code using Terraform to ensure consistent, reproducible, and version-controlled infrastructure deployments

Reliability & Performance Engineering

  • Establish and maintain SLI/SLO frameworks for all critical services, ensuring we meet our reliability commitments to users
  • Implement comprehensive monitoring, alerting, and observability solutions using Datadog and custom instrumentation
  • Conduct thorough incident response, root cause analysis, and post-mortem processes to continuously improve system reliability
  • Optimize application and infrastructure performance to handle millions of pull request analyses with minimal latency
  • Design and implement chaos engineering practices to proactively identify and resolve system weaknesses

Automation & Developer Experience

  • Develop self-service platforms and tooling that empower engineering teams to deploy, monitor, and troubleshoot their services independently
  • Automate operational tasks including scaling, backup/recovery, security patching, and routine maintenance
  • Create and maintain infrastructure APIs and abstractions that simplify complex operations for development teams

Security & Compliance

  • Integrate security best practices into all infrastructure and platform services
  • Implement and maintain security monitoring, vulnerability scanning, and compliance reporting
  • Design secure network architectures including VPC configuration, firewall rules, and access control systems
  • Establish and maintain disaster recovery procedures and business continuity planning

Required Qualifications:

  • 7+ years of hands-on experience in Site Reliability Engineering, Platform Engineering, or DevOps Engineering roles
  • Proven track record of managing production systems at scale, preferably in high-growth technology companies
  • Experience with cloud platforms, particularly AWS or Google Cloud Platform (GCP), including compute, storage, networking, and managed services
  • Strong background in containerization and orchestration platforms (Kubernetes, Docker)

Technical Skills

  • Programming Languages: Proficiency in Node.js and TypeScript for building automation tools, monitoring solutions, and platform services
  • Infrastructure as Code: Advanced experience with Terraform for infrastructure provisioning and management
  • Monitoring & Observability: Hands-on experience with Datadog or similar platforms (Prometheus, Grafana, ELK stack) for observability
  • Cloud Platforms: Comprehensive experience with GCP services including Compute Engine, GKE, Cloud Run, Cloud SQL, Cloud Storage, Load Balancing, and IAM

Systems & Operations

  • Strong Linux/Unix systems skills
  • Experience with network protocols, load balancing, and CDN technologies
  • Knowledge of security principles and best practices for cloud infrastructure
  • Familiarity with CI/CD tools and practices (Jenkins, GitLab CI, GitHub Actions)
  • Understanding of microservices architecture and distributed systems principles

Bonus Points:

  • Experience with AI/ML infrastructure and tools
  • Background in managing high-traffic web applications and API services
  • Experience with disaster recovery planning and execution
  • Familiarity with compliance frameworks (SOC 2, ISO 27001)
  • Contributions to open-source infrastructure or SRE tooling projects
  • Experience with cost optimization and FinOps practices
  • Knowledge of performance testing and capacity planning methodologies

Why Join Our Engineering Culture?

  • CodeRabbit is building the next generation of AI-native developer tooling — starting with code review. We combine large language models with deep software engineering context to help teams ship faster, catch more bugs, and make better architectural decisions at scale.
  • We are a high-ownership engineering culture. That means no passive execution, no waiting for perfect tickets, and no narrowly defined task boundaries. Engineers here find problems before they're assigned, use AI as a core part of how they build, ship with judgment, and own outcomes from proposal to production.
  • Our operating philosophy: bias toward action, ship the smallest necessary coherent slice, validate proportional to risk, watch what happens, and make the system better. AI drafts; humans decide. Speed matters, but so does understanding what you ship.
  • This opportunity will be energizing for people who want real ownership, pace, and high standards. It's uncomfortable for people who prefer slow consensus or heavily managed workflows.
  • If you want to build tools that are changing how software gets written, and to be held to the standard the best engineers thrive under, we'd love to talk.

Focus (job area): Engineering
Seniority signal (candidate level): Senior
Stack (key skills): AWS, CI/CD, Docker
Location (eligibility): 2 countries accepted

Location eligibility

Apply only if your profile country is one of the accepted countries.

Hiring process

WithMira displays the listing and then directs candidates to the company's own application.

1. Check job fit, stack, and location eligibility on WithMira.
2. Open the company's application page via the tracked link.
3. Save the job or subscribe to similar openings before leaving.