Role summary

Senior II Site Reliability Engineer

Requirements and responsibilities

Role content extracted into sections for faster review.

Partner with the best

  • Owning the SRE infrastructure lifecycle from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
  • Designing and implementing frameworks that reflect customer experience for load balancing services and driving action when error budgets are at risk
  • Building and maintaining observability pipelines from load-balancing components and system-level sources to dashboards that enable rapid incident triage
  • Leading technical incident response for complex NB/NLB failures, acting as the technical commander and driving root cause analysis and preventive follow-through
  • Developing and automating safe deployment workflows for phased releases, including bake-period monitoring, feature flag management, and validation across global datacenter rollouts
  • Reviewing design documents and product-requirement documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
  • Building automation and tooling using Python or Go that reduces operational toil and improves team-wide operational capability

To be successful in this role you will:

  • Have 8+ years of experience in SRE, infrastructure engineering, or platform engineering, working with large-scale distributed systems
  • Demonstrate deep expertise in Linux networking fundamentals and in diagnosing issues at the packet level using tcpdump, netstat, and similar tools
  • Have hands-on experience with L4/L7 load balancing technologies covering configuration, health checking, high availability, and failure modes at scale
  • Show a track record of defining SLO/SLI frameworks, building observability platforms from scratch, and running incident management processes at scale
  • Demonstrate expertise in Kubernetes and containerization at scale including workload scheduling, networking, resource management, and operating stateful or network-intensive workloads in a cluster environment
  • Build automation and tooling using Python or Go, with infrastructure-as-code experience (SaltStack, Ansible, or Terraform) and deployment safety instincts

Focus (role area): Site Reliability Engineer
Seniority signal (candidate level): Senior
Stack (main skills): Kubernetes, Python, Golang
Location (eligibility): 2 countries accepted

Location eligibility

Candidates should apply only if their profile country is listed here.

Hiring flow

WithMira displays the role and then sends candidates to the company's application page.

1. Review role fit, stack, and location eligibility on WithMira.
2. Open the company's application page from the tracked link.
3. Save the role or subscribe to similar opportunities before leaving.