Role summary

Principal Site Reliability Engineer (SRE)

Requirements and responsibilities

Responsibilities

  • Serve as the primary technical owner for production reliability across U.S. customer environments.
  • Investigate and resolve complex issues spanning web applications, APIs, backend services, data pipelines, cloud infrastructure, and customer integrations.
  • Lead production incident response efforts, coordinating cross-functional teams to restore service and minimize customer impact.
  • Perform root cause analysis and drive corrective actions that improve long-term system stability and resilience.
  • Partner with software engineering and platform teams to identify recurring reliability risks and implement sustainable solutions.
  • Design, configure, and validate secure customer connectivity solutions including Site-to-Site VPNs, Transit Gateway integrations, routing configurations, and secure network paths.
  • Support customer onboarding initiatives by troubleshooting connectivity challenges and ensuring consistent implementation processes.
  • Enhance platform observability through improvements in monitoring, logging, alerting, tracing, and operational dashboards.
  • Contribute to CI/CD, infrastructure automation, and deployment processes that improve release safety and operational consistency.
  • Develop operational tooling that supports incident response, troubleshooting, onboarding, and system monitoring activities.
  • Collaborate with engineering leadership to improve cloud architecture, scalability, security, and operational readiness.
  • Partner with customer-facing teams to communicate technical issues, remediation plans, and reliability improvements in a clear and effective manner.
  • Support compliance, security, and risk management initiatives within highly regulated healthcare environments.

Requirements

  • 6+ years of hands-on experience supporting and managing AWS-based production environments.
  • 4+ years of experience supporting web applications and backend services (Python/Django experience strongly preferred).
  • Experience with AWS networking technologies including VPCs, Site-to-Site VPNs, Transit Gateways, routing, NAT gateways, and security groups.
  • Strong experience with Terraform and infrastructure-as-code deployment practices.
  • Experience with containerized environments including ECS, Fargate, Kubernetes, or similar technologies.
  • Experience building and supporting CI/CD pipelines and release automation processes.
  • Familiarity with monitoring and observability platforms such as Datadog, CloudWatch, Sentry, Grafana, or similar tools.
  • Experience leading production incidents, outage management, and root cause analysis initiatives.
  • Exposure to Windows Server environments, Active Directory, Kerberos, and enterprise infrastructure concepts is preferred.
  • Healthcare technology, healthcare SaaS, clinical software, or other regulated industry experience is highly preferred.
  • Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related technical field preferred.

Benefits

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401(k), IRA)
  • Paid Time Off (Vacation, Sick & Public Holidays)

Focus (role area): Site Reliability Engineering
Seniority signal (candidate level): Senior
Stack (main skills): AWS, CI/CD, Kubernetes
Location (eligibility): 1 country accepted

Hiring flow

WithMira shows the role and then sends candidates to the company's application page.

1. Review role fit, stack, and location eligibility on WithMira.
2. Open the company's application page from the tracked link.
3. Save the role or subscribe to similar opportunities before leaving.