Confluent
Staff Software Engineer I- SRE
Remote Engineering role with clearly defined candidate location eligibility.
Posted: 27 May 2026
Eligible countries: 1 country accepted
Seniority signal: Lead
Work model: Remote
Accepted candidate locations:
India
Requirements and responsibilities
What You Will Do:
- Proactive Reliability Engineering (~75% of role)
  - Analyze systemic failure patterns and design improvements that prevent incident recurrence
  - Define and maintain SLO/SLA frameworks; use error budgets to guide reliability investments
  - Build tooling and automation to reduce incident response toil and scale team impact
  - Own Rootly configuration, workflows, and integrations with PagerDuty, Jira, Confluence, and Slack
  - Analyze reliability data to identify systemic improvements; build dashboards that drive action
  - Explore AI-assisted approaches to documentation quality and incident analysis
  - Design scalable reliability standards that reduce reactive workload over time
- Incident Management Program (~25% of role)
  - Own standards, practices, and continuous improvement of incident response
  - Serve as an on-call Incident Commander (IC) for production incidents, including acting as the escalation IC when incidents exceed a team's management chain
  - Develop and deliver training programs for engineering teams at all levels
  - Coach teams through post-mortems and in developing actionable corrective actions
- Customer Root Cause Analysis (CRCA)
  - Edit and review customer-facing incident documents to ensure quality and clarity
  - Drive turnaround SLAs while maintaining technical accuracy
  - Ensure a clear explanation of what happened, why, and how we'll prevent recurrence
- Cross-Team Leadership
  - Partner with engineering leaders to elevate reliability practices
  - Be the expert who teams proactively engage for guidance
What You Will Bring:
- 10+ years in SRE, incident management, or reliability engineering
- Cloud experience with at least one of AWS, GCP, or Azure
- Deep expertise with incident management tooling (Rootly, PagerDuty, or similar platforms)
- Strong understanding of distributed systems and failure modes at scale; Kafka/event streaming expertise preferred, or demonstrated rapid mastery of complex systems
- Deep experience with observability (metrics, logging, tracing) and the ability to diagnose complex issues
- Kubernetes and container orchestration experience
- Understanding of CI/CD pipelines and release processes
- Systems thinking: understanding how infrastructure design choices affect failure modes and recovery
- Familiarity with SLO/SLA frameworks
- Track record as a trusted advisor across engineering organizations
- Experience driving org-wide process and cultural changes
- Strong written communication (design docs, one-pagers, runbooks)
- Post-mortem facilitation experience
- Experience with async collaboration across time zones
- Large-company experience navigating reliability/incident programs in organizations of 500+ engineers
What Gives You an Edge:
- Multi-cloud experience (at least two of AWS, GCP, and Azure)
- Experience with modern CI/CD, GitHub, and AI-assisted workflows; you'll have the freedom to build what you need