Merative
Site Reliability and DevOps Engineering Lead
Remote Site Reliability Engineering role with a clear candidate location fit.
Published: Jun 20, 2026
Eligible countries: 1 country accepted
Seniority signal: Lead
Work model: Remote
Accepted candidate locations
United States
Role summary
Site Reliability and DevOps Engineering Lead
Requirements and responsibilities
People & Team Leadership
- Lead, mentor, and grow Platform / DevOps engineers
- Build a high-performing Platform team
- Drive accountability for platform reliability and delivery outcomes
- Lead vendors to deliver capabilities in production
- Ensure platform capabilities accelerate product delivery and remove bottlenecks
- Define and enforce platform engineering standards and DevOps practices across all teams and vendors
- Lead capacity planning, performance optimization, and cost efficiency
- Define operational standards, runbooks, and reliability practices
- Be accountable for platform reliability outcomes at the enterprise and product level
Platform Strategy and Leadership
- Act as technical authority across platform, reliability, and delivery
- Define platform strategy and roadmap
- Govern delivery across internal teams and vendors
Platform Reliability Ownership
- Own SLIs, SLOs, and error budgets
- Lead resilience engineering, observability, and failure design
- Drive proactive risk reduction and continuous improvement
- Own incident management frameworks and continuous improvement
CI/CD and Release Engineering
- Own end-to-end pipeline architecture and release automation
- Standardize, secure, and fully automate pipelines
- Drive continuous integration, delivery, and validation practices
Incident Leadership
- Lead Sev1 response, escalation, and recovery
- Own RCA and drive systemic fixes (not point fixes)
- Embed AI into monitoring, risk prediction, and CI/CD optimization
- Drive automation to reduce operational toil and improve decision-making
Required Skills:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 6-10 years of hands-on experience in software operations, DevOps, and Site Reliability Engineering, including managing large-scale, mission-critical systems.
- Clear and confident communication skills with ability to lead teams and collaborate effectively across engineering, product, and architecture teams.
- Proven track record ensuring high availability and performance in production environments, with expertise in fault-tolerant, distributed system design.
- Excellent understanding of modern software delivery pipelines and DevOps practices, including CI/CD, configuration management, and version control (Git).
- Exceptional problem-solving skills, with experience diagnosing complex system issues under pressure and driving them to resolution.
- Strong proficiency in at least one programming or scripting language (e.g., Python, Bash, or Java) for automation and tool integration.
- Self-driven and proactive, with a passion for automating manual processes and continuously improving systems to enhance reliability and team productivity.
Proven experience:
- Releasing into and running mission-critical, high-availability SaaS platforms
- Technically leading a Platform team and influencing stakeholders and vendors
- Stakeholder engagement across Product, Architecture, and Operations
Deep expertise in:
- Site Reliability Engineering (SLI/SLO, error budgets, incident management)
- DevOps operating models and platform engineering (engineering transformation)
- CI/CD architecture and release automation
- Cloud, Systems & Infrastructure (DB2, Oracle, Infinispan, OpenLiberty)
- Automation-first engineering with proven usage of AI (self-healing, triage)
- Java application platforms and runtimes (performance tuning, troubleshooting, production operations)
Strong experience with:
- Cloud platforms (Azure preferred)
- Distributed systems and fault-tolerant architectures
- Performance tuning and scaling
- Database optimisation (DB2, Oracle, PostgreSQL)
- Multi-region / active-active environments
- Monitoring, logging, tracing frameworks
- Embedding reliability practices into the SDLC
Hands-on with:
- DB2, Oracle, Infinispan, OpenLiberty, Azure
- Infrastructure as Code (Terraform or similar)
- Containerisation and orchestration (Docker/Kubernetes)
Benefits
- Remote first / work from home culture
- Flexible vacation to help you rest, recharge, and connect with loved ones
- Paid leave benefits
- Health, dental, and vision insurance
- 401k retirement savings plan
- Infertility benefits
- Tuition reimbursement, life insurance, EAP – and more!