Job summary

Site Reliability and DevOps Engineering Lead

Requirements and responsibilities

People & Team Leadership

  • Lead, mentor, and grow Platform / DevOps engineers
  • Build a high-performing Platform team
  • Drive accountability for platform reliability and delivery outcomes
  • Lead vendors to deliver capabilities in production

Platform Engineering and Operations

  • Ensure platform capabilities accelerate product delivery and remove bottlenecks
  • Define and enforce platform engineering standards and DevOps practices across all teams and vendors
  • Lead capacity planning, performance optimization, and cost efficiency
  • Define operational standards, runbooks, and reliability practices
  • Be accountable for platform reliability outcomes at enterprise/product level

Platform Strategy and Leadership

  • Act as technical authority across platform, reliability, and delivery
  • Define platform strategy and roadmap
  • Govern delivery across internal teams and vendors

Platform Reliability Ownership

  • Own SLIs, SLOs, and error budgets
  • Lead resilience engineering, observability, and failure design
  • Drive proactive risk reduction and continuous improvement
  • Own incident management frameworks and continuous improvement

CI/CD and Release Engineering

  • Own end-to-end pipeline architecture and release automation
  • Standardize, secure, and fully automate pipelines
  • Drive continuous integration, delivery, and validation practices

Incident Leadership

  • Lead Sev1 response, escalation, and recovery
  • Own RCA and drive systemic fixes (not point fixes)

AI and Automation

  • Embed AI into monitoring, risk prediction, and CI/CD optimization
  • Drive automation to reduce operational toil and improve decision-making

Required Skills:

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 6-10 years of hands-on experience in software operations, DevOps, and Site Reliability Engineering, including managing large-scale, mission-critical systems.
  • Clear and confident communication skills with ability to lead teams and collaborate effectively across engineering, product, and architecture teams.
  • Proven track record ensuring high availability and performance in production environments, with expertise in fault-tolerant, distributed system design.
  • Excellent understanding of modern software delivery pipelines and DevOps practices, including CI/CD, configuration management, and version control (Git).
  • Exceptional problem-solving skills, with experience diagnosing complex system issues under pressure and driving them to resolution.
  • Strong proficiency in at least one programming or scripting language (e.g., Python, Bash, or Java) for automation and tool integration.
  • Self-driven and proactive, with a passion for automating manual processes and continuously improving systems to enhance reliability and team productivity.

Proven experience:

  • Releasing into and running mission-critical, high-availability SaaS platforms
  • Technically leading a Platform team and influencing stakeholders and vendors
  • Stakeholder engagement across Product, Architecture, and Operations

Deep expertise in:

  • Site Reliability Engineering (SLI/SLO, error budgets, incident management)
  • DevOps operating models and platform engineering (engineering transformation)
  • CI/CD architecture and release automation
  • Cloud, Systems & Infrastructure (DB2, Oracle, Infinispan, OpenLiberty)
  • Automation-first engineering with proven usage of AI (self-healing, triage)
  • Java application platforms and runtimes (performance tuning, troubleshooting, production operations)

Strong experience with:

  • Cloud platforms (Azure preferred)
  • Distributed systems and fault-tolerant architectures
  • Performance tuning and scaling
  • Database optimisation (DB2, Oracle, PostgreSQL)
  • Multi-region / active-active environments
  • Monitoring, logging, tracing frameworks
  • Embedding reliability practices into the SDLC

Hands-on with:

  • DB2, Oracle, Infinispan, OpenLiberty, Azure
  • Infrastructure as Code (Terraform or similar)
  • Containerisation and orchestration (Docker/Kubernetes)

Benefits

  • Remote-first / work-from-home culture
  • Flexible vacation to help you rest, recharge, and connect with loved ones
  • Paid leave benefits
  • Health, dental, and vision insurance
  • 401k retirement savings plan
  • Infertility benefits
  • Tuition reimbursement, life insurance, EAP – and more!

Focus (job area): Site Reliability Engineering
Seniority (candidate level): Lead
Stack (main skills): Azure, CI/CD, Docker
Location (eligibility): 1 country accepted