Merative

Site Reliability and DevOps Engineering Lead

Remote Site Reliability Engineering role with clear candidate location fit.

Posted: Jun 20, 2026

Eligible countries: 1 accepted country

Seniority signal: Lead

Work setting: Remote

Accepted candidate locations

USA

Azure CI/CD Docker Java Kubernetes PostgreSQL Python REST

Requirements and responsibilities

People & Team Leadership

Lead, mentor, and grow Platform / DevOps engineers
Build a high-performing Platform team
Drive accountability for platform reliability and delivery outcomes
Lead vendors to deliver capabilities in production

Platform Engineering and Operations

Ensure platform capabilities accelerate product delivery and remove bottlenecks
Define and enforce platform engineering standards and DevOps practices across all teams and vendors
Lead capacity planning, performance optimization, and cost efficiency
Define operational standards, runbooks, and reliability practices
Be accountable for platform reliability outcomes at the enterprise/product level

Platform Strategy and Leadership

Act as technical authority across platform, reliability, and delivery
Define platform strategy and roadmap
Govern delivery across internal teams and vendors

Platform Reliability Ownership

Own SLIs, SLOs, and error budgets
Lead resilience engineering, observability, and failure design
Drive proactive risk reduction and continuous improvement
Own incident management frameworks and continuous improvement

CI/CD and Release Engineering

Own end-to-end pipeline architecture and release automation
Standardize, secure, and fully automate pipelines
Drive continuous integration, delivery, and validation practices

Incident Leadership

Lead Sev1 response, escalation, and recovery
Own RCA and drive systemic fixes (not point fixes)

AI and Automation

Embed AI into monitoring, risk prediction, and CI/CD optimization
Drive automation to reduce operational toil and improve decision-making

Required Skills:

Bachelor’s degree in Computer Science, Engineering, or a related field.
6-10 years of hands-on experience in software operations, DevOps, and Site Reliability Engineering, including managing large-scale, mission-critical systems.
Clear and confident communication skills, with the ability to lead teams and collaborate effectively across engineering, product, and architecture teams.
Proven track record ensuring high availability and performance in production environments, with expertise in fault-tolerant, distributed system design.
Excellent understanding of modern software delivery pipelines and DevOps practices, including CI/CD, configuration management, and version control (Git).
Exceptional problem-solving skills, with experience diagnosing complex system issues under pressure and driving them to resolution.
Strong proficiency in at least one programming or scripting language (e.g., Python, Bash, or Java) for automation and tool integration.
Self-driven and proactive, with a passion for automating manual processes and continuously improving systems to enhance reliability and team productivity.

Proven experience:

Releasing into and running mission-critical, high-availability SaaS platforms
Technically leading a Platform team and influencing stakeholders and vendors
Stakeholder engagement across Product, Architecture, and Operations

Deep expertise in:

Site Reliability Engineering (SLI/SLO, error budgets, incident management)
DevOps operating models and platform engineering (engineering transformation)
CI/CD architecture and release automation
Cloud, Systems & Infrastructure (DB2, Oracle, Infinispan, OpenLiberty)
Automation-first engineering with proven usage of AI (self-healing, triage)
Java application platforms and runtimes (performance tuning, troubleshooting, production operations)

Strong experience with:

Cloud platforms (Azure preferred)
Distributed systems and fault-tolerant architectures
Performance Tuning and Scaling
Database optimisation (DB2, Oracle, PostgreSQL)
Multi-region / active-active environments
Monitoring, logging, tracing frameworks
Experience embedding reliability practices into the SDLC

Hands-on with:

DB2, Oracle, Infinispan, OpenLiberty, Azure
Infrastructure as Code (Terraform or similar)
Containerisation and orchestration (Docker/Kubernetes)

Benefits

Remote first / work from home culture
Flexible vacation to help you rest, recharge, and connect with loved ones
Paid leave benefits
Health, dental, and vision insurance
401k retirement savings plan
Infertility benefits
Tuition reimbursement, life insurance, EAP – and more!
