Site Reliability Engineer

Requirements and responsibilities

You’ll make reliability the default

You’ll design and maintain infrastructure that is highly available, fault-tolerant, and scalable
You’ll proactively identify and eliminate single points of failure before they become incidents
You’ll ensure our production systems remain stable, even under increasing scale and load

You’ll own and optimize our cloud environments

You’ll manage and continuously improve workloads across AWS, GCP, or Azure
You’ll use Infrastructure as Code (Terraform) to standardize and scale infrastructure
You’ll optimize resource usage to balance performance and cost

You’ll run and improve Kubernetes in production

You’ll operate and scale Kubernetes clusters (EKS, GKE, etc.) with confidence
You’ll troubleshoot issues quickly and ensure smooth deployments and upgrades
You’ll ensure our containerized workloads perform reliably at scale

You’ll monitor production and respond to incidents

You’ll implement and refine monitoring systems using tools like Prometheus, Grafana, Datadog, or ELK
You’ll define alerting that is meaningful, not noisy
You’ll respond to incidents, lead root cause analysis, and ensure we learn from every failure

You’ll automate repetitive operational work

You’ll write scripts and build tooling to eliminate repetitive operational work
You’ll continuously improve infrastructure efficiency through automation
You’ll promote a culture where manual work is a temporary state, not the norm

You’ll collaborate to improve the entire system

You’ll work closely with DevOps and engineering teams to solve performance bottlenecks
You’ll contribute to CI/CD improvements and deployment reliability
You’ll help shape reliability best practices across the organization

First 30 days:

You’ve built a strong understanding of our infrastructure, systems, and workflows
You’re contributing to day-to-day operations with support from the team
You’ve started identifying areas for improvement in automation and reliability

By 90 days:

You’re independently managing infrastructure tasks and troubleshooting issues
You’re actively contributing to reliability and scalability improvements
You’ve taken ownership of parts of our infrastructure and are improving them

Who You Are

You’ve spent ~3 years working in SRE, DevOps, or infrastructure engineering, and you’ve seen what breaks at scale
You’re comfortable working in cloud environments like AWS, GCP, or Azure, and you understand how distributed systems behave
You’ve worked hands-on with Kubernetes in production and know how to troubleshoot it when things go wrong
You don’t just fix issues; you ask why they happened and make sure they don’t happen again

Technically, you likely:

Use Terraform (or similar IaC tools) to manage infrastructure
Work confidently with Docker and Kubernetes
Write scripts in Python, Bash, or similar to automate workflows
Understand CI/CD pipelines (Jenkins, GitHub Actions, Bitbucket Pipelines, etc.)
Have a solid grasp of networking, load balancing, and high-availability design

When it comes to monitoring:

You’ve implemented tools like Prometheus, Grafana, Datadog, or ELK
You know the difference between useful alerts and noise
You focus on signals that actually drive action

What sets you apart:

You take ownership; you don’t wait to be told something is broken
You’re calm under pressure and methodical during incidents
You simplify complexity instead of adding to it
You communicate clearly, even when explaining deeply technical issues
You care about building systems that make other engineers more effective

Nice to Have (but not required)

Experience with RabbitMQ or Redis in production
Familiarity with Ansible or AWX
Exposure to multi-cloud or hybrid environments
Cloud certifications (AWS, GCP) or Linux certifications
Background from ITI (Information Technology Institute)

What the hiring process will look like

Screening Interview – Talent Acquisition
Technical Interview – SRE Lead
Technical Task
Final Interview – SRE Lead & Cloud DevOps Director

Focus (role area): Site Reliability Engineer
Seniority signal (candidate level): Senior
Stack (primary skills): AWS, Azure, CI/CD
Location (eligibility): 1 accepted country

Stack

AWS, Azure, CI/CD, Docker, GCP, Kubernetes, Python, React

Location eligibility

Saudi Arabia
