Sleek

Senior Site Reliability Engineer (SRE)

Remote Site Reliability Engineering role with clearly stated candidate location eligibility.

Published: Jul 5, 2026

Eligible countries: 19 countries accepted

Seniority signal: Senior

Work model: Remote

Accepted candidate locations

Australia, Bangladesh, Cambodia, China, Hong Kong SAR China, India, +13 more

AWS Azure CI/CD GCP Kubernetes Node.js Python

Application path: Company website

WithMira lists this role for discovery. The application itself continues on the company's website.

Job summary

Requirements and responsibilities

You will ensure:

High-quality, secure, and scalable infrastructure capable of supporting modern applications and advanced AI workloads
Robust automation across CI/CD, infrastructure provisioning, and operations to increase reliability and reduce manual overhead
Thoughtful and pragmatic integration of AI into operational workflows to improve efficiency, detect anomalies, and accelerate delivery
Reliable systems engineering practices, including monitoring, incident response, performance tuning, and capacity planning
Strong DevOps standards, including reproducibility, testing, documentation, and operational excellence
Clear technical communication and cross-team alignment to enable predictable delivery and collaborative problem-solving
Mentorship and technical leadership that elevates platform engineering, DevOps maturity, and overall engineering quality across the organisation

Outcomes:

Conduct a full review of Sleek’s cloud infrastructure and propose a roadmap for reliability and scalability improvements
Lead upgrades or redesigns of core platform components such as networking, containers, orchestration, or databases
Improve incident response processes, SLIs, SLOs, and on-call readiness

Outcomes:

Ensure platform and infrastructure are capable of supporting AI-powered features
Build or refine pipelines for model hosting, embeddings, vector search, or related AI services if required
Implement monitoring and guardrails for AI service performance, cost, and stability

Increase Engineering Velocity Through Automation

Enhance CI/CD pipelines for speed, safety, and reliability
Introduce infrastructure automation, testing automation, and deployment tooling to reduce manual steps
Champion modern DevOps and AI-assisted tooling to improve engineering productivity

Improve Observability and Operational Excellence

Strengthen logging, monitoring, tracing, and alerting across services
Reduce noisy alerts and improve the signal-to-noise ratio for incidents
Implement readiness checks, runbooks, and automated recovery paths for critical services

Security and Compliance Improvements

Ensure secure configuration, secrets management, access control, and identity management
Implement automated security scanning, dependency monitoring, and hardened pipeline practices
Prepare platform-level requirements needed for reliable and secure AI usage

To do this you will have:

6+ years of progressive experience in Site Reliability Engineering (SRE)
6+ years of strong, hands-on experience across multi-cloud environments such as AWS, GCP, and Azure, including expertise in networking, compute, storage, security, and cost optimization
6+ years of deep expertise in containerization and orchestration (e.g., Kubernetes, EKS, ECS)
6+ years of extensive experience with Infrastructure as Code (IaC) (e.g., Terraform, Pulumi, CloudFormation)
System Reliability: Proven ability to design, build, and operate highly reliable, scalable production systems utilizing advanced Zero-Downtime Deployment Patterns (e.g., Blue/Green, Canary, progressive delivery).
Modern Delivery & Tooling: Expertise in modernizing deployments via GitOps practices (e.g., ArgoCD, Flux) and building Self-Service Developer Platforms that enable engineering efficiency (e.g., environment automation, internal tooling).
Networking & Edge Routing: Experience implementing and managing Multi-Cloud API Gateways and Edge Routing solutions (e.g., Kong, Traefik, Cloudflare, multi-cluster ingress).
Security & Hardening: Strong background in platform security, including secrets management, Identity and Access Control (IAM), and Runtime/Security Hardening with tools like Falco/eBPF and WAFs.
Observability: Solid understanding and practical experience with modern observability stacks (e.g., Prometheus, OpenTelemetry, OpenSearch, ELK, CloudWatch).
AI/ML Infrastructure: Experience supporting or deploying AI/ML workloads (e.g., model inference, vector databases, GPU workloads), or strong familiarity with the infrastructure requirements for these systems.
Communication: Excellent communication and collaboration skills with a proven ability to describe complex infrastructure decisions clearly and a background in driving improvements in engineering practices.
Development Expertise: Familiarity with modern languages and frameworks such as Node.js, NestJS, and Python is highly desirable for extending DevOps capabilities or integrating tooling.
