Sleek
Senior Site Reliability Engineer (SRE)
Remote Site Reliability Engineering role with a clear candidate location fit.
Published: 5 Jul 2026
Eligible countries: 19 countries accepted
Seniority level: Senior
Work model: Remote
Requirements and responsibilities
You will ensure:
- High-quality, secure, and scalable infrastructure capable of supporting modern applications and advanced AI workloads
- Robust automation across CI/CD, infrastructure provisioning, and operations to increase reliability and reduce manual overhead
- Thoughtful and pragmatic integration of AI into operational workflows to improve efficiency, detect anomalies, and accelerate delivery
- Reliable systems engineering practices, including monitoring, incident response, performance tuning, and capacity planning
- Strong DevOps standards, including reproducibility, testing, documentation, and operational excellence
- Clear technical communication and cross-team alignment to enable predictable delivery and collaborative problem-solving
- Mentorship and technical leadership that elevates platform engineering, DevOps maturity, and overall engineering quality across the organisation
Outcomes:
- Conduct a full review of Sleek’s cloud infrastructure and propose a roadmap for reliability and scalability improvements
- Lead upgrades or redesigns of core platform components such as networking, containers, orchestration, or databases
- Improve incident response processes, SLIs, SLOs, and on-call readiness
Outcomes:
- Ensure platform and infrastructure are capable of supporting AI-powered features
- Build or refine pipelines for model hosting, embeddings, vector search, or related AI services if required
- Implement monitoring and guardrails for AI service performance, cost, and stability
Increase Engineering Velocity Through Automation
- Enhance CI/CD pipelines for speed, safety, and reliability
- Introduce infrastructure automation, testing automation, and deployment tooling to reduce manual steps
- Champion modern DevOps and AI-assisted tooling to improve engineering productivity
Improve Observability and Operational Excellence
- Strengthen logging, monitoring, tracing, and alerting across services
- Reduce noisy alerts and improve the signal-to-noise ratio for incidents
- Implement readiness checks, runbooks, and automated recovery paths for critical services
Security and Compliance Improvements
- Ensure secure configuration, secrets management, access control, and identity management
- Implement automated security scanning, dependency monitoring, and hardened pipeline practices
- Prepare platform-level requirements needed for reliable and secure AI usage
To do this you will have:
- 6+ years of progressive experience in Site Reliability Engineering (SRE)
- 6+ years of strong, hands-on experience across multi-cloud environments such as AWS, GCP, and Azure, including expertise in networking, compute, storage, security, and cost optimization
- 6+ years of deep expertise in containerization and orchestration (e.g., Kubernetes, EKS, ECS)
- 6+ years of extensive experience with Infrastructure as Code (IaC) (e.g., Terraform, Pulumi, CloudFormation)
- System Reliability: Proven ability to design, build, and operate highly reliable, scalable production systems utilizing advanced Zero-Downtime Deployment Patterns (e.g., Blue/Green, Canary, progressive delivery).
- Modern Delivery & Tooling: Expertise in modernizing deployments via GitOps practices (e.g., ArgoCD, Flux) and building Self-Service Developer Platforms that enable engineering efficiency (e.g., environment automation, internal tooling).
- Networking & Edge Routing: Experience implementing and managing Multi-Cloud API Gateways and Edge Routing solutions (e.g., Kong, Traefik, Cloudflare, multi-cluster ingress).
- Security & Hardening: Strong background in platform security, including secrets management, Identity and Access Management (IAM), and Runtime/Security Hardening with tools like Falco/eBPF and WAFs.
- Observability: Solid understanding and practical experience with modern observability stacks (e.g., Prometheus, OpenTelemetry, OpenSearch, ELK, CloudWatch).
- AI/ML Infrastructure: Experience supporting or deploying AI/ML workloads (e.g., model inference, vector databases, GPU workloads), or strong familiarity with the infrastructure requirements for these systems.
- Communication: Excellent communication and collaboration skills with a proven ability to describe complex infrastructure decisions clearly and a background in driving improvements in engineering practices.
- Development Expertise: Familiarity with modern languages, runtimes, and frameworks such as Node.js, NestJS, and Python is highly desirable for extending DevOps capabilities or integrating tooling.