Role summary

GOV Site Reliability Engineer

Requirements and responsibilities

Discovery & Documentation

  • Get up to speed on VDC workloads, dependencies, and operational workflows by reading code, docs, and working with SMEs.
  • Write and maintain runbooks, incident guides, and operational documentation.
  • Support knowledge transfer and contribute to onboarding materials for the team.

Reliability & Incident Response

  • Participate in incident response including triage, investigation, mitigation, and postmortems.
  • Help implement and maintain SLIs, SLOs, and error budgets defined by the team.
  • Identify reliability issues during incidents or reviews and propose concrete improvements.
  • Support high availability and fault tolerance work on Azure, including Azure Government.

Observability

  • Close monitoring gaps by implementing instrumentation, alerting, and dashboards based on team standards.
  • Contribute to toil reduction through automation and tooling improvements.
  • Participate in on-call rotations.

Infrastructure & Delivery

  • Work with IaC, CI/CD pipelines, and deployment tooling in compliance-restricted environments.
  • Support testing, canary deployments, and release validation workflows.
  • Implement changes to infrastructure and configuration following established patterns and review processes.

Collaboration

  • Work with engineering, security, compliance, and operations teams to execute on reliability improvements.
  • Communicate clearly about system behavior, risk, and status, both in writing and in meetings.
  • Raise blockers and gaps proactively; don't wait for problems to escalate.

Required

  • 3+ years in Software Engineering, with at least 1 year in SRE, Platform Engineering, or DevOps working on cloud-hosted services.
  • Experience with cloud infrastructure on Azure or a comparable cloud provider.
  • Familiarity with regulated or compliance-oriented environments such as government (FedRAMP, CMMC), financial (PCI-DSS), or healthcare (HIPAA). You understand that compliance shapes what you can and can't do operationally.
  • Able to read and understand code well enough to investigate system behavior without always having someone walk you through it.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack).
  • Experience with IaC tools (Terraform, Terragrunt, or Pulumi) and container orchestration (Kubernetes).
  • Experience with CI/CD tooling such as GitHub Actions, Azure DevOps, GitLab CI, or ArgoCD.
  • Strong programming skills in one or more of: TypeScript/JS, Go, Java, C#, or similar.
  • Solid understanding of distributed systems fundamentals and networking basics.
  • Clear written and verbal communication skills.

Preferred

  • Experience in Government or Sovereign Cloud environments (e.g., Azure Government, AWS GovCloud).
  • Background in SaaS platforms or multi-tenant systems.
  • Familiarity with chaos engineering, resilience testing, or load testing.
  • Exposure to building or improving reliability practices on a team.
  • Familiarity with AI-first development workflows using LLM-powered tools for automation, code generation, or documentation.

Why Join?

  • Work on a high-impact reliability practice for a growing GOV/Sovereign Cloud platform.
  • Learn from senior engineers while owning real work end-to-end.
  • Collaborate with strong teams across product, cloud engineering, security, and compliance.
  • Professional development resources including mentorship, training, and volunteer days.
  • Competitive compensation and benefits.

What you'll get

  • Unlimited paid time off, 12 paid holidays (including 4 global VeeaMe Days for self-care), and 24 paid volunteer hours annually through Veeam Cares
  • Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
  • Medical, dental, and vision coverage starting on your first day
  • Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
  • 401(k) retirement plan with company matching contributions
  • Fertility, adoption, and surrogacy support through Maven, plus paid volunteer time
  • AirVet: 24/7 virtual veterinary care at no cost
  • Legal services, identity protection, and supplemental health insurance options
  • Tax-advantaged spending accounts for healthcare, dependent care, and commuting
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning

Focus (role area): Site Reliability Engineer
Seniority signal (candidate level): Senior
Stack (main skills): AWS, Azure, CI/CD
Location (eligibility): 1 country accepted
