Job summary

Site Reliability Engineer (West Coast)

Requirements and responsibilities

Details

  • Ensure the reliability of our critical products and services by meeting or exceeding SRE objectives.
  • Instantiate and maintain production infrastructure using Infrastructure as Code and Configuration Management tools.
  • Build and maintain monitoring for our services using centralized logging and time series databases.
  • Automate deployments, administration, and monitoring of our services by following CI/CD practices.
  • Work with engineering and information security teams to establish, document, and enhance processes, and to improve the operability and security of our services.
  • Participate in the team on-call rotation (required).
  • Additional tasks associated with this position may be assigned in response to company initiatives and business needs.

Education:

  • Bachelor's degree in information systems, computer science, technology, or a related field is strongly preferred. In lieu of a degree, 2+ years of relevant or equivalent experience is acceptable.

Experience:

  • A minimum of 3 years of software and/or operational experience building and maintaining internet-facing production environments is required.
  • Strong experience with Linux/Unix systems administration.
  • Knowledge of source control tools (Git preferred).
  • Experience with Configuration Management and Infrastructure as Code tools (Ansible, Puppet, Terraform preferred).
  • Good understanding of container technology (Docker, Kubernetes preferred).
  • Experience with monitoring tools (Prometheus, Grafana, Nagios, or similar) and alerting systems.
  • Experience with non-cloud infrastructure.
  • Experience running a large-scale 24/7 production environment.
  • Experience with distributed data processing, databases, and large-scale file systems is a plus.

Experience:

  • Strong scripting abilities in Bash and Python.
  • Experience with incident management, troubleshooting, and root cause analysis.
  • Experience in handling postmortems, building incident response plans, and improving incident resolution procedures.
  • Experience running and maintaining real-world build systems (Jenkins, DroneCI, or similar tools).
  • Demonstrable experience with the entire software life cycle, from systems architecture and design through implementation, maintenance, and operation.
  • Programming experience using HTTP Service APIs.
  • Virtualization experience (VMware, Proxmox, Oracle Linux Virtualization Manager).
  • Network administration experience is a plus.
  • Exposure to Security and Testing frameworks is a plus.
  • Exposure to compliance-regulated industries such as Finance, Healthcare, or Government is a plus.

Focus (job area): Site Reliability Engineering
Seniority signal (candidate level): Middle
Stack (main skills): CI/CD, Docker, Kubernetes
Location (eligibility): 1 country accepted