Job summary

Network Reliability Engineer

Requirements and responsibilities

YOUR DAILY ROUTINE

  • Build a large AI infrastructure with monitoring, diagnosis, and remediation of production incidents
  • Troubleshoot high-impact production issues in collaboration with other engineering teams
  • Participate in an on-call rotation to handle incidents and ensure service continuity
  • Implement and maintain observability solutions to monitor AI infrastructure and application health
  • Contribute to AI infrastructure lifecycle management across different environments and countries
  • Promote and apply best practices in terms of stability, resiliency, scalability, and security
  • Maintain clear technical documentation for tools and procedures
  • Contribute to system and tool evolution based on production feedback
  • Collaborate closely with development teams to ensure infrastructure readiness
  • Participate in team rituals and knowledge-sharing initiatives

ABOUT YOU

  • Proactive and solution-oriented mindset
  • Passion for automation and continuous improvement
  • Strong collaboration and communication skills
  • Ability to work independently and in a team
  • Willingness to mentor and share knowledge

💻 HARD SKILLS

  • Experience with Go or Python
  • Strong scripting skills (Bash, Python)
  • Hands-on experience with Linux systems (Ubuntu/Debian)
  • Hands-on experience with GPU and HPC infrastructure (preferred)
  • Knowledge of networking (VLAN/LAN, TCP/IP, DNS, BGP, load-balancing, IPv6, etc.)
  • Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)
  • Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)
  • Experience managing relational databases (MariaDB)
  • Understanding of CI/CD pipelines (GitLab)
  • Comfortable with English (written and spoken)

Focus: Network Reliability Engineer
Seniority signal: Middle
Stack: CI/CD, Python
Location: 1 country accepted