Role summary

Network Reliability Engineer

Requirements and responsibilities

YOUR DAILY ROUTINE

  • Build large-scale AI infrastructure, including monitoring, diagnosis, and remediation of production incidents
  • Troubleshoot high-impact production issues in collaboration with other engineering teams
  • Participate in an on-call rotation to handle incidents and ensure service continuity
  • Implement and maintain observability solutions to monitor AI infrastructure and application health
  • Contribute to AI infrastructure lifecycle management across different environments and countries
  • Promote and apply best practices for stability, resiliency, scalability, and security
  • Maintain clear technical documentation for tools and procedures
  • Help evolve systems and tools based on production feedback
  • Collaborate closely with development teams to ensure infrastructure readiness
  • Participate in team rituals and knowledge-sharing initiatives

ABOUT YOU

  • Proactive and solution-oriented mindset
  • Passion for automation and continuous improvement
  • Strong collaboration and communication skills
  • Ability to work independently and as part of a team
  • Willingness to mentor and share knowledge

💻 HARD SKILLS

  • Experience with Go or Python
  • Strong scripting skills (Bash, Python)
  • Hands-on experience with Linux systems (Ubuntu/Debian)
  • Hands-on experience with GPU and HPC infrastructure (preferred)
  • Knowledge of networking (VLAN/LAN, TCP/IP, DNS, BGP, load-balancing, IPv6, etc.)
  • Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)
  • Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)
  • Experience managing relational databases (MariaDB)
  • Understanding of CI/CD pipelines (GitLab)
  • Comfortable working in English (written and spoken)

Focus (role area): Network Reliability Engineer
Seniority signal (candidate level): Mid-level
Stack (main skills): CI/CD, Python
Location (eligibility): 1 accepted country

Location eligibility

Candidates should apply only if their country is among the accepted countries for this role.

Hiring process

WithMira lists the role and then sends candidates to the company's own application page.

1. Review the role's fit, stack, and location eligibility on WithMira.
2. Open the company's application page from the tracked link.