Job summary

DevOps Engineer

Requirements and responsibilities

What You'll Do

  • Operate and evolve our Kubernetes platform across multiple clusters and environments (Prod, Dev, hybrid on-prem and public cloud), covering control plane operations, node lifecycle, upgrades, and autoscaling at every layer (Cluster Autoscaler, HPA, KEDA).
  • Architect and manage hybrid cloud infrastructure spanning on-premises and public clouds (GCP, AWS), including workload placement, cross-cloud networking, and unified resource management.
  • Own the CI/CD and GitOps experience end-to-end: container build pipelines, image optimization, and progressive delivery via ArgoCD / FluxCD.
  • Own the observability stack as a single pane of glass across all clusters: Grafana, Mimir, Tempo, Loki, Pyroscope, OnCall, Prometheus -- and help push toward agent-assisted SRE workflows.
  • Manage and improve our inference platform: vLLM serving and AIBrix for multi-model orchestration and autoscaling across a fleet of NVIDIA GPUs.
  • Operate platform services: Kafka, Redis, PostgreSQL, OpenSearch.
  • Manage identity and access via Keycloak integrated with Google Workspace; harden SSO, RBAC, and secrets management across the platform.
  • Harden network security across private load balancers, firewalls, and VPC segmentation; design and maintain hub-and-spoke / multi-AZ topologies.
  • Support training infrastructure: self-service VM provisioning, RunPod burst capacity, Weights & Biases integration.
  • Drive infrastructure reliability, cost efficiency, and capacity planning as the platform scales.

What We're Looking For

  • Kubernetes -- deep, hands-on. You have strong production experience with Kubernetes and are fluent in workloads and controllers, networking (Services, Ingress, CNI basics), storage (PV/PVC, CSI), RBAC, and the autoscaling story end-to-end (HPA, VPA, Cluster Autoscaler, KEDA). Cloud-managed Kubernetes (GKE, EKS, AKS) is fine; on-premises / self-managed Kubernetes (kubeadm, Cluster API, k3s, etc.) is a strong plus.
  • Networking -- design-level, not just operator-level. You have designed real network topologies at some point in your career -- hub-and-spoke, multi-AZ / multi-VPC, or an equivalent enterprise pattern -- and can defend the tradeoffs. Comfortable with VPCs, firewalls, load balancers, private cluster architecture, DNS, and routing. On-premises networking experience (VLANs, BGP, L2/L3 fabrics, pfSense / Fortinet / Palo Alto / Cisco) is a strong plus.
  • CI/CD and Docker -- concepts over tooling. You can build and optimize Dockerfiles (multi-stage builds, layer caching, small/secure base images) and have owned full CI/CD pipelines end-to-end. Tooling is flexible -- GitHub Actions, GitLab CI, Azure Pipelines, Jenkins, Argo Workflows, etc. -- but you should be able to clearly articulate the full lifecycle of a typical pipeline, and explain how CI/CD changes when the deployment target is Kubernetes (ArgoCD / FluxCD, GitOps patterns, progressive delivery).
  • Observability -- you have built this before. You have stood up a full observability stack from scratch and operated it in production -- metrics, logs, traces, alerting, on-call. Familiarity with the Grafana stack (Grafana, Mimir, Tempo, Loki, Pyroscope, OnCall, Prometheus) is a strong plus. Bonus points if you have experimented with agent-assisted SRE workflows or LLM-driven incident triage.
  • SSO and identity. When you bring a new tool into the platform, your instinct is to wire it into a central IdP rather than leave it on local accounts. Comfortable with OpenID Connect, SAML, and traditional directory services (LDAP / Active Directory), and you have integrated tools with an IdP like Keycloak, Okta, Azure AD, or equivalent.
  • Linux and automation fundamentals. Strong Linux proficiency (RHEL/Ubuntu or equivalent) including basic performance and networking debugging. Comfort with infrastructure-as-code (Terraform / Terragrunt / Pulumi or equivalent) and configuration management.
  • Ownership mindset. Comfortable operating in a high-ownership environment where you make architecture decisions, push them to production, and own the outcomes.
  • Optional but valuable: hands-on experience operating any of Kafka, Redis, PostgreSQL, or OpenSearch at production scale, including HA, backup/restore, and upgrade planning.

Bonus points for:

  • Experience with OpenStack in production: Nova, Neutron, Cinder, Trove, Horizon, and CLI administration.
  • Experience with KVM virtualization and storage backends like Ceph or Rook-Ceph on Kubernetes.
  • Familiarity with vLLM internals: PagedAttention, continuous batching, tensor parallelism.
  • Background in AI/ML infrastructure or GPU cluster operations at scale.
  • Experience running KEDA or other event-driven autoscaling patterns in production.
  • Prior open-source contributions to Kubernetes, OpenStack, or adjacent projects.
  • Kernel-level Linux debugging and performance tuning.

Focus: DevOps Engineer
Seniority signal: Middle
Core skills: AWS, Azure, CI/CD
Location: 2 countries accepted
