Job summary

Engineering Manager, Runtime Fabric

Requirements and responsibilities


Details

  • Recruit, hire, and develop a high-performing team of systems engineers with deep container and Linux expertise.
  • Foster a culture of technical rigor, open-source contribution, and continuous improvement.
  • Provide regular coaching, feedback, and career development support to your direct reports.
  • Partner with engineering leadership to define the long-term vision and roadmap for container runtime and storage infrastructure.
  • Guide the team in extending and hardening containerd, runc, and related OCI ecosystem projects to meet the GPU-specific requirements of production AI inference, including startup performance, GPU device access, and multi-tenant isolation.
  • Oversee the architecture and evolution of the Baseten Delivery Network (BDN): the tiered caching and weight delivery system that makes cold starts 2–3x faster and eliminates thundering-herd failures during burst scaling events.
  • Drive the expansion of BDN's architecture, currently focused on model weights, to container images, training checkpoints, and deployment artifacts.
  • Provide technical oversight on GPU-aware isolation mechanisms for multi-tenant inference, including secure container runtimes, Linux namespace hardening, and longer-term micro-VM integration.
  • Ensure the team maintains end-to-end ownership of the container startup performance path, from snapshotter initialization through weight delivery to first inference request.
  • Champion the team's contributions back to the open-source containerd ecosystem alongside a team of core maintainers.
  • Act as the primary advocate for Runtime Fabric across the organization, ensuring upstream and downstream teams have the integration support they need.
  • Collaborate with product and engineering stakeholders to prioritize investments based on business impact and infrastructure reliability.
  • Communicate team progress, technical trade-offs, and architectural decisions clearly to leadership.
  • Proven experience managing and growing engineering teams in a systems, infrastructure, or low-level runtime context.
  • Deep familiarity with the Linux container ecosystem: containerd, runc, OCI Runtime Spec, Linux namespaces, and cgroups, with the ability to engage credibly in code reviews and architectural discussions.
  • Contributions to containerd/containerd, opencontainers/runc, google/gvisor, kata-containers/kata-containers, or closely related open-source projects.
  • Strong systems programming background in Go and/or C/C++.
  • Experience with distributed storage systems, content-addressable storage, or large-scale caching infrastructure.
  • Understanding of how container images are structured, stored, and delivered at scale.
  • Strong written and verbal communication skills, with the ability to influence without authority across teams.
  • Experience with GPU device access in containers: NVIDIA Container Toolkit, CDI (Container Device Interface), or GPU-aware scheduling.
  • Familiarity with lazy-loading snapshotters (stargz, soci, EROFS/Nydus) or peer-to-peer image distribution.
  • Experience with secure container runtimes (gVisor, Sysbox) or micro-VM technologies (Firecracker, Cloud Hypervisor).
  • Understanding of containerd's shim API (v2) and experience building custom shim implementations.
  • Background in multi-tenant infrastructure or security-sensitive serving environments.
  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and dependents.
  • Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
  • Paid parental leave.
  • Fertility and family-building stipend through Carrot.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Focus (job area): Runtime Fabric
Seniority signal (candidate level): Lead
Stack (main skills): Spark
Location (eligibility): 1 country accepted