Role overview

Engineering Manager, Runtime Fabric

Requirements and responsibilities

  • Recruit, hire, and develop a high-performing team of systems engineers with deep container and Linux expertise.
  • Foster a culture of technical rigor, open-source contribution, and continuous improvement.
  • Provide regular coaching, feedback, and career development support to your direct reports.
  • Partner with engineering leadership to define the long-term vision and roadmap for container runtime and storage infrastructure.
  • Guide the team in extending and hardening containerd, runc, and related OCI ecosystem projects to meet the GPU-specific requirements of production AI inference, including startup performance, GPU device access, and multi-tenant isolation.
  • Oversee the architecture and evolution of the Baseten Delivery Network (BDN): the tiered caching and weight delivery system that makes cold starts 2–3x faster and eliminates thundering-herd failures during burst scaling events.
  • Drive the expansion of BDN's architecture, currently focused on model weights, to container images, training checkpoints, and deployment artifacts.
  • Provide technical oversight on GPU-aware isolation mechanisms for multi-tenant inference, including secure container runtimes, Linux namespace hardening, and longer-term micro-VM integration.
  • Ensure the team maintains end-to-end ownership of the container startup performance path, from snapshotter initialization through weight delivery to first inference request.
  • Champion the team's contributions back to the open-source containerd ecosystem alongside a team of core maintainers.
  • Act as the primary advocate for Runtime Fabric across the organization, ensuring upstream and downstream teams have the integration support they need.
  • Collaborate with product and engineering stakeholders to prioritize investments based on business impact and infrastructure reliability.
  • Communicate team progress, technical trade-offs, and architectural decisions clearly to leadership.
  • Proven experience managing and growing engineering teams in a systems, infrastructure, or low-level runtime context.
  • Deep familiarity with the Linux container ecosystem: containerd, runc, OCI Runtime Spec, Linux namespaces, and cgroups, with the ability to engage credibly in code reviews and architectural discussions.
  • Contributions to containerd/containerd, opencontainers/runc, google/gvisor, kata-containers/kata-containers, or closely related open-source projects.
  • Strong systems programming background in Go and/or C/C++.
  • Experience with distributed storage systems, content-addressable storage, or large-scale caching infrastructure.
  • Understanding of how container images are structured, stored, and delivered at scale.
  • Strong written and verbal communication skills, with the ability to influence without authority across teams.
  • Experience with GPU device access in containers: NVIDIA Container Toolkit, CDI (Container Device Interface), or GPU-aware scheduling.
  • Familiarity with lazy-loading snapshotters (stargz, soci, EROFS/Nydus) or peer-to-peer image distribution.
  • Experience with secure container runtimes (gVisor, Sysbox) or micro-VM technologies (Firecracker, Cloud Hypervisor).
  • Understanding of containerd's shim API (v2) and experience building custom shim implementations.
  • Background in multi-tenant infrastructure or security-sensitive serving environments.
  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and their dependents.
  • Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
  • Paid parental leave.
  • Fertility and family-building stipend through Carrot.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.