DraftKings Inc.
Principal Site Reliability Engineer
Remote Site Reliability Engineering role with clear candidate location fit.
Posted: Jul 3, 2026
Eligible countries: 1 accepted country
Seniority signal: Senior
Work setting: Remote
Accepted candidate locations
USA
Requirements and responsibilities
What you'll do
- Define and execute the long-term strategy for our Kubernetes platform across Google Kubernetes Engine, Amazon Elastic Kubernetes Service, RKE2, and on-premise environments, ensuring reliability, scalability, and operational consistency.
- Drive architectural decisions across critical infrastructure, including cluster lifecycle management, networking, identity and access management, observability, autoscaling, capacity planning, and cost optimization.
- Lead large-scale platform initiatives across multiple engineering teams, establishing technical direction, engineering standards, and measurable outcomes that improve platform reliability and developer experience.
- Establish and evolve reliability practices by defining service level objectives, service level indicators, and error budget frameworks that align platform performance with business priorities.
- Build automation-first infrastructure through Infrastructure as Code, GitOps workflows, self-healing systems, and internal platform tooling that improve engineering velocity and reduce operational overhead.
- Champion the responsible adoption of AI-powered engineering capabilities that improve operational efficiency, accelerate incident response, and enhance developer productivity.
- Lead critical platform incidents, drive post-incident improvements, and strengthen platform resilience through automation, capacity planning, and operational excellence.
- Mentor senior engineers, influence technical strategy across the organization, and elevate engineering excellence through architecture reviews, coaching, and technical leadership.
What you'll bring
- A Bachelor's Degree in Computer Science or a related technical field.
- At least 8 years of experience designing, operating, and scaling distributed cloud and on-premise infrastructure, including at least 3 years operating at the Staff, Principal, or equivalent technical leadership level.
- Proven experience leading large-scale infrastructure or platform initiatives that require cross-functional alignment and long-term technical ownership.
- Deep expertise with Kubernetes, including cluster architecture, networking, storage, security, operators, lifecycle management, and large-scale production operations.
- Extensive experience building and operating production infrastructure in AWS and Google Cloud Platform using Infrastructure as Code technologies such as Terraform, Pulumi, or similar tools.
- Strong software development experience in Go, Python, or both, with expertise in GitOps, continuous integration and continuous delivery, observability, distributed systems, Linux, and reliability engineering principles.
- Experience incorporating AI-powered tools into engineering workflows while applying sound judgment around reliability, security, and operational risk.
- Exceptional communication and leadership skills with a proven ability to mentor engineers, influence technical strategy, and drive engineering excellence.
- Preferred: experience in regulated industries or hybrid cloud environments, contributions to open-source projects, or cloud certifications.
Hiring flow
WithMira shows the role, then sends candidates to the company application.
1. Check role fit, stack, and location eligibility in WithMira.
2. Open the company application page from the tracked apply link.
3. Save the role or subscribe for similar opportunities before leaving.