Akamai Technologies

Senior II Site Reliability Engineer

Remote Site Reliability Engineer role with clear candidate location fit.

Posted: Jul 4, 2026

Eligible countries: 2 accepted countries

Seniority signal: Senior

Work setting: Remote

Accepted candidate locations

Canada, Poland

Source freshness: Jul 4, 2026

Stack match: Kubernetes, Python

Apply path: Company site

Role overview

Owning the SRE infrastructure lifecycle from design reviews and pre-rollout readiness assessments through production sign-off and ongoing reliability management
Designing and implementing frameworks that reflect customer experience for load balancing services and driving action when error budgets are at risk
Building and maintaining observability pipelines from load-balancing components and system-level sources to dashboards that enable rapid incident triage
Leading technical incident response for complex NB/NLB failures, acting as the technical commander and driving root cause analysis and preventive follow-through
Developing and automating safe deployment workflows for phased releases, including bake-period monitoring, feature flag management, and validation across global datacenter rollouts
Reviewing design documents and product-requirement documents, and producing actionable SRE input on operational risks, capacity implications, Day-2 concerns, and product strategy gaps
Building automation and tooling using Python or Go that reduces operational toil and improves team-wide operational capability

Have 8+ years of experience in SRE, infrastructure engineering, or platform engineering, working with large-scale distributed systems
Demonstrate deep expertise in Linux networking fundamentals and in diagnosing problems at the packet level using tcpdump, netstat, and similar tools
Have hands-on experience with L4/L7 load balancing technologies covering configuration, health checking, high availability, and failure modes at scale
Show a track record of defining SLO/SLI frameworks, building observability platforms from scratch, and running incident management processes at scale
Demonstrate expertise in Kubernetes and containerization at scale, including workload scheduling, networking, resource management, and operating stateful or network-intensive workloads in a cluster environment
Build automation and tooling using Python or Go, with infrastructure-as-code experience (SaltStack, Ansible, or Terraform) and deployment safety instincts
