Role overview

Senior Site Reliability Engineer

Focus: Developer (role area)
Seniority: Senior (candidate level)
Stack: AWS, Java, Kubernetes (primary skills)
Location: 1 accepted country (eligibility)

Requirements and responsibilities

About the role

We are looking for a Senior Site Reliability Engineer to strengthen our platform reliability and observability capabilities. You will own the design and operation of monitoring infrastructure (Datadog APM, alerting, and distributed tracing) across Kubernetes-based microservices on AWS. The role is split roughly 65/35 between backend engineering and SRE practice, with direct involvement in CI/CD integration and observability automation. You will also support internal teams in adopting monitoring best practices as we modernize our R&D platform.

What you will do

  • Design, build, and maintain scalable backend and platform components;
  • Implement and manage observability solutions across distributed systems;
  • Configure dashboards, alerts, and APM for tracing, metrics, and logging;
  • Monitor and improve system reliability, scalability, and performance;
  • Deploy, operate, and maintain services in Kubernetes environments;
  • Integrate observability tools into CI/CD pipelines and cloud infrastructure;
  • Automate monitoring and operational workflows using scripting;
  • Provide operational and training support for observability platforms, especially Datadog;
  • Collaborate with engineering teams to improve system visibility and reliability practices.

Must haves

  • 4+ years of experience with Python, Node.js, or Java;
  • Hands-on experience with API integrations;
  • Strong experience in Kubernetes environments;
  • Experience with Datadog or similar tools such as Prometheus and Grafana;
  • Ability to configure dashboards, alerts, and APM;
  • Experience monitoring containerized and microservices architectures;
  • Hands-on experience with AWS;
  • Experience integrating observability tools into cloud environments;
  • Experience with CI/CD integrations for observability;
  • Ability to automate monitoring and operational tasks using scripting;
  • Upper-intermediate English level.

Nice to haves

  • Experience owning and operating an internal engineering platform, especially observability platforms;
  • Demonstrated ownership of reliability, scalability, and performance;
  • Ability to proactively lead maintenance and platform improvements;
  • Experience installing and configuring Datadog agents and integrations;
  • Experience managing API keys and secure configurations;
  • Experience managing user roles and access controls;
  • Familiarity with Go (Golang);
  • Experience with additional observability tools such as New Relic, Dynatrace, Elastic Stack, or Splunk.

Location eligibility

Candidates should apply only when their profile country is listed here.

Hiring flow

Applications are saved in WithMira for review and follow-up.

1. Apply with your profile and resume snapshot.
2. Recruiter reviews your fit for this position.
3. Messages and referral status stay attached to this role.