Job summary

Site Observability Engineer

Requirements and responsibilities

Job Title: Site Observability Engineer

  • Design and operate enterprise-grade observability platforms covering metrics, logs, traces, events, and synthetic monitoring.
  • Architect Prometheus / Thanos / Mimir, Grafana, Loki, Tempo, OpenTelemetry, and Datadog deployments for high availability and scale.
  • Develop standards for service instrumentation, including OpenTelemetry adoption, metric naming, label cardinality, and structured logging conventions.
  • Define and enforce SLOs, SLIs, and error budgets, and build the dashboards and alerts that operationalize them.
  • Build alerting strategies that minimize noise, surface actionable signals, and integrate cleanly with on-call workflows in PagerDuty, Opsgenie, or similar tools.
  • Operate large-scale time-series and log storage platforms, balancing retention, query performance, and cost.
  • Design distributed tracing pipelines and help teams use traces to diagnose latency and reliability issues.
  • Develop self-service tooling, paved-road libraries, and templates that make adoption of observability standards easy for product teams.
  • Drive cost management and label-cardinality discipline across the observability estate.
  • Lead improvements to incident-response readiness through better dashboards, alerting hygiene, and post-incident analysis tooling.
  • Partner with SRE and platform teams to integrate observability into deployment pipelines, canary analysis, and progressive delivery workflows.
  • Evaluate and recommend observability vendors and open-source tools based on cost, capability, and operational maturity.
  • Mentor engineering teams on observability fundamentals, debugging techniques, and SLO-driven operations.
  • Maintain documentation, onboarding guides, and runbooks for the observability platform.

Requirements

  • Bachelor’s degree in Computer Science or a related field.
  • Five or more years of experience in SRE, platform engineering, or observability roles.
  • Deep hands-on experience with Prometheus, Grafana, and at least one major commercial observability platform such as Datadog, New Relic, or Splunk.
  • Strong understanding of OpenTelemetry, distributed tracing, and structured logging.
  • Proficiency in at least one general-purpose language such as Go, Python, or Java.
  • Experience operating high-cardinality, high-throughput metrics and log pipelines.
  • Strong understanding of SLOs, error budgets, and SRE principles.
  • Experience integrating observability with CI/CD and incident management tooling.
  • Solid grasp of Linux internals, networking, and container platforms.
  • Excellent communication and collaboration skills.

Preferred qualifications

  • Experience with Thanos, Mimir, Cortex, Loki, or Tempo at scale.
  • Contributions to OpenTelemetry or observability open-source projects.
  • Familiarity with eBPF-based observability tooling.
  • Experience driving observability cost optimization initiatives.
  • Exposure to regulated environments with audit-grade logging requirements.

Focus: Site Observability Engineer
Seniority: Senior
Stack: CI/CD, Java, Python
Location: 1 country accepted

Location eligibility

Candidates should apply only if the country in their profile is listed here.

Hiring process

WithMira displays the job listing and then directs candidates to the company's own application page.

1. Check the job's fit, stack, and location eligibility on WithMira.
2. Open the company's application page through the tracked link.
3. Save the job or subscribe to similar openings before leaving.