Job summary

Principal Observability & Reliability Architect

Requirements and responsibilities

Responsibilities

  • Lead client discovery, architecture workshops, and solution design across observability, telemetry, reliability, and operational intelligence initiatives.
  • Design enterprise observability architectures spanning monitoring, logging, metrics, tracing, telemetry pipelines, alerting, event correlation, service visibility, and platform integrations.
  • Define scalable standards for telemetry onboarding, naming, tagging, RBAC, service ownership, dashboards, alert governance, runbooks, and operational handoff.
  • Advise on telemetry governance, including data quality, retention, access control, sampling, cardinality, and cost optimization.
  • Lead modernization initiatives including tool rationalization, dashboard and alert rationalization, telemetry strategy, and migration from legacy monitoring platforms.
  • Guide SRE practices including SLIs, SLOs, error budgets, production readiness, and incident response maturity.
  • Design integration patterns across ITSM, CMDB, event management, and automation platforms.
  • Support pursuits by shaping solution strategy, validating scope, informing estimates, and building client-facing technical narratives.
  • Serve as a senior escalation point and provide architecture governance during delivery.
  • Build reusable reference architectures, playbooks, and accelerators while mentoring architects, consultants, and offshore teams.

Qualifications

  • 10+ years in observability, monitoring, APM, platform operations, SRE, or related enterprise technology domains, including 5+ years leading architecture and delivery strategy for enterprise observability or reliability initiatives.
  • Deep, hands-on experience designing and implementing solutions across monitoring, logging, metrics, tracing, telemetry collection, and pipeline patterns in hybrid and multi-cloud environments.
  • Strong knowledge of telemetry governance, including routing, transformation, normalization, enrichment, retention, access control, and cost management.
  • Experience defining enterprise standards for dashboards, alerts, tagging, naming, service ownership, RBAC, and operating model adoption.
  • Strong command of incident response, event correlation, alert strategy, service health, and business-service visibility, plus applied SRE concepts including SLIs, SLOs, error budgets, and production readiness.
  • Ability to lead executive and technical workshops and translate business needs into actionable architecture and delivery plans.
  • Consulting or professional services experience with strong client-facing communication, estimation, risk management, and cross-functional leadership.

Preferred Qualifications

  • Experience with platforms such as Dynatrace, Splunk, Grafana, LogicMonitor, Datadog, New Relic, AppDynamics, Elastic, Prometheus, or OpenTelemetry.
  • Experience with telemetry pipeline tools such as OpenTelemetry Collector, Grafana Alloy, Fluent Bit, Kafka, Cribl, or Vector, along with familiarity with cloud, Kubernetes, CI/CD, and infrastructure as code.
  • Experience integrating with platforms such as ServiceNow, Jira Service Management, PagerDuty, Opsgenie, BigPanda, or xMatters.
  • Experience developing reusable consulting assets such as reference architectures, governance models, playbooks, POVs, and accelerators; relevant cloud, SRE, ITIL, or FinOps certifications are a plus.

Focus: Observability
Seniority: Senior
Stack: CI/CD, Kubernetes
Location: 1 country accepted