Virtasant

Future Openings - SRE Support Engineer - Observability

Remote SRE role with a clear candidate location fit.

Published: 12 Jun 2026

Eligible countries: 1 country accepted

Seniority signal: Middle

Work model: Remote

Accepted candidate locations

United States

Kubernetes

Application path: Company site

Role summary

Requirements and responsibilities

Success Measures

Healthy volume of threads and tickets handled with high-quality outcomes
Consistent achievement of time-based SLAs
High customer satisfaction, as measured through surveys
Accurate classification of issue type, severity, and recurring patterns
Reduced repeat issues through better docs, tooling, and scalable onboarding

What Will Be True When You Succeed

Customers can onboard smoothly to monitoring/alerting with minimal friction
Monitoring and alerting issues are resolved quickly, with fewer escalations
Linux and networking-related incidents reach resolution faster due to strong troubleshooting and clean handoffs
Engineering and SRE teams receive clear, actionable feedback based on real customer trends
Knowledge base content prevents tickets and accelerates self-service

1) Frontline Support for Observability & Tooling

Manage Slack threads and tickets (roughly 50/50)
Handle a broad range of customer support, from simple issue resolution through end-to-end onboarding
Provide clear, structured guidance to highly technical customers
Maintain strong attention to detail while managing multiple interactions in parallel

2) Deep-Dive Troubleshooting & Incident Support

Troubleshoot, isolate, and resolve monitoring and alerting issues (especially Prometheus and Alertmanager)
Troubleshoot complex Linux and networking issues (TCP/IP fundamentals required)
Support OpenTelemetry, tracing, and telemetry pipelines, including investigation of gaps in signals and instrumentation
Drive incidents to resolution in partnership with Engineering/SRE teams

3) Documentation & Knowledge Development

Build and maintain customer-facing and internal knowledge base articles
Create informational posts for the community support platform
Turn repeated issues into reusable guides, checklists, and onboarding playbooks

4) Trend Analysis & Feedback to Engineering

Analyze and categorize customer interaction trends
Provide accurate, meaningful feedback to Engineering and SRE orgs to improve product/tooling
Identify “top offenders” and propose practical fixes (tooling, docs, process, product)

5) Operational Excellence & Continuous Improvement

Participate in post-mortem reviews and drive follow-through on improvements
Contribute meaningfully to team objectives and goals (process, tooling, and service scaling)
Bring creativity and discretion to resolve highly complex issues “outside the box”

Frontline Support

Moves smoothly from triage to deeper analysis without losing the customer
Communicates clearly and confidently with technical users
Maintains clean follow-ups and thread hygiene even with high context switching

Troubleshooting

Rapidly isolates issues across monitoring/alerting configs, Linux runtime behavior, and network connectivity
Uses structured approaches to incident handling: hypothesis → test → evidence → resolution
Produces high-signal writeups that accelerate downstream resolution

Documentation & Enablement

Documentation is clear enough that customers avoid opening tickets
Onboarding flows reduce time-to-value and prevent common misconfigurations
Captures “tribal knowledge” quickly and makes it reusable

Operational Excellence

Obsesses over details: correct severity, accurate tagging, clean timelines, strong handoffs
Spots patterns early and proactively proposes improvements that scale support

Typical Day / Work Patterns

~50% Slack support, ~50% ticket handling
Deep-dive investigations during lower ticket volume periods
Documentation writing and lightweight tooling/process improvements when patterns emerge
Weekly team review of escalations, themes, and operational improvements
High rate of context switching and parallel issue management

Required Skills & Experience (Non-Negotiable)

Several years supporting highly scalable applications and web services
Hands-on experience with open-source observability and cloud-native tooling, including:
Kubernetes (and container fundamentals)
Prometheus and Alertmanager troubleshooting
OpenTelemetry and distributed tracing concepts
Strong understanding of the Linux operating system (command line, process/network debugging, logs)
Good understanding of infrastructure observability principles (signals, alerting strategy, SLO thinking, noise reduction)
Good understanding of the TCP/IP suite and practical networking troubleshooting
Strong experience troubleshooting ambiguous, multi-layer issues
Excellent analytical capability and strong attention to detail
Strong written and verbal communication (clear, structured, customer-friendly)
Comfortable working with a very technical customer base
Passion for Technical Support and a service mindset

Nice-to-Haves

Experience improving or supporting internal support tooling or workflows (automation, templates, runbooks)
Experience operating at scale in a services environment (pattern detection, KPI/SLA awareness, operational process maturity)
Familiarity with Grafana, log aggregation, incident tooling, and production support practices
Prior SRE or platform support experience

Minimum Qualifications

3–7+ years in Technical Support Engineering, SRE support, DevOps, Platform Support, or similar
Demonstrated experience supporting distributed systems, IaaS, or cloud platforms
Strong Linux, troubleshooting, and customer-facing communication background
Evidence of documentation, knowledge-base contributions, and process improvement mindset

What You’ll Love

Real technical problem solving with tangible customer impact
A role that blends deep troubleshooting with scaling support via docs, tooling, and process
High autonomy in a remote-first environment

What May Be Challenging

High context switching and managing multiple threads in parallel
Repeated patterns that require discipline to convert pain into scalable improvements
Supporting high-visibility systems where speed and accuracy matter
