Autodesk

Senior Site Reliability Engineer

Remote Site Reliability Engineering role with a clear candidate-location fit.

Published: 20 Jun 2026

Eligible countries: 1 country accepted

Seniority signal: Senior

Work model: Remote

Accepted candidate locations

United States

AWS Azure CI/CD Java Kubernetes Python

Can I actually apply? Check the country list.

The accepted candidate locations are listed (1).

Source freshness: 20 Jun 2026

Location fit: 1 country accepted

Stack match: AWS, Azure

Application path: Company site

Role summary

Senior Site Reliability Engineer

Requirements and responsibilities

Role content extracted into sections for faster review.

Responsibilities

Serve as a primary owner for the reliability, availability, performance, operability, and capacity of one or more production services
Deploy, operate, maintain, and continuously improve production services running in Autodesk GovCloud environments
Partner with engineering teams to ensure services are designed with reliability, scalability, security, and operability in mind
Define and operate reliability practices such as SLOs/SLIs, error budgets, production readiness reviews, service reviews, and operational health reviews
Build automation to improve deployment safety, operational efficiency, incident response, and service recovery
Design, develop, and maintain software, automation, and tooling that improve the reliability, scalability, and efficiency of production systems
Implement and improve monitoring, alerting, logging, tracing, and observability capabilities across supported services
Lead and participate in incident response, troubleshooting, and post-incident reviews focused on learning and continuous improvement
Develop and maintain operational documentation, runbooks, and recovery procedures
Scale and enhance resilience testing and Gameday practices to validate system behavior, recovery capabilities, and operational readiness
Continuously identify and eliminate operational toil through software engineering, automation, and process improvement
Ensure supported services remain compliant with Autodesk security, privacy, and regulatory requirements, including FedRAMP and related controls where applicable
Participate in a 24x7 on-call rotation for production services
Function effectively in a fast-paced environment while helping establish and mature operational excellence practices for Autodesk GovCloud

Minimum Qualifications

B.S. or higher in Computer Science, Engineering, or a related technical discipline, or equivalent practical experience
7+ years of experience in Site Reliability Engineering, Software Engineering, Platform Engineering, Cloud Infrastructure, or Production Operations
Experience operating and supporting customer-facing production services in large-scale cloud environments
Strong understanding of reliability engineering principles, including SLOs/SLIs, observability, incident management, capacity planning, production readiness, and automation
Experience with AWS, Azure, or other public cloud platforms
Experience developing automation using languages such as Python, Go, Java, PowerShell, Bash, or similar
Experience with Infrastructure as Code, CI/CD pipelines, deployment automation, and modern cloud operations practices
Understanding of security, compliance, and operational risk management in production environments
Strong written and verbal communication skills

Preferred Qualifications

10+ years of experience operating highly available, customer-facing production systems
Experience with AWS GovCloud, FedRAMP, IL4/IL5, or other regulated cloud environments
Experience supporting services with stringent availability, reliability, and security requirements
Experience with containers, Kubernetes, cloud-native architectures, APIs, load balancing, networking, DNS, and distributed systems
Experience with observability platforms such as Splunk, Dynatrace, Datadog, CloudWatch, or similar technologies
Experience operating databases, storage platforms, messaging systems, and caching technologies
Experience designing and implementing operational automation at scale
Experience leading or participating in Gamedays, disaster recovery exercises, resilience testing, or operational readiness reviews
Strong incident management experience, including technical leadership during major incidents and stakeholder communication
Strong collaboration skills and ability to work effectively across engineering, security, compliance, and operations teams
Passion for building reliable, secure, and scalable systems that customers can trust
