HHAeXchange

Platform Engineer

Remote Platform Engineering role with a clear candidate location fit.

Published: Jul 2, 2026

Eligible countries: 1 country accepted

Seniority signal: Senior

Work model: Remote

Accepted candidate locations

United States

AWS, CI/CD, Docker, Kubernetes

Source freshness: Jul 2, 2026

Stack match: AWS, CI/CD

Application path: Company site

Job summary

Platform Engineer

Requirements and responsibilities

Job content extracted into sections for faster review.

Essential Job Duties

Own availability, latency, and performance targets for AI platform services and data infrastructure running on AWS
Design and implement monitoring, alerting, and observability frameworks across the platform stack
Lead incident response, root cause analysis, and post-mortem processes for platform-level outages or degradations
Define and track SLOs/SLAs for core platform primitives including RAG pipelines, agent orchestration services, and model access layers
Proactively identify reliability risks and drive engineering improvements before they become production issues
Build and maintain runbooks, disaster recovery procedures, and operational documentation
Design, build, and maintain CI/CD pipelines for AI platform components, data pipelines, and internal applications
Own infrastructure-as-code (IaC) practices across the team using tools such as Terraform or AWS CDK
Manage and optimize AWS environments including ECS, Lambda, S3, RDS, Redshift, API Gateway, and related services
Implement and enforce security, compliance, and cost optimization best practices across AWS infrastructure
Automate deployment, scaling, and configuration management to reduce manual operational overhead
Partner with AI Platform Engineers to containerize and operationalize AI services and agent frameworks
Support Data & AI Engineers with environment management, access controls, and deployment tooling for Polaris and data pipeline infrastructure
Serve as the operational backbone for the AI platform team – ensuring engineers can ship and iterate quickly without being blocked by infrastructure concerns
Contribute to our “factory model” vision by making deployment and reliability a repeatable, scalable capability rather than an ad hoc function

Other Job Duties

Other duties as assigned by supervisor or HHAeXchange leader.

Travel Requirements

Travel up to 10%, including overnight travel

Required Education, Experience, Certifications and Skills

3+ years of professional experience in a DevOps, SRE, or platform engineering role
Hands-on AWS experience required – AgentCore, Bedrock, ECS, Lambda, S3, RDS, Redshift, CloudWatch, IAM, VPC, and related services
Experience with infrastructure-as-code tools such as Terraform or AWS CDK
Strong CI/CD experience with tools such as GitHub Actions
Experience with containerization and orchestration (Docker, ECS, or Kubernetes)
Familiarity with AI/ML infrastructure patterns – model serving, vector databases, pipeline orchestration (strongly preferred)
Experience with observability and monitoring tooling (Datadog, CloudWatch)
Prior experience in a SaaS environment
Strong verbal and written communication skills, with the ability to collaborate across technical and non-technical stakeholders
Self-starter with a proactive approach to identifying and resolving infrastructure risk before it impacts delivery
Willingness to explore and adopt AI tools responsibly to enhance productivity and innovation in your role
