Role overview

Site Reliability Engineer (West Coast)

Requirements and responsibilities

Details

  • Ensure the reliability of our critical products and services by meeting or exceeding SRE objectives.
  • Instantiate and maintain production infrastructure using Infrastructure as Code and Configuration Management tools.
  • Build and maintain proper monitoring of our services by utilizing centralized logging and time series databases.
  • Automate deployments, administration, and monitoring of our services by following CI/CD practices.
  • Work with engineering and information security teams to establish, document, and enhance processes, and to improve the operability and security of our services.
  • Participation in team on-call rotation is required.
  • Additional tasks associated with this position may be assigned in response to company initiatives and business needs.

Education:

  • Bachelor's degree in information systems, computer science, technology, or a related field is strongly preferred. In lieu of a degree, 2+ years of relevant and/or equivalent experience is acceptable.

Experience:

  • Minimum of 3 years of software and/or operational experience in building and maintaining internet-facing production environments is required.
  • Strong experience with Linux/Unix systems administration.
  • Knowledge of source control tools (Git preferred).
  • Experience with Configuration Management and Infrastructure as Code tools (Ansible, Puppet, Terraform preferred).
  • Good understanding of container technology (Docker, Kubernetes preferred).
  • Experience with monitoring tools (Prometheus, Grafana, Nagios, or similar) and alerting systems.
  • Experience with non-cloud infrastructure.
  • Experience running a large-scale 24/7 production environment.
  • Experience with distributed data processing, databases, and large-scale file systems is a plus.
  • Strong scripting abilities in Bash and Python.
  • Experience with incident management, troubleshooting, and root cause analysis.
  • Experience in handling postmortems, building incident response plans, and improving incident resolution procedures.
  • Experience running and maintaining real-world build systems (Jenkins, DroneCI, or similar tools).
  • Demonstrable experience with the entire software life cycle, from systems architecture and design through implementation, maintenance, and operation.
  • Programming experience using HTTP Service APIs.
  • Virtualization experience (VMware, Proxmox, Oracle Linux Virtualization Manager).
  • Network administration experience is a plus.
  • Exposure to Security and Testing frameworks is a plus.
  • Exposure to compliance-regulated industries such as Finance, Healthcare, or Government is a plus.