Reddit
Staff Machine Learning Systems Engineer
Remote Machine Learning role with clear candidate location fit.
Posted: Recently added
Eligible countries: 1 accepted country
Seniority signal: Lead
Work setting: Remote
Accepted candidate locations
USA
Role overview
Staff Machine Learning Systems Engineer
Requirements and responsibilities
Details
- Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers, including data preparation, model management, experiment tracking, and more
- Zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and iteration
- Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
- Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
- Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges
- 8+ years of experience in ML infrastructure, including model training and model deployments
- Hands-on experience with ML optimization, including memory and GPU profiling
- Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
- Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g., MLflow or Weights & Biases)
- Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, and TensorFlow
- Deep experience working with distributed training frameworks, including Ray and Kubernetes
- Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
- Strong organizational & communication skills
- Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
- Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus
Location eligibility
Candidates should apply only when their profile country is listed here.
Hiring flow
WithMira shows the role, then sends candidates to the company application.
1. Check role fit, stack, and location eligibility in WithMira.
2. Open the company application page from the tracked apply link.
3. Save the role or subscribe for similar opportunities before leaving.