Job summary

Senior Data Engineer - Databricks

Requirements and responsibilities

Job content extracted into sections for faster review.

Impact You Will Make in the Role:

  • Own Databricks production support for the Sugar Predict data platform, including monitoring, alerting, and incident response across all production data flows
  • Maintain and report on SLA performance metrics for data pipeline delivery, ensuring visibility into platform health and accountability across internal and external stakeholders
  • Identify and implement pipeline optimizations that reduce Databricks compute costs, improve throughput, and reduce processing windows while tracking impacts through measurable KPIs
  • Migrate legacy ETL/ELT pipelines to Databricks, building automation tooling to reduce manual intervention and ensure uninterrupted data delivery during transitions
  • Support new customer onboarding by provisioning, validating, and hardening tenant data pipelines that deliver reliable, isolated data from day one
  • Design and build high-performance Databricks pipelines that ingest, transform, and serve ERP and CRM data at scale across both Azure and AWS environments
  • Own the Delta Lake architecture including schema design, partitioning strategies, data quality enforcement, and incremental processing patterns
  • Enforce data security best practices across Databricks environments, including role-based access control, secrets management, and compliance requirements for enterprise CRM and ERP data
  • Implement data quality monitoring and observability across pipeline health and ML model inputs, ensuring data integrity that directly supports Sugar Predict prediction accuracy
  • Apply and enforce multi-tenant data isolation patterns ensuring reliable, secure data delivery across Sugar Predict enterprise customers
  • Partner with the Enterprise Architecture team to ensure Sugar Predict data pipelines integrate seamlessly with the broader SugarAI product ecosystem
  • Support a globally distributed operation through on-call rotation and after-hours incident response, meeting SLAs across multiple time zones
  • Maintain technical documentation, runbooks, and architectural decision records, contributing to team knowledge sharing and operational readiness across on-call and incident response scenarios
  • Apply CI/CD best practices to data pipeline development, including version control, automated testing, and deployment tooling to ensure reliable and repeatable pipeline delivery

What You Will Bring:

  • 4+ years of data engineering experience
  • At least 2 years on Databricks or the Apache Spark ecosystem across Azure and/or AWS
  • Proficiency in PySpark, SQL, and Python with a strong track record building and operating production-grade pipelines under SLA constraints
  • Hands-on experience with Delta Lake including schema evolution, ACID transactions, optimize/vacuum lifecycle, and both incremental and streaming processing patterns
  • Hands-on experience with pipeline performance tuning and compute optimization in production Databricks environments
  • Solid working knowledge of PostgreSQL including query optimization, schema design, and use as a source or sink in production data pipelines
  • Experience supporting and maintaining legacy ETL tooling (SSIS, Informatica, custom Python/SQL pipelines, or similar) in production
  • Experience supporting large-scale multi-tenant architectures with a focus on tenant isolation, per-tenant performance, and data privacy, including navigating tools and platforms that default to single-tenant assumptions
  • Proven ability to work collaboratively across data science, product, and infrastructure teams, owning end-to-end delivery in a cross-functional environment
  • Strong understanding of data governance, security, and compliance principles, including access control, data privacy, and protection of sensitive enterprise data across multi-tenant environments

Preferred Qualifications/Experience:

  • Experience operating Databricks workspaces across both Azure and AWS, including cost governance, cluster management, and cross-cloud data access
  • Experience optimizing Databricks workloads in a Serverless environment, including compute cost governance and performance tuning for serverless compute
  • Experience with Microsoft SQL Server in a data engineering or ETL context
  • Exposure to ML feature engineering or feature stores (Databricks Feature Store, Feast, or similar) supporting predictive analytics
  • Experience with customer onboarding automation or IaC patterns for provisioning tenant data pipelines at scale
  • Databricks Certified Data Engineer Associate or Professional certification

Focus: Data Engineering (job area)
Seniority signal: Senior (candidate level)
Stack: AWS, Azure, CI/CD (core skills)
Location: 1 country accepted (eligibility)