Senior Data Engineer

Job Title: Senior Data Engineer

Location (onsite): Vienna, VA

Pay Rate: $43-$53 per hour (W2)

Duration: 6 months

Key Responsibilities:

Build and Maintain Data Pipelines: Develop scalable data pipelines using PySpark and Spark within the Databricks environment.
Implement Medallion Architecture: Design workflows using raw, trusted, and refined layers to drive reliable data processing.
Integrate Diverse Data Sources: Connect data from Kafka streams, extract channels, and APIs.
Data Cataloging and Governance: Model and register datasets in enterprise data catalogs, ensuring robust governance and accessibility.
Access Control: Manage secure, role-based access patterns to support analytics, AI, and ML needs.
Team Collaboration: Work closely with peers to achieve required code coverage and deliver high-quality, well-tested solutions.
Optimize and Operationalize: Tune Spark jobs (partitioning, caching, broadcast joins, AQE), manage Delta Lake performance (Z-Ordering, OPTIMIZE, VACUUM), and implement cost and reliability best practices on AWS.
Data Quality and Testing: Implement data quality checks and validations (e.g., Great Expectations, custom PySpark checks), unit/integration tests, and CI/CD for Databricks Jobs/Workflows.
Infrastructure as Code: Provision and manage Databricks and AWS resources using Terraform (workspaces, clusters, jobs, secret scopes, Unity Catalog objects, S3, IAM).
Monitoring and Observability: Set up logging, metrics, and alerts (CloudWatch, Datadog, Databricks audit logs) for pipelines and jobs.
Documentation: Produce clear technical documentation, runbooks, and data lineage for governed datasets.

Required Skills & Qualifications:

Databricks: 6-9 years of experience with expert-level proficiency
PySpark/Spark: 6-9 years of advanced hands-on experience
AWS: 6-9 years of experience with strong competency, including S3, plus Terraform for infrastructure as code
Data Architecture: Solid knowledge of the medallion pattern and data warehousing best practices
Data Pipelines: Proven ability to build, optimize, and govern enterprise data pipelines
Delta Lake and Unity Catalog: Expertise in Delta Lake internals, time travel, schema evolution/enforcement, and Unity Catalog RBAC/ABAC
Streaming: Hands-on experience with Spark Structured Streaming, Kafka, checkpointing, exactly-once semantics, and late-arriving data handling
CI/CD: Experience with Git-based workflows and CI/CD for Databricks (e.g., Databricks Repos, dbx, GitHub Actions, Azure DevOps, or Jenkins)
Security and Compliance: Experience with IAM, KMS, encryption, secrets management, token/credential rotation, and PII governance
Performance and Cost: Demonstrated ability to tune Spark jobs and optimize Databricks cluster configurations and AWS usage for cost and throughput
Collaboration: Experience working in Agile/Scrum teams, participating in peer reviews, and meeting code coverage targets

Preferred Skills & Qualifications:

Certifications: Databricks Data Engineer Professional, AWS Solutions Architect/Developer, HashiCorp Terraform Associate
Data Catalogs: Experience with enterprise catalogs such as Collibra or Alation, and lineage tooling such as OpenLineage
Orchestration: Databricks Workflows and/or Airflow
Additional AWS: Glue, Lambda, Step Functions, CloudWatch, Secrets Manager
Testing: pytest, chispa, Great Expectations, dbx test
Domain Experience: Analytics and ML feature pipelines, MLOps integrations
