Senior Data Engineer

Raas Infotek LLC
  • Logistics
  • Applications have closed

Job Title: Senior Data Engineer

Location (onsite): Vienna, VA

Pay Rate: $43-$53 per hour (W2)

Duration: 6 months

Key Responsibilities:

  • Build and Maintain Data Pipelines: Develop scalable data pipelines using PySpark and Spark within the Databricks environment.
  • Implement Medallion Architecture: Design workflows using raw, trusted, and refined layers to drive reliable data processing.
  • Integrate Diverse Data Sources: Connect data from Kafka streams, extract channels, and APIs.
  • Data Cataloging and Governance: Model and register datasets in enterprise data catalogs, ensuring robust governance and accessibility.
  • Access Control: Manage secure, role-based access patterns to support analytics, AI, and ML needs.
  • Team Collaboration: Work closely with peers to achieve required code coverage and deliver high-quality, well-tested solutions.
  • Optimize and Operationalize: Tune Spark jobs (partitioning, caching, broadcast joins, AQE), manage Delta Lake performance (Z-Ordering, OPTIMIZE, VACUUM), and implement cost and reliability best practices on AWS.
  • Data Quality and Testing: Implement data quality checks and validations (e.g., Great Expectations, custom PySpark checks), unit/integration tests, and CI/CD for Databricks Jobs/Workflows.
  • Infrastructure as Code: Provision and manage Databricks and AWS resources using Terraform (workspaces, clusters, jobs, secret scopes, Unity Catalog objects, S3, IAM).
  • Monitoring and Observability: Set up logging, metrics, and alerts (CloudWatch, Datadog, Databricks audit logs) for pipelines and jobs.
  • Documentation: Produce clear technical documentation, runbooks, and data lineage for governed datasets.

Required Skills & Qualifications:

  • Databricks: 6-9 years of experience with expert-level proficiency
  • PySpark/Spark: 6-9 years of advanced hands-on experience
  • AWS: 6-9 years of experience with strong competency, including S3 and Terraform for infrastructure-as-code
  • Data Architecture: Solid knowledge of the medallion pattern and data warehousing best practices
  • Data Pipelines: Proven ability to build, optimize, and govern enterprise data pipelines
  • Delta Lake and Unity Catalog: Expertise in Delta Lake internals, time travel, schema evolution/enforcement, and Unity Catalog RBAC/ABAC
  • Streaming: Hands-on experience with Spark Structured Streaming, Kafka, checkpointing, exactly-once semantics, and late-arriving data handling
  • CI/CD: Experience with Git-based workflows and CI/CD for Databricks (e.g., Databricks Repos, dbx, GitHub Actions, Azure DevOps, or Jenkins)
  • Security and Compliance: Experience with IAM, KMS, encryption, secrets management, token/credential rotation, and PII governance
  • Performance and Cost: Demonstrated ability to tune Spark jobs and optimize Databricks cluster configurations and AWS usage for cost and throughput
  • Collaboration: Experience working in Agile/Scrum teams, participating in peer reviews, and meeting code coverage targets

Preferred Skills & Qualifications:

  • Certifications: Databricks Data Engineer Professional, AWS Solutions Architect/Developer, HashiCorp Terraform Associate
  • Data Catalogs: Experience with enterprise catalogs such as Collibra or Alation, and lineage tooling such as OpenLineage
  • Orchestration: Databricks Workflows and/or Airflow
  • Additional AWS: Glue, Lambda, Step Functions, CloudWatch, Secrets Manager
  • Testing: pytest, chispa, Great Expectations, dbx test
  • Domain Experience: Analytics and ML feature pipelines, MLOps integrations