Location: REMOTE / Orlando, Florida
This job allows you to work remotely.
This is a US-based Corp-to-Corp contract.
Data Lake Engineer: This position augments the development staff to deliver new solutions and enhancements. Responsibilities include understanding system requirements, coding, providing code fixes, and supporting work across multiple releases.
Project Scope: PIMS2 Data Services
Special Qualifications
· Experience with Databricks declarative pipelines (formerly DLT / Delta Live Tables), including expectations-style data quality rules and incremental processing (a minimal sketch follows this list).
· Experience implementing DQE patterns: completeness/accuracy checks, anomaly detection, and data observability/monitoring.
· Auto Loader / incremental ingestion patterns and schema drift handling.
· Experience reading from Kafka / Event Hubs within DLT pipelines; exposure to Spark Structured Streaming.
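A minimal sketch of these patterns under stated assumptions (the landing path, table names, and columns below are hypothetical, not from this engagement): Auto Loader ingestion into a Bronze table, and a Silver table gated by DLT expectations.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw files ingested incrementally via Auto Loader")
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")                        # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")   # tolerate schema drift
        .load("abfss://landing@example.dfs.core.windows.net/orders/")  # hypothetical path
    )

@dlt.table(comment="Silver: validated orders behind expectation-style quality gates")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # completeness: drop bad rows
@dlt.expect("non_negative_amount", "amount >= 0")              # accuracy: tracked in metrics
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .withColumn("ingested_at", F.current_timestamp())
    )
```

An event-driven source could be wired in the same way, e.g. spark.readStream.format("kafka") with the appropriate bootstrap-server and topic options.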
Work Description and Responsibilities:
This position will be responsible for:
· Designing, building, and optimizing PySpark pipelines in Azure Databricks for ingestion, transformation, and publishing.
· Implementing Delta Lake Lakehouse patterns (Bronze/Silver/Gold) and driving performance/cost optimization.
· Implementing Databricks declarative pipelines (formerly DLT) and enforcing data quality gates (expectations/validation) to ensure trusted datasets.
· Configuring and supporting governance practices in Databricks (Unity Catalog permissions/access-control concepts, auditability).
· Delivering curated/semantic datasets for analytics and downstream consumers and ensuring data contracts and consistency (see the sketch after this list).
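As a rough illustration of the Silver-to-Gold publishing and Unity Catalog governance items above (the catalog, schema, table, and group names are hypothetical):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Silver -> Gold: publish a curated aggregate for analytics consumers
gold = (
    spark.table("pims2.silver.orders")          # hypothetical Silver table
    .groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("daily_spend"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("pims2.gold.daily_spend")

# Unity Catalog governance: grant read-only access to a downstream analytics group
spark.sql("GRANT SELECT ON TABLE pims2.gold.daily_spend TO `analytics-readers`")
```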
Develop, test, and support assigned initiatives and systems.
Front End – Presentation Layer – Power BI (10%)
Middle Tier – Logic Layer (10%)
Back End – Data Layer (80%)
Contract Terms:
- Initial 6-month contract to start (approx. 1,040 standard hours)
- Working hours: 8:00 am – 4:30 pm Eastern Time; the contractor must work Eastern Time hours.
- Corp-to-Corp engagement: the candidate must work through an established corporation.
Important – Please Read Before Applying:
- U.S. Citizenship or Permanent Residency (Green Card) required. Candidates must have unrestricted authorization to work in the United States.
- U.S. residency required. Applicants must currently reside in the United States and remain in the country for the duration of this engagement.
- No subcontracting or candidate re-assignment permitted. This role is strictly for the individual applying.
Fraud prevention notice (Must Read Before Applying):
- ** Applications involving impersonation, proxy interviews, AI/deepfake representation, or misrepresentation of identity will be disqualified immediately.
Interviews:
- ** Any form of AI assistance or use of external sources during interviews is STRICTLY PROHIBITED and will be monitored during all interviews. Any candidate suspected of using external sources will be immediately removed from the process.
Must-Have Skills:
· 7+ years’ experience in analysis, design, and coding
· 7+ years building scalable, production-grade data platforms and pipelines
· 5+ years Apache Spark / PySpark (batch processing, performance tuning, optimization)
· 5+ years ETL/ELT development, data modeling, and transformation patterns/frameworks
· 3+ years Azure Databricks (jobs/workflows, clusters, environment management)
· 2+ years data governance/catalog concepts in Databricks (Unity Catalog permissions/RBAC, auditing/lineage concepts)
· Strong Delta Lake / Lakehouse experience (Bronze/Silver/Gold, MERGE, schema evolution, OPTIMIZE/Z-ORDER basics); see the sketch after this list.
· Strong SQL (complex queries, tuning for large datasets, reconciliation)
· Azure fundamentals for data engineering (ADLS Gen2, identity/service principals/managed identity, secrets/Key Vault)
· Hands-on experience building/operating Data Quality Engineering (DQE): validation rules, reconciliations, and automated quality gates in pipelines
· Bachelor’s degree in Computer Science or a related analytical field, or equivalent experience, is preferred
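For the Delta Lake and DQE items above, a minimal sketch (table and column names are hypothetical) of a MERGE upsert, basic OPTIMIZE/Z-ORDER maintenance, and a reconciliation-style quality gate:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.table("pims2.staging.orders_batch")   # hypothetical staging input

# Idempotent incremental load: upsert the batch into Silver via MERGE
target = DeltaTable.forName(spark, "pims2.silver.orders")
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Maintenance: compact small files and co-locate rows by a common filter column
spark.sql("OPTIMIZE pims2.silver.orders ZORDER BY (customer_id)")

# Reconciliation-style quality gate: every staged key must land in the target
src_keys = updates.select("order_id").distinct()
missing = src_keys.join(
    spark.table("pims2.silver.orders").select("order_id"), "order_id", "left_anti"
).count()
assert missing == 0, f"Reconciliation failed: {missing} staged keys missing from target"
```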