Role Summary
We are seeking a skilled Senior Data Engineer with expertise in building and maintaining data pipelines, designing scalable cloud-based solutions, and optimizing data workflows. The ideal candidate will have a strong foundation in Python, SQL, and Google Cloud Platform (or an equivalent cloud platform), along with hands-on experience in API development and integration. You will play a critical role in enabling data-driven decision-making by building robust, high-performance data systems and by working closely with cross-functional teams to ensure seamless data accessibility.
Key Responsibilities
Data Pipeline Development:
- Design, build, and maintain scalable data pipelines for processing and transforming structured and unstructured data from multiple sources.
API Development and Integration:
- Create and manage RESTful APIs to facilitate real-time and batch data integration across systems, ensuring secure and reliable data access for internal and external stakeholders.
Cloud Data Infrastructure:
- Develop and deploy data solutions on Google Cloud Platform (or AWS/Azure), including BigQuery, Cloud Storage, and other relevant cloud-based tools.
Data Modeling & Optimization:
- Design efficient data models and optimize database performance for both transactional and analytical workloads.
Collaboration:
- Work with data analysts, data scientists, and software engineers to ensure seamless integration between data systems and business applications.
Monitoring & Quality Assurance:
- Monitor data pipelines for accuracy, latency, and performance; implement quality assurance processes to ensure the reliability of data outputs.
Documentation:
- Maintain detailed documentation for data pipelines, APIs, and data workflows to ensure scalability and maintainability.
Qualifications
Technical Expertise:
- 5+ years of experience in Python and SQL for data engineering tasks.
- Proven experience developing and managing APIs (RESTful preferred) to enable secure data sharing and integration.
- Hands-on experience with Google Cloud Platform (BigQuery, Cloud Storage) or equivalent cloud platforms (AWS, Azure).
- Knowledge of ETL/ELT frameworks and data pipeline orchestration tools (e.g., Airflow, dbt).
Data Architecture & Modeling:
- Strong understanding of data modeling for relational and NoSQL databases.
- Experience with data warehouse solutions and analytics tools.
Problem-Solving & Collaboration:
- Excellent analytical and problem-solving skills with a collaborative mindset.
- Proven ability to work cross-functionally with technical and business teams.
Additional Skills (Preferred):
- Familiarity with DevOps practices and CI/CD tooling for data engineering workflows.
- Knowledge of data governance and security best practices.
- Experience with machine learning pipelines or streaming data systems.