Job description
· Design and Develop Scalable Data Pipelines: Build and maintain robust data pipelines using Python to process, transform, and integrate large-scale data from diverse sources.
· Orchestration and Automation: Implement and manage workflows using orchestration tools such as Apache Airflow to ensure reliable and efficient data operations.
· Data Warehouse Management: Work extensively with Snowflake to design and optimize data models, schemas, and queries for analytics and reporting.
· Queueing Systems: Leverage message queues like Kafka, SQS, or similar tools to enable real-time or batch data processing in distributed environments.
· Collaboration: Partner with Data Science, Product, and Engineering teams to understand data requirements and deliver solutions that align with business objectives.
· Performance Optimization: Optimize the performance of data pipelines and queries to handle large volumes of data efficiently.
· Data Governance and Security: Ensure compliance with data governance and security standards to maintain data integrity and privacy.
· Documentation: Create and maintain clear, detailed documentation for data solutions, pipelines, and workflows.
Qualifications
· 9+ years of experience in data engineering roles with a focus on building scalable data solutions.
· Proficiency in Python for ETL, data manipulation, and scripting.
· Hands-on experience with Snowflake or equivalent cloud-based data warehouses.
· Strong knowledge of orchestration tools such as Apache Airflow or similar.
· Expertise in implementing and managing message queues like Kafka, AWS SQS, or similar.
· Demonstrated ability to build and optimize data pipelines at scale, processing terabytes of data.
· Experience in data modeling, data warehousing, and database design.
· Proficiency in working with cloud platforms like AWS, Azure, or GCP.
· Strong understanding of CI/CD pipelines for data engineering workflows.
· Experience working in an Agile development environment, collaborating with cross-functional teams.
· Familiarity with other programming languages like Scala or Java for data engineering tasks.
· Knowledge of containerization and orchestration technologies (Docker, Kubernetes).
· Experience with stream processing frameworks like Apache Flink.
· Experience with Apache Iceberg for data lake optimization and management.
· Exposure to machine learning workflows and integration with data pipelines.
Soft Skills:
· Strong problem-solving skills and a passion for tackling complex data challenges.
· Excellent communication and collaboration skills to work with cross-functional teams.
· Ability to thrive in a fast-paced, innovative environment.