Job Title: Cloud Engineer – Spark/Databricks Specialist
Location: Remote
Job Type: Contract
Industry: IT/Cloud Engineering
Job Summary:
We are looking for a Cloud Engineer specializing in Apache Spark and Databricks to join our team. The ideal candidate has extensive experience with cloud platforms such as AWS, Azure, and GCP, and a deep understanding of data engineering, ETL processes, and cloud-native tools. Your primary responsibility will be to design, develop, and maintain scalable data pipelines using Spark and Databricks, while optimizing performance and ensuring data integrity across environments.
Key Responsibilities:
Design and Development:
- Architect, develop, and maintain scalable ETL pipelines using Databricks, Apache Spark (Scala, Python), and other cloud-native tools such as AWS Glue, Azure Data Factory, and GCP Dataflow.
- Design and build data lakes and data warehouses on cloud platforms (AWS, Azure, GCP).
- Implement efficient data ingestion, transformation, and processing workflows with Spark and Databricks.
- Optimize the performance of ETL processes for faster data processing and lower costs.
- Develop and manage data pipelines using additional ETL tools, such as Informatica and SAP Data Intelligence, as needed.
Data Integration and Management:
- Integrate structured and unstructured data sources (relational databases, APIs, ERP systems) into the cloud data infrastructure.
- Ensure data quality, validation, and integrity through rigorous testing.
- Extract and integrate data from SAP or other ERP systems, ensuring reliable data flow into downstream pipelines.
Performance Optimization:
- Monitor, troubleshoot, and enhance the performance of Spark/Databricks pipelines.
- Implement best practices for data governance, security, and compliance across data workflows.
Collaboration and Communication:
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to define data requirements and deliver scalable solutions.
- Provide technical guidance and recommendations on cloud data engineering processes and tools.
Documentation and Maintenance:
- Document data engineering solutions, ETL pipelines, and workflows.
- Maintain and support existing data pipelines, ensuring they operate effectively and align with business goals.
Qualifications:
Education:
- Bachelor's degree in Computer Science, Information Technology, or a related field. Advanced degrees are a plus.
Experience:
- 7+ years of experience in cloud data engineering or similar roles.
- Expertise in Apache Spark and Databricks for data processing.
- Proven experience with cloud platforms like AWS, Azure, and GCP.
- Experience with cloud-native ETL and streaming tools such as AWS Glue, Azure Data Factory, GCP Dataflow, and Apache Kafka.
- Hands-on experience with data platforms like Redshift, Snowflake, Azure Synapse, and BigQuery.
- Experience extracting data from SAP or other ERP systems is preferred.
- Strong programming skills in Python, Scala, or Java.
- Proficient in SQL and query optimization techniques.
Skills:
- In-depth knowledge of Spark/Scala for high-performance data processing.
- Strong understanding of data modeling, ETL/ELT processes, and data warehousing concepts.
- Familiarity with data governance, security, and compliance best practices.
- Excellent problem-solving, communication, and collaboration skills.
Preferred Qualifications:
- Certifications in cloud platforms (e.g., AWS Certified Data Analytics - Specialty, Google Cloud Professional Data Engineer, Microsoft Certified: Azure Data Engineer Associate).
- Experience with CI/CD pipelines and DevOps practices for data engineering.
- Exposure to Apache Hadoop, Kafka, or other data frameworks is a plus.