Our client is seeking a dynamic Principal Software Engineer specializing in data pipelines. Joining the team, you'll be immersed in an environment that fosters creativity, innovation, and teamwork, with a commitment to your growth and opportunities to make impactful decisions as you advance your career.
Responsibilities:
- Architect and develop large-scale, distributed data processing pipelines using technologies such as Apache Spark and Apache Beam, with Apache Airflow for orchestration.
- Design and implement efficient data ingestion, transformation, and storage solutions catering to both structured and unstructured data.
- Collaborate closely with engineering leaders, architects, and product managers to understand business requirements and propose technical solutions within the broader roadmap.
- Build and optimize real-time and batch data processing systems, ensuring robustness, fault tolerance, and scalability.
- Work alongside data engineers, analysts, and data scientists to translate business needs into robust technical solutions.
- Implement and uphold best practices for data governance, data quality, and data security across the data lifecycle.
- Act as a mentor and guide for junior engineers, cultivating a culture of continuous learning and knowledge sharing.
- Stay abreast of the latest trends, technologies, and industry best practices in big data and data engineering domains.
- Engage in code reviews, design discussions, and contribute to technical decision-making processes.
- Contribute to the development and maintenance of CI/CD pipelines, ensuring smooth and reliable deployments.
- Collaborate effectively with cross-functional teams to ensure the successful delivery of projects and initiatives.
Requirements:
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
- Minimum of 10 years of experience in backend software development, with a strong focus on data engineering and big data technologies.
- Expertise in Apache Spark, Apache Beam, and Apache Airflow, with a deep understanding of distributed computing and data processing frameworks.
- Proficiency in Java, Scala, and SQL, with the ability to write clean, maintainable, and efficient code.
- Experience building enterprise-grade software in a cloud-native environment (GCP or AWS), leveraging services such as GCS/S3, Dataflow/Glue, Dataproc/EMR, Cloud Functions/Lambda, BigQuery/Athena, and Bigtable/DynamoDB.
- Familiarity with cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Hands-on experience with streaming and data processing technologies such as Kafka, Spark, Google BigQuery, Google Dataflow, and HBase.
- Experience designing CI/CD pipelines with tools such as Jenkins, GitHub Actions, or similar.
- Experience with SQL, particularly in performance optimization, and familiarity with graph and vector databases or processing frameworks.
- Strong understanding of data modeling, data warehousing, and data integration best practices.
- Exposure to streaming data processing, real-time analytics, and machine learning pipelines.
- Excellent problem-solving, analytical, and critical thinking skills.
- Effective communication and collaboration skills, with a proven ability to work efficiently in a team environment.
- Experience in mentoring and leading technical teams.
The compensation package for this role includes a base salary range of $175,000 - $210,000 annually, alongside variable compensation and comprehensive benefits. Actual compensation offered will be based on factors such as location, qualifications, skills, and experience.
Our client is committed to fostering an inclusive workplace that values diversity and equal opportunity for all. Applicants from diverse backgrounds are encouraged to apply.
Note: This job description is presented for illustrative purposes and is not associated with any specific company.