Skills: Big Data, PySpark, BigQuery, cloud platforms (Google Cloud, AWS, or Azure), Linux, Shell/Python, Ansible
Overview:
This role involves managing and optimizing Big Data environments (PySpark, BigQuery, Airflow) on Google Cloud, AWS, or Azure, ensuring efficient, secure, and cost-effective operations. Responsibilities include 24x7 support, data pipeline optimization, automation, and troubleshooting, with a focus on DevOps, CI/CD, and disaster recovery.
Roles and Responsibilities:
(Google/AWS/Azure public cloud, PySpark, BigQuery, and Airflow on Google Cloud)
- Participate in 24x7x365 rotational shift support and operations for the Big Data environment.
- As a team lead, you will be responsible for maintaining the upstream Big Data environment (PySpark, BigQuery, Dataproc, and Airflow) through which millions of financial data transactions flow daily; the PySpark-to-BigQuery sketch after this list illustrates the kind of integration involved.
- You will streamline and tune existing Big Data systems and pipelines and build new ones; keeping the systems running efficiently and at minimal cost is a top priority.
- Manage the operations team on your respective shift and implement changes to the underlying systems.
- This role involves providing day-to-day support, enhancing platform functionality through DevOps practices, and collaborating with application development teams to optimize database operations.
- Architect and optimize data warehouse solutions using BigQuery to ensure efficient data storage and retrieval.
- Install, build, patch, upgrade, and configure Big Data applications.
- Manage and configure BigQuery environments, datasets, and tables.
- Ensure data integrity, accessibility, and security in the BigQuery platform.
- Implement and manage partitioning and clustering for efficient data querying (see the partitioned-table sketch after this list).
- Define and enforce access policies for BigQuery datasets (see the dataset-access sketch after this list).
- Implement query usage caps and alerts to avoid unexpected expenses (see the cost-control sketch after this list).
- Be comfortable troubleshooting issues and failures on Linux-based systems, with a strong grasp of the Linux command line.
- Create and maintain dashboards and reports to track key metrics like cost and performance.
- Integrate BigQuery with other Google Cloud Platform (GCP) services such as Dataflow, Pub/Sub, and Cloud Storage (see the Cloud Storage load sketch after this list).
- Enable BigQuery access through tools such as Jupyter Notebook, Visual Studio Code, and command-line clients.
- Implement data quality checks and data validation processes to ensure data integrity.
- Manage and monitor data pipelines using Airflow, with CI/CD tools (e.g., Jenkins, Screwdriver) for automation (see the Airflow DAG sketch after this list).
- Collaborate with data analysts and data scientists to understand data requirements and translate them into technical solutions.
- Provide consultation and support to application development teams for database design, implementation, and monitoring.
- Demonstrate proficiency in Unix/Linux OS fundamentals, Shell/Perl/Python scripting, and Ansible for automation.
- Apply disaster recovery and high availability expertise, including backup/restore operations.
- Bring experience with geo-redundant databases and Red Hat clustering.
- Be accountable for ensuring delivery within the defined SLAs and agreed project milestones, following best practices and processes for continuous service improvement.
- Work closely with other Support Organizations (DB, Google, PySpark data engineering, and Infrastructure teams).
- Incident Management, Change Management, Release Management, and Problem Management.
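Illustrative Sketches:
The snippets below are minimal, hypothetical sketches of the kinds of tasks listed above; project IDs, datasets, tables, buckets, and schedules are placeholders, not details of the actual environment.

PySpark-to-BigQuery sketch. A minimal example of moving data between Spark on Dataproc and BigQuery via the spark-bigquery connector, assuming the connector jar is available on the cluster:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-integration-sketch").getOrCreate()

# Read a BigQuery table into a Spark DataFrame via the spark-bigquery connector.
txns = (
    spark.read.format("bigquery")
    .option("table", "example-project.finance.transactions")  # hypothetical table
    .load()
)

# Toy aggregation standing in for real pipeline logic.
daily_counts = txns.groupBy("account_id").count()

# Write results back to BigQuery, staging through a temporary GCS bucket.
(
    daily_counts.write.format("bigquery")
    .option("table", "example-project.finance.daily_counts")
    .option("temporaryGcsBucket", "example-staging-bucket")  # hypothetical bucket
    .mode("overwrite")
    .save()
)
```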
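Partitioned-table sketch. A minimal example of creating a day-partitioned, clustered BigQuery table with the google-cloud-bigquery Python client; the schema and IDs are illustrative assumptions:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # assumes default credentials

schema = [
    bigquery.SchemaField("txn_id", "STRING"),
    bigquery.SchemaField("account_id", "STRING"),
    bigquery.SchemaField("amount", "NUMERIC"),
    bigquery.SchemaField("txn_ts", "TIMESTAMP"),
]

table = bigquery.Table("example-project.finance.transactions", schema=schema)
# Partition by day on the event timestamp so queries can prune whole partitions.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="txn_ts",
)
# Cluster within each partition on the columns most often filtered or joined on.
table.clustering_fields = ["account_id", "txn_id"]

table = client.create_table(table, exists_ok=True)
print(f"Created {table.full_table_id}")
```

Partitioning plus clustering is the usual first lever for both performance and cost, since BigQuery bills by bytes scanned.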
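Dataset-access sketch. One way to grant read access on a dataset through the Python client; the dataset and group email are placeholders, and in practice IAM bindings or authorized views may be preferred:

```python
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("example-project.finance")  # hypothetical dataset

# Append a reader entry for an analyst group to the dataset's access list.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="data-analysts@example.com",  # hypothetical group
    )
)
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])
```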
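Cost-control sketch. A per-query guardrail using a dry run to estimate scanned bytes and maximum_bytes_billed as a hard cap; the query and the 1 GiB cap are illustrative, and project-level custom quotas and billing budget alerts are configured separately in GCP:

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT account_id, SUM(amount) AS total
    FROM `example-project.finance.transactions`
    GROUP BY account_id
"""

# Dry run: validates the query and reports the bytes it would process, at no cost.
dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
dry_job = client.query(sql, job_config=dry_cfg)
print(f"Estimated scan: {dry_job.total_bytes_processed / 1e9:.2f} GB")

# Real run: the job fails fast if it would bill more than the configured cap.
run_cfg = bigquery.QueryJobConfig(maximum_bytes_billed=1 * 1024**3)  # 1 GiB cap
rows = client.query(sql, job_config=run_cfg).result()
```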
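Cloud Storage load sketch. A minimal example of one common integration point, loading Parquet files from a GCS landing bucket into a BigQuery table with the Python client; the URI and table are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_uri(
    "gs://example-landing-bucket/transactions/*.parquet",  # hypothetical path
    "example-project.finance.transactions",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
print(f"Loaded {load_job.output_rows} rows")
```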
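Airflow DAG sketch. A minimal daily DAG that runs a BigQuery rollup query, assuming Airflow 2.x with the apache-airflow-providers-google package installed; the DAG id, SQL, and schedule are illustrative only, and in practice such a DAG would be deployed through the CI/CD pipeline (e.g., Jenkins or Screwdriver):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_transactions_rollup",  # hypothetical DAG
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["bigquery", "finance"],
) as dag:
    # Aggregate one day of transactions; {{ ds }} is Airflow's execution date.
    rollup = BigQueryInsertJobOperator(
        task_id="rollup_transactions",
        configuration={
            "query": {
                "query": (
                    "SELECT account_id, SUM(amount) AS total "
                    "FROM `example-project.finance.transactions` "
                    "WHERE DATE(txn_ts) = '{{ ds }}' "
                    "GROUP BY account_id"
                ),
                "useLegacySql": False,
            }
        },
    )
```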