Skills: Big Data, PySpark, BigQuery, cloud platforms (Google Cloud, AWS, or Azure), Linux, Shell/Python, Ansible
Overview:
This role involves managing and optimizing Big Data environments (PySpark, BigQuery, Airflow) on Google Cloud, AWS, or Azure, ensuring efficient, secure, and cost-effective operations. Responsibilities include 24x7 support, data pipeline optimization, automation, and troubleshooting, with a focus on DevOps, CI/CD, and disaster recovery.
Roles and Responsibilities:
(Google/AWS/Azure public cloud, PySpark, BigQuery, and Airflow on Google Cloud)
- Participate in 24x7x365 rotational shift support and operations for the Big Data environment.
- As a team lead, you will be responsible for maintaining the upstream Big Data environment (PySpark, BigQuery, Dataproc, and Airflow) through which millions of financial data transactions flow daily; the PySpark-to-BigQuery sketch after this list illustrates the kind of integration involved.
- You will streamline and tune existing Big Data systems and pipelines and build new ones; keeping the systems running efficiently and at minimal cost is a top priority.
- Manage the operations team on your respective shift and implement changes to the underlying systems.
- This role involves providing day-to-day support, enhancing platform functionality through DevOps practices, and collaborating with application development teams to optimize database operations.
- Architect and optimize data warehouse solutions using BigQuery to ensure efficient data storage and retrieval.
- Install, build, patch, upgrade, and configure Big Data applications.
- Manage and configure BigQuery environments, datasets, and tables.
- Ensure data integrity, accessibility, and security in the BigQuery platform.
- Implement and manage partitioning and clustering for efficient data querying (see the partitioned-table sketch after this list).
- Define and enforce access policies for BigQuery datasets (see the dataset-access sketch after this list).
- Implement query usage caps and alerts to avoid unexpected expenses (see the cost-control sketch after this list).
- Be comfortable troubleshooting issues and failures on Linux-based systems, with a strong grasp of the Linux command line.
- Create and maintain dashboards and reports to track key metrics like cost and performance.
- Integrate BigQuery with other Google Cloud Platform (GCP) services such as Dataflow, Pub/Sub, and Cloud Storage (see the Cloud Storage load sketch after this list).
- Enable BigQuery access through tools such as Jupyter Notebook, Visual Studio Code, and command-line clients.
- Implement data quality checks and data validation processes to ensure data integrity.
- Manage and monitor data pipelines using Airflow, with CI/CD tools (e.g., Jenkins, Screwdriver) for automation (see the Airflow DAG sketch after this list).
- Collaborate with data analysts and data scientists to understand data requirements and translate them into technical solutions.
- Provide consultation and support to application development teams for database design, implementation, and monitoring.
- Demonstrate proficiency in Unix/Linux OS fundamentals, Shell/Perl/Python scripting, and Ansible for automation.
- Apply disaster recovery and high availability expertise, including backup/restore operations.
- Bring experience with geo-redundant databases and Red Hat clustering.
- Be accountable for ensuring delivery within the defined SLAs and agreed project milestones, following best practices and processes for continuous service improvement.
- Work closely with other Support Organizations (DB, Google, PySpark data engineering, and Infrastructure teams).
- Incident Management, Change Management, Release Management, and Problem Management.
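Illustrative Sketches:
The snippets below are minimal, hypothetical sketches of the kinds of tasks listed above; project IDs, datasets, tables, buckets, and schedules are placeholders, not details of the actual environment.

PySpark-to-BigQuery sketch. A minimal example of moving data between Spark on Dataproc and BigQuery via the spark-bigquery connector, assuming the connector jar is available on the cluster:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-integration-sketch").getOrCreate()

# Read a BigQuery table into a Spark DataFrame via the spark-bigquery connector.
txns = (
    spark.read.format("bigquery")
    .option("table", "example-project.finance.transactions")  # hypothetical table
    .load()
)

# Toy aggregation standing in for real pipeline logic.
daily_counts = txns.groupBy("account_id").count()

# Write results back to BigQuery, staging through a temporary GCS bucket.
(
    daily_counts.write.format("bigquery")
    .option("table", "example-project.finance.daily_counts")
    .option("temporaryGcsBucket", "example-staging-bucket")  # hypothetical bucket
    .mode("overwrite")
    .save()
)
```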
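Partitioned-table sketch. A minimal example of creating a day-partitioned, clustered BigQuery table with the google-cloud-bigquery Python client; the schema and IDs are illustrative assumptions:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # assumes default credentials

schema = [
    bigquery.SchemaField("txn_id", "STRING"),
    bigquery.SchemaField("account_id", "STRING"),
    bigquery.SchemaField("amount", "NUMERIC"),
    bigquery.SchemaField("txn_ts", "TIMESTAMP"),
]

table = bigquery.Table("example-project.finance.transactions", schema=schema)
# Partition by day on the event timestamp so queries can prune whole partitions.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="txn_ts",
)
# Cluster within each partition on the columns most often filtered or joined on.
table.clustering_fields = ["account_id", "txn_id"]

table = client.create_table(table, exists_ok=True)
print(f"Created {table.full_table_id}")
```

Partitioning plus clustering is the usual first lever for both performance and cost, since BigQuery bills by bytes scanned.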
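Dataset-access sketch. One way to grant read access on a dataset through the Python client; the dataset and group email are placeholders, and in practice IAM bindings or authorized views may be preferred:

```python
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("example-project.finance")  # hypothetical dataset

# Append a reader entry for an analyst group to the dataset's access list.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="data-analysts@example.com",  # hypothetical group
    )
)
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])
```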
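Cost-control sketch. A per-query guardrail using a dry run to estimate scanned bytes and maximum_bytes_billed as a hard cap; the query and the 1 GiB cap are illustrative, and project-level custom quotas and billing budget alerts are configured separately in GCP:

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT account_id, SUM(amount) AS total
    FROM `example-project.finance.transactions`
    GROUP BY account_id
"""

# Dry run: validates the query and reports the bytes it would process, at no cost.
dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
dry_job = client.query(sql, job_config=dry_cfg)
print(f"Estimated scan: {dry_job.total_bytes_processed / 1e9:.2f} GB")

# Real run: the job fails fast if it would bill more than the configured cap.
run_cfg = bigquery.QueryJobConfig(maximum_bytes_billed=1 * 1024**3)  # 1 GiB cap
rows = client.query(sql, job_config=run_cfg).result()
```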
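Cloud Storage load sketch. A minimal example of one common integration point, loading Parquet files from a GCS landing bucket into a BigQuery table with the Python client; the URI and table are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_uri(
    "gs://example-landing-bucket/transactions/*.parquet",  # hypothetical path
    "example-project.finance.transactions",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
print(f"Loaded {load_job.output_rows} rows")
```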
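Airflow DAG sketch. A minimal daily DAG that runs a BigQuery rollup query, assuming Airflow 2.x with the apache-airflow-providers-google package installed; the DAG id, SQL, and schedule are illustrative only, and in practice such a DAG would be deployed through the CI/CD pipeline (e.g., Jenkins or Screwdriver):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_transactions_rollup",  # hypothetical DAG
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["bigquery", "finance"],
) as dag:
    # Aggregate one day of transactions; {{ ds }} is Airflow's execution date.
    rollup = BigQueryInsertJobOperator(
        task_id="rollup_transactions",
        configuration={
            "query": {
                "query": (
                    "SELECT account_id, SUM(amount) AS total "
                    "FROM `example-project.finance.transactions` "
                    "WHERE DATE(txn_ts) = '{{ ds }}' "
                    "GROUP BY account_id"
                ),
                "useLegacySql": False,
            }
        },
    )
```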