Site Reliability Engineer (Local to Bay Area) - W2 Contract

eTek IT Services • United States • 3w ago

Overview

The Site Reliability Engineer plays a crucial role in ensuring the reliability, performance, and scalability of the organization's infrastructure and applications. The role is vital to keeping technology systems running seamlessly and efficiently, and to ensuring that they meet the high standards of availability and performance required by both internal and external users.

Key responsibilities

Design and implement automation for various processes to improve efficiency and reliability
Develop monitoring solutions to ensure the health and performance of systems
Participate in on-call rotations and handle incident response, troubleshooting, and resolution
Create and maintain scripts for operational tasks and automation
Conduct capacity planning and manage the scalability of the systems
Collaborate with development teams to improve system reliability and performance
Deploy and maintain cloud services and infrastructure
Define and implement service level objectives and indicators
Ensure security best practices are followed in all aspects of infrastructure and services
Perform system and application performance tuning and capacity forecasting
Conduct post-incident reviews and implement preventive measures
Participate in the design and implementation of disaster recovery plans
Document procedures, configurations, and processes
Contribute to the continuous improvement of processes and tools
Stay updated with industry trends and best practices

Required qualifications

Bachelor's degree in Computer Science, Engineering, or a related field
Proven experience as a Site Reliability Engineer or in a similar role
Strong understanding of software development, system administration, and networking
Proficiency in scripting (e.g., Python, Shell, Perl)
Experience with monitoring and alerting tools (e.g., Nagios, Datadog, Prometheus)
Expertise in cloud services and infrastructure (e.g., AWS, GCP, Azure)
Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes)
Experience with CI/CD pipelines and configuration management tools (e.g., Jenkins, Ansible)
Solid understanding of TCP/IP, HTTP, DNS, and other network protocols
Ability to analyze and troubleshoot complex systems and applications
Experience with incident management and on-call responsibilities
Familiarity with security best practices and tools
Excellent communication and collaboration skills
Certifications such as AWS Certified SysOps Administrator or Google Professional Cloud DevOps Engineer are a plus
Continuous learning and self-improvement mindset