Sr. Site Reliability Engineer (Local to WA) - W2

eTek IT Services • Seattle, Washington, United States • 3w ago

Overview:

The Senior Site Reliability Engineer plays a critical role in ensuring the reliability, scalability, and performance of our systems and services. This role is responsible for designing and implementing tools and automation that improve system reliability, monitoring, and incident response.

Key Responsibilities:

Develop and maintain infrastructure as code using tools like Terraform and Ansible
Implement and maintain monitoring, alerting, and reporting systems
Collaborate with cross-functional teams to improve system reliability and performance
Perform system capacity planning and demand forecasting
Automate routine operational tasks and processes
Participate in incident response and the on-call rotation
Optimize the performance and efficiency of various systems and platforms
Analyze system failures and provide root cause analysis
Implement and manage CI/CD pipelines
Conduct periodic performance and security audits
Lead efforts to improve overall system architecture
Troubleshoot and resolve complex technical issues
Collaborate with development teams to improve application deployment processes
Ensure compliance with security and data protection best practices

Required Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related field
6+ years of experience in a site reliability engineering or related role
Strong experience with Linux system administration and troubleshooting
Proficiency in scripting and programming languages such as Python, Shell, or Go
Experience with automation and configuration management tools like Puppet, Chef, or Ansible
Solid understanding of networking concepts and protocols
Expertise in cloud computing platforms such as AWS, Azure, or GCP
Proven track record of designing and implementing scalable, reliable, and maintainable systems
Experience with containerization and orchestration tools like Docker and Kubernetes
Knowledge of continuous integration and continuous deployment (CI/CD) practices and tools
Excellent problem-solving and troubleshooting skills
Strong communication and collaboration abilities
Relevant certifications such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator, or similar
Ability to work effectively in a fast-paced, dynamic environment
Experience with incident management and on-call support