About the Job:
Our client, a dynamic and rapidly expanding software company, is seeking a Lead Site Reliability Engineer to join their team. They are at the forefront of empowering all teams to deliver and control the best software, serving trillions of feature flags daily to enhance software delivery and minimize risk for companies of all sizes.
Based in downtown Oakland, our client is experiencing significant growth and requires an individual ready to tackle some of the most challenging engineering problems, including the seamless delivery of feature flags to millions of users worldwide in milliseconds.
As the Lead Site Reliability Engineer, you will play a pivotal role in ensuring the health of our core systems and reliability tooling. Your responsibilities will include responding to and mitigating incidents swiftly, identifying opportunities to enhance the resilience of our core services, and developing force-multiplying capabilities for our internal engineering teams. This involves fostering a culture of SRE learning and growth while promoting robust practices for shipping code and designing reliable systems throughout the software development lifecycle.
Key Technologies: AWS, Golang, CockroachDB, Elasticsearch, Redis, Flink, Kinesis, Terraform.
Responsibilities:
- Lead the development and continuous refinement of SRE tools and processes to enhance software delivery, observability, reliability, and operational efficiency.
- Empower our engineering team to deliver services autonomously, reliably, and efficiently through offerings written in Go and Terraform or delivered through existing tools.
- Define and standardize service health and reliability metrics aligned with business objectives, ensuring transparency and actionable insights.
- Enhance the effectiveness of our incident management lifecycle and drive initiatives to train key roles involved in incident response.
- Collaborate with various team members to define and cultivate our SRE culture through principles, technical frameworks, tooling, and processes.
- Drive the adoption of new technologies, system designs, and best practices in code health, testing, observability, and service maintainability.
- Proactively identify and address potential performance and scalability bottlenecks in our systems and underlying infrastructure.
- Analyze SQL query performance, propose enhancements, and establish guidelines for teams.
Qualifications:
- Demonstrable experience building and operating large-scale, highly available distributed systems.
- Proficiency in server-side web development (e.g., Java/Scala, Ruby, Python, Golang, Node.js) and Infrastructure-as-Code (e.g., Terraform).
- Experience guiding architectural direction and scalability considerations for new projects.
- Strong understanding and proactive management of security practices related to SRE.
- Extensive experience with major cloud providers, observability tooling, and RDBMS technologies.
- Experience leading team ceremonies and driving alignment on decisions with cross-team impact.
- Strong customer focus and ability to make technical decisions aligned with business goals.
- Exceptional communication skills, a positive attitude, and a high degree of empathy.
Pay:
Target pay ranges based on Geographic Zones:
- Zone 1: $183,600 - $235,000
- Zone 2: $165,600 - $212,000
- Zone 3: $156,510 - $200,000
*Compensation may vary based on skills, experience, degree level, and location.
Our client operates with high trust and transparency, providing competitive compensation packages and additional benefits such as RSUs, health insurance, and mental health benefits.
About Our Client:
Our client is dedicated to revolutionizing modern software delivery, empowering developers to innovate faster while ensuring stability and reliability. Their platform enables targeted feature deployment, maximizing the impact of every release and optimizing the user experience. They foster a culture of diversity, inclusion, and collaboration, valuing unique talents and perspectives to drive success.
Our client is an equal opportunity employer and encourages individuals from all backgrounds to apply. They are committed to providing accommodations for applicants with disabilities and ensuring a supportive and inclusive work environment.