Job Duties
As a site reliability engineer, you will focus on maximizing the availability, observability, reliability, security, and performance of Digital Experiences.
SREs perform deep problem analysis; detect infrastructure and code defects; define, create, and report on observability processes for Key Performance Indicators (KPIs); and work with product delivery teams to provide long-term solutions to production issues.
We are looking for talented and passionate full-stack developers with knowledge of datacenter infrastructure and cloud platforms who can bring the following:
- Ability to observe, diagnose, and develop fixes for production issues quickly and efficiently
- Ability to develop and drive real-time monitoring solutions that provide visibility into site health and key performance indicators
- Strong written and verbal communication skills, including the ability to clearly articulate issues and their impact
- Working understanding of IT service management (Incident, Problem, Change, and Knowledge management)
- Ability to work across business and technical teams to continuously analyze system performance in production, troubleshoot consumer-reported issues, and proactively identify areas in need of optimization
- Practical experience in application reliability practices or production support for consumer-facing web and/or mobile experiences, or a strong technical skillset combined with a desire to learn