About the role
Are you interested in contributing to the development of Roku’s next-generation observability platform? We are seeking individuals with expertise in the CNCF open-source ecosystem of observability tools for metrics, logs, and tracing. If you enjoy collaborating with cross-functional teams to enhance their observability experience and managing large-scale operations involving high-volume data and requests across multiple regions and clusters, we encourage you to apply for this role.
For California only: the estimated annual salary for this position is between $186,000 and $388,000.
Compensation packages are based on factors unique to each candidate, including but not limited to skill set, certifications, and specific geographical location.
This role is eligible for health insurance, equity awards, life insurance, disability benefits, parental leave, wellness benefits, and paid time off.
About the team
The observability team is an integral part of Roku’s central Infrastructure Engineering team, which oversees the service mesh hosting architecture and the observability platform that runs on it. Together, we are tasked with developing and scaling both the Platform (Kubernetes, Istio, Envoy, operators, etc.) and the Observability stack (OSS/CNCF-supported observability projects). Our goal is to facilitate Roku’s shift towards a unified, cloud-agnostic infrastructure where all teams benefit from a common framework with out-of-the-box features.
Within the observability team, we are dedicated to creating a world-class observability platform. We customize and optimize OSS projects to meet our needs and actively contribute to upstream projects, promoting positive changes and engaging with the broader ecosystem. We even write software ourselves when there isn’t a good OSS option.
What you’ll be doing:
- Work closely with the Service Mesh team to identify and standardize existing and new observability tools as part of a holistic solution
- Work on, enhance, and expand our diverse stack of components that operate across multiple clouds, regions, and clusters, managing all observability data. You will have the freedom and tools to drive improvements and make changes
- Perform feature/functionality/usability trials of new observability tools that can benefit Roku
- Contribute new open-source tools and improvements to existing open-source tools back to the CNCF ecosystem
- Design and build automation and custom features in and around the chosen tools to make onboarding new services easy and to improve the UI/UX and general experience for developers
- Demonstrate great communication skills in working with technical and non-technical audiences
We’re excited if you have:
- 8+ years of experience in infrastructure engineering, DevOps, and software engineering
- Recent experience designing and building unified observability platforms that enable companies to use the sometimes overwhelming amount of available data (metrics, logs, and traces) to determine quickly if their application or service is operating as desired
- Expertise in deploying and using open-source observability tools in large-scale environments, including Prometheus, Grafana, Loki, Tempo, Thanos, or similar tools such as Cortex, Mimir, ELK (Elasticsearch/Logstash/Kibana) stack, etc.
- Expertise in at least one of the observability pillars: (distributed) tracing, logs, metrics, profiling/APM
- Familiarity with the open standard OpenTelemetry
- Familiarity with Kubernetes and Istio as the architecture on which the observability platform runs, and how they integrate and scale. Additionally, the ability to contribute improvements back to the joint platform for the benefit of all teams.
- Demonstrated customer engagement and collaboration skills: curating custom dashboards and views, and identifying and deploying new tools, to meet customers’ requirements
- The drive and self-motivation to understand the intricate details of a complex infrastructure environment
- Hands-on experience working with AWS and/or GCP
- Experience with Go
- B.S. or M.S. degree in Computer Science, Engineering, or equivalent experience