Description
3 Days Hybrid from any of our locations in RI, NJ, GA, MA, NC, TX or AZ
Role is not relocation eligible.
Principal Site Reliability Engineer - Observability/AIOps
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that internally critical and externally visible systems have reliability and uptime appropriate to users' needs and a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance. SRE is a mindset and a set of engineering approaches focused on optimizing existing systems, building infrastructure, and eliminating work through automation. As a Site Reliability Engineer with a focus on observability, you will build and operate next-generation observability platforms.
As an SRE with an observability focus, you will:
● Explore the complex IT estates of our clients to understand their observability/AIOps opportunities
● Collaborate to architect unified observability and AIOps strategies which employ leading AI technology
● Implement enterprise observability/AIOps technology and processes
● Amplify observability/AIOps outcomes by accelerating adoption across technology and business organizations
Responsibilities include:
● Developing API-driven micro-services that combine into large and complex platforms
● Planning and executing highly parallel distributed object storage transformations and migrations
● Maintaining automated test suites using CI/CD tools
● Participating in collaborative projects with small software engineering teams
● Developing automation, processes, and tools designed to make our services simpler and more robust
● Participating in troubleshooting, capacity planning and analysis, and performance analysis activities
● Advising management on service onboarding strategies and execution
Critical Hiring Criteria
What we are looking for:
● Entrepreneurs who seek challenging problems to solve
● Creativity, initiative and acute attention to detail
● Thirst for innovation and solving problems at lightning speed
● Passion for automating everything repetitive
● Obsession with software scalability and performance under high loads
● Love for using and contributing to open-source software
Please bring to the table:
● Development experience, comfortable working in multiple languages (Python, Java, Go and Ruby a plus)
● Experience working in collaborative coding environments (peer review, continuous integration, etc.)
● 7+ years of application development
● Experience working in distributed remote teams across multiple time zones
● Experience in large scale operations environments
● 7+ years of experience with Linux/Unix development or systems administration
● 3+ years of experience with networking systems and technologies
● Deep understanding of network performance and security
● Ability to identify tasks that require automation and implement that automation
● Experience with configuration management tools (Puppet, Chef, SaltStack)
● Hands-on operational experience in a high-volume or critical production service environment - distributed systems, capacity planning, continuous deployment
● BA/BS in Computer Science preferred, or equivalent experience (advanced degrees preferred)
We have opportunities to work with and learn:
● Object Storage - MinIO/S3/etc
● Data Collection - OpenTelemetry/Grafana Alloy/etc
● Message Bus - Kafka/NSQ/etc
● Scaling Databases - Druid/ClickHouse/Cassandra/etc
● Relational database technologies at large scale - Timescale/Vitess/Postgres/etc
● Scheduling & Orchestration - Kubernetes/OpenShift/Docker
● Cloud Platforms - AWS/Azure
Pay Transparency
The salary range for this position is $130,720 - $196,080 per year plus an opportunity to earn an annual discretionary bonus. Actual pay is based on various factors including but not limited to the work location and relevant skills and experience.
We offer competitive pay, comprehensive medical, dental and vision coverage, retirement benefits, maternity/paternity leave, flexible work arrangements, education reimbursement, wellness programs and more. Note that Citizens’ paid time off policy exceeds the mandatory paid sick or paid time-away policy of every local and state jurisdiction in the United States. For an overview of our benefits, visit https://jobs.citizensbank.com/benefits.