As a Site Reliability Engineer (SRE), you’ll continuously drive improvements in observability, performance, and reliability, with the goal of making an impact across the federal government. This role requires a current TS/SCI clearance obtained within the last 51 months and the ability to pass additional background investigations. As a member of this team, you will work onsite at JBAB (Joint Base Anacostia-Bolling) 3 days per week and remotely 2 days per week.
What you’ll do:
- Monitor platform and containerized applications.
- Identify performance and availability risks and issues.
- Work on the core platform to create and optimize all functions needed to establish a strong platform infrastructure.
- Collaborate with the team and the customer daily.
What you’ll need to succeed:
- Minimum of 8 years of software development experience, including at least 2 years with Kubernetes, and a strong understanding of SRE principles for highly scalable and reliable systems.
- Experience implementing proactive alerting and monitoring workflows and dashboards based on Kubernetes metrics, logs, and traces using Prometheus, Grafana, Loki, Splunk, or similar technologies.
- Working knowledge of industry best practices for information security.
- Knowledge of clustering, high-availability, replication, and disaster recovery techniques.
- A bachelor's degree and an active TS/SCI clearance (T5 or T5R required).
- Experience working in a DevSecOps environment and with source code repositories and CI/CD pipeline solutions such as GitLab, Azure DevOps, or GitHub.
- Experience with Infrastructure as Code (IaC), containerization, Kubernetes (K8s), and CI/CD automation.
- Experience with container orchestration tools (Rancher/RKE2, OpenShift, etc.).
- Ability to work well on a team as well as individually.
- Ability to work at the client site in downtown Washington, DC, at least 3 days per week.
Nice to haves:
- Passion for learning new development concepts, methodologies, and technologies
- Experience hardening and securing containers
- Previous experience with commercial cloud (e.g., AWS, Azure)
- Ability to establish and maintain a high level of client trust and confidence through your software development skills
- Ability to think outside the box when troubleshooting issues and to provide innovative solutions that fit customers’ needs