We are currently looking for a Senior Software Engineer to be a part of the Site Reliability Engineering (SRE) team in Atlanta, GA. The SRE team is an innovative team devoted to providing a Docker-based Platform as a Service and assisting a growing number of teams with infrastructure as code questions and needs across a large AWS-focused portfolio.
Atlanta, GA
Hybrid – 2 days in the office
Local candidates only
USC or GC only
Contract to hire
Top Must-Haves: Terraform, AWS, Bash scripting, Linux troubleshooting, GitHub Actions, PaaS
About the Position
As a member of the SRE team, you will work with development teams to help create deployment pipelines and infrastructure for a growing landscape of microservices across 50+ AWS accounts. The platform the SRE team has built and supports continues to evolve to cover a growing number of AWS services, allowing the dev teams we work with to focus on building Docker containers (largely in .NET). As we continue to evolve, we are also working to remove the last of our on-prem presence and are supporting refactoring and improvements to the Company's logistics availability and resiliency. We are looking for engineers who are passionate about infrastructure as code, especially Terraform, and about continuous deployment as a way to build scalable and highly reliable applications.
If you love to figure out how all the pieces are put together, and if automation and building tools to monitor and manage your applications sound interesting to you, we want to talk to you.
As a Site Reliability Engineer you:
- Have a consultant's mindset - you LISTEN, and lead your customers into the goodness of SRE practices
- Have incredible communication skills -- you put the cookies where all the kids can get to them
- Are strongly averse to clicking in a UI, and only ever do so with one hand while pinching your nose with the other and grimacing
- Recognize your obligation to the team to elevate the practice of SRE to the highest standard possible
- Have an irresistible compulsion to automate EVERYTHING - testing, deploying, monitoring, etc.
- Display a strong sense of ownership for the work you produce
- Have been accused of muttering "Everything As Code" in your sleep and have a "Cattle, Not Pets" bumper sticker on your car
- Improve predictability and reliability of software releases, workflows, and operating software
- Reduce complexity and streamline delivery by participating in the creation of and promoting the use of reusable code, tooling, systems, and solution patterns
- Are self-aware. You realize we are all growing together, and we want to help each other do exactly that -- you win when we win
Qualifications:
- Bachelor's degree in a related discipline and 4+ years' experience in a related field. The right candidate could also have a different combination, such as a Master's degree and 2 years' experience, a Ph.D. and up to 1 year of experience, or 8 years' experience in a related field.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
- Ability to debug, optimize code, and automate routine tasks
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive
- Understanding of Linux operating systems and Docker containerization
- Strong experience in Terraform or another IaC language for managing infrastructure and environments at scale
- Fluent coding experience with at least one of these languages: Python, .NET, Ruby
- Experience rolling out highly available, mission-critical applications
- Experience with version control systems (Git specifically) and trunk-based branching strategies
- Experience designing, deploying, and supporting solutions in AWS
- Experience with CI/CD systems (GitHub Actions, AWS CodePipeline/CodeBuild/CodeDeploy)
- Experience with database infrastructure (RDS, Aurora, MySQL, Postgres, DynamoDB)
- Excellent written communication, problem-solving, and process management skills
- Desire to work in a fast-paced, evolving, growing, dynamic environment. Everyone says this, but here, it really is.