We are looking for a technical expert to build out and refine the DevOps discipline within an enterprise SaaS environment, someone focused on continuous integration, continuous deployment, and the productivity of all of Engineering. As a Site Reliability Engineer, you will build, evolve, and operate the infrastructure automation platform that powers our Services. You will ensure that our production environment runs optimally and efficiently, and that software is released and deployed in a streamlined manner, from development all the way to production. This is a hands-on operational role with a balanced amount of tool and infrastructure development, including advanced scripting and automation. You will support our systems, tools, and process infrastructure, whether on premises or in an external cloud, as well as the entire stack for our service offering.
Responsibilities:
- Champion the Site Reliability (DevOps) needs for
continuous integration and continuous deployment while maintaining focus on
Quality of Service.
- Work with Infrastructure architects to deploy and
operate cloud services and related projects from development to production
- Lead the efforts to improve the existing server and
configuration management automation, identify opportunities to improve overall
productivity, and investigate tools that might speed up the process or make us
more efficient in continuous integration and continuous deployment. Assist in
the roll-out and deployment of new product features and services to facilitate
our rapid iteration and constant growth, making it faster and easier to create
and deploy software.
- Bridge Engineering and core shared operations
services
- Maintain consistent system performance. This means
keeping systems up and available, as well as fast and reliable. Participate in
troubleshooting, capacity planning and analysis, performance analysis, and
infrastructure improvement activities.
- Troubleshoot issues across the whole stack:
hardware, software, applications, and network.
- Take part in a 24x7 on-call rotation.
Required Qualifications:
- Assertiveness and creative ideas are mandatory.
- Excellent interpersonal skills suitable for user
support, including the ability to lead projects with peer-level
engineers/managers.
- Exceptional communication skills – both written and
oral (one-on-one and group).
- Strong analytical, problem-solving, and
decision-making skills.
- Must have a self-starting personality and be unafraid to
display initiative and innovation on the job.
- Solid understanding of, and experience working with,
high-availability, high-performance, multi-data-center systems.
- Three or more years' experience supporting
internet-facing infrastructure in the cloud, building and running large-scale
web production systems, troubleshooting problems, and improving the
reliability of systems.
- Experience scaling services horizontally as
well as vertically, and building, monitoring, troubleshooting, and managing
production, testing, and development environments (VMware ESX or an equivalent
hypervisor).
- Knowledge of cloud-based services, preferably AWS,
including but not limited to EC2, S3, CloudFront, EBS, and SQS.
- Extensive knowledge of Windows and Unix/Linux systems
including hardware, software and applications.
- Working knowledge of configuration management tools that help you manage software and system changes repeatably and predictably (Puppet, Chef, Docker, Salt, automated testing).
- Experience with system deployment and automation using
scripting languages such as Python, shell, or Perl.
- Some experience in analysis and in building
metrics-gathering systems (Splunk and AppDynamics).
- Knowledge of build/continuous integration tools
(Hudson, Jenkins).
- BS/BA (4 yr) or higher in Computer Science or a related field.
Desired Qualifications:
- Ability to manage multiple projects with competing
priorities.
- A track record of maintaining and improving skills in
existing and emerging open source technologies through training or
self-research.
- Love to learn, enjoy troubleshooting and thinking
through complicated problems.
- Ability to train others on new technologies and
processes.
- Ability to manage time effectively in a fast-paced,
customer-focused, changing environment.
- Comfortable with collaboration, open communication
and reaching across functional borders.
- Willingness and ability to maintain a positive,
quality-oriented, reliable, and flexible attitude.
- Willingness and ability to do what it takes to achieve objectives, including off-hours support or tasks.