Job Description:
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.
One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We’re devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.
Bank of America believes in the importance of both working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization.
Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!
This job is responsible for partnering with engineering and technology teams to improve reliability and observability for the services it supports. Key responsibilities include planning and implementing instrumentation, tooling, ticketing, alerting and on call routines as defined in observability designs, and engaging in production triage and Problem Management. Job expectations include supporting code enhancements to automate services and improve reliability and observability while expanding knowledge to identify gaps in the observability design or implementation.
We are seeking a Database Site Reliability Engineer (SRE) to join our team developing automations to reduce manual effort (toil) and creating data visualizations to give insights into our database operational estates.
Responsibilities:
- Develops software or system scripts to simplify or eliminate the dependence on human intervention for recurring tasks
- Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low-level error rates and 'noise' in monitoring, and helps define solutions to reduce manual support effort and/or improve system reliability
- Partners with solutions engineers and application teams to implement the necessary code changes to make use of common reliability libraries and tools, and helps Production Support and Application Development teammates understand how to use them
- Engages as a subject matter expert (SME) in incident triage efforts and failure scenario modeling, and works with the Problem Manager to diagnose root causes for incident / problem management investigations
- Contributes to a catalog of reliability tools and libraries that can be leveraged for common instrumentation, automation, and operational needs by Application and Database Services Support teams
- Leverages guidance from the team and works with monitoring tools and Application Development teams to establish/enhance monitoring/observability solutions and plans created to support continuous improvement efforts
- Develop tools, automations and self-healing capabilities to improve the operational reliability and scalability of our global database infrastructure, reducing toil and increasing efficiency
- Develop metrics and dashboards to increase observability into our operational estates
- Produce clear documentation both for end user consumption and for support of your tools, automations and dashboards
- Leverage your cross product view of databases to drive standardization of processes across database operational teams
- Collaborate with our database operations and engineering teams to achieve the best outcomes across all lines of business
- Identify sources of instability and drive operational excellence
Required Qualifications:
- 3+ years’ experience managing databases including PostgreSQL and at least one of MongoDB, Oracle or SQL Server
- 3+ years’ experience with programming languages including SQL and one of Shell scripting, PowerShell or Python
- Excellent written communication, problem solving, process management, and collaborative skills to work with teams across the organization
- Good understanding of SDLC, agile methodologies and tooling including JIRA and Bitbucket
- Ability to learn new technologies
Desired Qualifications:
- Web UI development experience, e.g. Django, ASP.NET
- Knowledge of Ansible
- Tableau dashboard development experience
- Experience of testing tools, e.g. Octane
Skills:
- Analytical Thinking
- Application Development
- Automation
- Production Support
- Result Orientation
- Adaptability
- Collaboration
- DevOps Practices
- Solution Delivery Process
- Technical Strategy Development
- Influence
- Innovative Thinking
- Risk Management
- Solution Design
- Stakeholder Management
Shift:
1st shift (United States of America)
Hours Per Week:
40
Pay Transparency details
US - NJ - Jersey City - 101 Hudson St - 101 Hudson (NJ2101)
Pay and benefits information
Pay range
$95,000.00 - $155,500.00 annualized salary, offers to be determined based on experience, education and skill set.
Discretionary incentive eligible
This role is eligible to participate in the annual discretionary plan. Employees are eligible for an annual discretionary award based on their overall individual performance results and behaviors, the performance and contributions of their line of business and/or group; and the overall success of the Company.
Benefits
This role is currently benefits eligible. We provide industry-leading benefits, access to paid time off, resources and support to our employees so they can make a genuine impact and contribute to the sustainable growth of our business and the communities we serve.