Description
We’re passionate about building software that solves problems. We count on our Site Reliability Engineers (SREs) to empower our users with a rich feature set, high availability, and stellar performance so they can pursue their missions. We are currently seeking an engineer with public cloud experience to plan, design and implement next-generation cloud infrastructure solutions. The Cloud Engineer will be part of the Engineering team and will need strong knowledge of application monitoring, infrastructure monitoring, automation, maintenance, and service reliability improvements.
Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
Reasons why you would love joining Ford:
- Belong to a global leader in the automotive industry with more than 120 years in the market
- Full-time fixed contract, with a competitive starting compensation and a benefits package (restaurant card, discounts, etc.)
- Work-life balance: 33 vacation days and work under a hybrid model (2/3 days a week)
- Career development path: take part in high-impact projects that allow you to improve your technical skills and grow professionally.
Responsibilities
- Design, automate and manage a highly available and scalable cloud deployment that allows development teams to deploy and run their services.
- Collaborate with engineering and architecture teams to evaluate and identify optimal cloud solutions, taking scalability, high performance and security into account.
- Modernise existing on-prem solutions and improve existing systems.
- Extensively automate deployments and manage applications in GCP.
- Develop and maintain cloud solutions in accordance with best practices.
- Ensure efficient functioning of data storage and processing functions in accordance with company security policies and best practices in cloud security.
- Collaborate with Engineering teams to identify optimization strategies and help develop self-healing capabilities.
- Develop strong observability capabilities.
- Identify, analyse, and resolve infrastructure vulnerabilities and application deployment issues.
- Regularly review existing systems and make recommendations for improvements.
Qualifications
- The candidate should possess a strong understanding of effective data visualization techniques, including choosing the right chart types for different data and creating clear and informative dashboards.
- Familiarity with the DORA metrics (Deployment Frequency, Lead Time for Changes, Change Failure Rate, Time to Restore Service) and their importance in assessing DevOps performance is essential.
- Understanding how CI/CD events integrate with observability is vital. The candidate should know how to correlate events from CI/CD pipelines with performance and error data.
- The engineer should have experience working with RESTful APIs, both consuming and potentially building them. This is crucial for integrating with various services and collecting data
- Understanding API specifications (Swagger/OpenAPI) is beneficial for interacting with various services.
- Building and maintaining dashboards in Grafana, connecting to various data sources (including GCP services).
- Implementing and managing SLOs using Nobl9
- Optimizing BigQuery queries for performance
- Automating data collection and processing using Cloud Scheduler and other GCP services
- Proficiency in using Dynatrace for application performance monitoring (APM) is critical. This includes setting up monitoring, analyzing performance bottlenecks, and creating dashboards. They should understand Dynatrace's data model and its capabilities
- Experience with Nobl9 for reliability management is a valuable asset. This shows an understanding of SRE principles and the ability to define and monitor SLOs (Service Level Objectives) and error budgets.
- Strong Grafana skills are essential for dashboard creation and visualization. They should be able to build interactive dashboards, create custom visualizations, and connect to various data sources
- Understanding Prometheus's role in metrics collection and its architecture is important. This often involves configuring exporters and understanding how to query and analyze metrics
- Deep understanding of the OpenTelemetry Collector's configuration, pipelines, and processing capabilities is crucial. This includes choosing appropriate exporters and processors based on the environment and requirements
- Proven work experience with Docker
- Proven working experience with API gateways; Apigee is an advantage
- Experience in package, config and deployment management via Helm, Kustomize, ArgoCD.
- Strong knowledge of GitHub and DevOps (Cloud Build and Deploy are an advantage)
- Proficiency in scripting and coding with languages and runtimes such as Python, Go, Java, JavaScript and Node.js
- Exposure to Cloud Monitoring and logging
- Experience with distributed storage technologies like NFS, HDFS, Ceph, S3 as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
- Experience with automation tools is a priority
- Languages: advanced English required
Additional Information
Ford is committed to diversity and equality of opportunity for all and is opposed to any form of less favourable treatment or harassment on the grounds of gender, marital status, civil partnership status, parental status, race, ethnic origin, colour, nationality, national origin, disability, sexual orientation, religion/belief, gender reassignment and gender identity, age and those with caring responsibilities.