SRE Cloud Engineer

USA01 • Madrid, Spain • 1w ago

Description

We’re passionate about building software that solves problems. We count on our Site Reliability Engineers (SREs) to give our users a rich feature set, high availability, and stellar performance so they can pursue their missions. We are currently seeking an engineer with public cloud experience to plan, design, and implement next-generation cloud infrastructure solutions. The Cloud Engineer will be part of the Engineering team, and the role requires strong knowledge of application monitoring, infrastructure monitoring, automation, maintenance, and service reliability improvements.

Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.

Reasons why you would love joining Ford:

Belong to a global leader in the automotive industry with more than 120 years in the market
Full-time fixed contract, with a competitive starting compensation and a benefits package (restaurant card, discounts, etc.)
Work-life balance: 33 vacation days and work under a hybrid model (2/3 days a week)
Career development path: take part in high-impact projects that allow you to improve your technical skills and grow professionally.

Responsibilities

Design, automate and manage a highly available and scalable cloud deployment that allows development teams to deploy and run their services.
Collaborate with engineering and architecture teams to evaluate and identify optimal cloud solutions that balance scalability, high performance, and security.
Modernise existing on-prem solutions and improve existing systems.
Automate deployments extensively and manage applications in GCP.
Develop and maintain cloud solutions in accordance with best practices.
Ensure that data storage and processing functions run efficiently and in accordance with company security policies and cloud security best practices.
Collaborate with Engineering teams to identify optimization strategies and help develop self-healing capabilities.
Develop strong observability capabilities.
Identify, analyse, and resolve infrastructure vulnerabilities and application deployment issues.
Regularly review existing systems and make recommendations for improvements.

Qualifications

The candidate should possess a strong understanding of effective data visualization techniques, including choosing the right chart types for different data and creating clear, informative dashboards.
Familiarity with the DORA metrics (Deployment Frequency, Lead Time for Changes, Change Failure Rate, Time to Restore Service) and their importance in assessing DevOps performance is essential.
Understanding how CI/CD events integrate with observability is vital. Candidates should know how to correlate events from CI/CD pipelines with performance and error data.
The engineer should have experience working with RESTful APIs, both consuming and potentially building them. This is crucial for integrating with various services and collecting data.
Understanding API specifications (Swagger/OpenAPI) is beneficial for interacting with various services.
Building and maintaining dashboards in Grafana, connecting to various data sources (including GCP services).
Implementing and managing SLOs using Nobl9
Optimizing BigQuery queries for performance
Automating data collection and processing using Cloud Scheduler and other GCP services
Proficiency in using Dynatrace for application performance monitoring (APM) is critical. This includes setting up monitoring, analyzing performance bottlenecks, and creating dashboards. Candidates should understand Dynatrace's data model and its capabilities.
Experience with Nobl9 for reliability management is a valuable asset. This shows an understanding of SRE principles and the ability to define and monitor SLOs (Service Level Objectives) and error budgets.
Strong Grafana skills are essential for dashboard creation and visualization. Candidates should be able to build interactive dashboards, create custom visualizations, and connect to various data sources.
Understanding Prometheus's role in metrics collection and its architecture is important. This often involves configuring exporters and understanding how to query and analyze metrics.
Deep understanding of the OpenTelemetry Collector's configuration, pipelines, and processing capabilities is crucial. This includes choosing appropriate exporters and processors based on the environment and requirements.
Proven work experience with Docker.
Proven working experience with API gateways; Apigee is an advantage.
Experience in package, configuration, and deployment management via Helm, Kustomize, and Argo CD.
Strong knowledge of GitHub and DevOps (Cloud Build and Deploy are an advantage).
Proficiency in scripting and coding, including languages such as Python, Go, Java, and JavaScript (Node.js).
Exposure to Cloud Monitoring and logging.
Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN).
Experience with automation tools is a priority.
Languages: advanced English required.