P-150
At Databricks, we are obsessed with enabling data teams to solve the world’s toughest problems, from security threat detection to cancer drug development. We do this by building and running the world’s best data and AI infrastructure platform, so our customers can focus on the high value challenges that are central to their own missions.
Founded in 2013 by the original creators of Apache Spark™, Databricks has grown from a tiny corner office in Berkeley, California to a global organization with over 1,000 employees. Thousands of organizations, from small businesses to the Fortune 100, trust Databricks with their mission-critical workloads, making us one of the fastest-growing SaaS companies in the world.
Our engineering teams build highly technical products that fulfill real, important needs in the world. We constantly push the boundaries of data and AI technology, while operating with the resilience, security, and scale that are critical to making customers successful on our platform.
We develop and operate one of the largest-scale software platforms. The fleet consists of millions of virtual machines, generating terabytes of logs and processing exabytes of data per day. At our scale, we regularly observe cloud hardware, network, and operating system faults, and our software must gracefully shield our customers from all of them.
As a software engineer with a backend focus, you will work closely with your team and product management to prioritize, design, implement, test, and operate microservices for the Databricks platform and product. This includes, among other things, writing software in Scala/Java, building data pipelines (Apache Spark™, Apache Kafka), integrating with third-party applications, and working with cloud APIs and infrastructure tooling (AWS, Azure, CloudFormation, Terraform).
Below are some example teams you can join:
Data Science and Machine Learning Infrastructure: Build services and infrastructure at the intersection of machine learning and distributed systems. Our technology powers the flagship collaborative workspace, notebooks, IDE integrations, and project management products. We also enable machine learning at scale with tools for environment management, distributed training, and managing the machine learning lifecycle through MLflow.
Compute Fabric: Build the resource management infrastructure powering all the big data and machine learning workloads on the Databricks platform in a robust, flexible, secure, and cloud-agnostic way. The software manages millions of virtual machines.
Data Plane Storage: Deliver reliable, high-performance services and client libraries for storing and accessing very large amounts of data on cloud storage backends, e.g., AWS S3 and Azure Blob Storage.
Enterprise Platform: Offer enterprise customers a simple and powerful experience for onboarding and managing all of their data teams, spanning tens of thousands of users, on the Databricks platform. We do this by building reliable, scalable services and infrastructure with intuitive UIs and by delivering high-impact, cross-cutting projects that drive the "land and expand" strategy for enterprise customers.
Observability: Provide a world-class platform for Databricks engineers to comprehensively observe and introspect their applications and services. We build scalable, data-intensive infrastructure that processes huge amounts of logs and telemetry. By doing so, we enable teams to become more data-driven and build robust services.
Service Platform: Build high-quality services and manage them in all environments in a unified way. We provide engineers with libraries, tools, services, and guidance to develop reliable, scalable, and secure services. We build a unified platform for engineers to deploy and update their services across different clouds and environments.
Core Infra: Build the core infrastructure that powers Databricks, making it available across all geographic regions and cloud providers. We build highly available distributed systems, making heavy use of cloud-native projects and contributing back to them whenever possible. We run thousands of Kubernetes clusters across all regions and orchestrate millions of VMs on a daily basis.
Competencies
- BS/MS/PhD in Computer Science or a related field.
- 10+ years of production-level experience in Java, Scala, C++, or a similar language.
- Comfortable working towards a multi-year vision with incremental deliverables.
- Experience in architecting, developing, deploying, and operating large scale distributed systems.
- Experience working on a SaaS platform or with Service-Oriented Architectures.
- Good knowledge of SQL.
- Experience with software security and systems that handle sensitive data.
- Experience with cloud technologies, e.g., AWS, Azure, GCP, Docker, Kubernetes.