Software Engineer - Systems PhD Candidates

Databricks • Seattle, Washington, United States • 1w ago

At Databricks, we are passionate about helping data teams solve the world's toughest problems — from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers — and customer obsessed — we leap at every opportunity to solve technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started.

We're radically simplifying the entire data lifecycle, from ingestion to generative AI and everything in between. We’re doing it cross-cloud with a unified platform, currently serving over 10k customers, processing exabytes of data per day on 15+ million VMs, and growing exponentially.

To make it happen, we’re building multi-cloud systems at every corner of the data ecosystem, from query engines, vector databases, training pipelines, and storage systems down to the infrastructure that allows them to scale, such as auto-sharders, caches, and load balancers, just to name a few. We also build and support the tooling, languages, and stacks that bring it all together. Basically, we do it all.

The space we work in and the problems we solve are massive, complex, and very deep (our published work on Lakehouse, Delta Lake, and Photon is a testament to that). We’re looking for practitioners who are eager to work with the best in the industry to push the boundaries of what’s possible for our customers. If you’re truth-seeking, data-driven, and love to operate from first principles (hint: those are our core values), then Databricks is the place for you.

As part of the Database Engine team, you will have opportunities to design and implement systems that leapfrog the existing state of the art in many areas:

Query compilation & optimization
Distributed query execution and scheduling
Vectorized engine execution
Data security
Resource management
Transaction coordination
Efficient storage structures (encoding, indexes)
Automatic physical data optimization

What we look for:

PhD in databases or systems
A passion for database systems, storage systems, distributed systems, language design, and/or performance optimization
Motivated by delivering customer value and impact