Since the early days of data lakes, datasets persisted in object stores have not had primary and foreign key constraints enforced. Databricks is no exception; however, the platform supports unenforced PRIMARY KEY and FOREIGN KEY constraints, which the query optimizer uses to improve performance.
Tables in Unity Catalog don't enforce PRIMARY KEY and FOREIGN KEY constraints. In other words, you can insert duplicate primary keys or reference non-existent records. The constraints are still useful for data lineage and for query optimization, because Photon can simplify queries when it can leverage the declared constraints.
The magic keyword that enables this optimization is RELY. With this attribute, you tell the Databricks query engine that it can trust the constraint during query planning. For example, a PRIMARY KEY constraint declares that a column is unique; if you perform a DISTINCT operation on that key, Photon can rewrite the query into a simple SELECT statement. This rewrite matters because it eliminates the shuffle operations that are often prohibitively costly for large datasets. Nothing beats seeing this in action.
Let's create a test table first with some data:
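-- Primary key columns must be declared NOT NULL
CREATE TABLE sales ( id INT NOT NULL, amount DECIMAL(5, 2) );
-- sales_pk is the constraint name; id is the key column
ALTER TABLE sales ADD CONSTRAINT sales_pk PRIMARY KEY (id) RELY;
INSERT INTO sales VALUES (1, 50.00), (2, 40.00);
If you want to select unique sales ids, you can then run: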
SELECT DISTINCT id FROM sales;
If you now analyze the execution plan, you will see a simple SELECT statement, without any deduplication step.
Remember, the key is not enforced
The results look amazing, but remember: Databricks doesn't enforce the constraints! Consequently, if you insert duplicated keys, Photon will still honor the RELY clause and transform the SELECT DISTINCT into a simple SELECT.
Let's see what happens if we add duplicated sales:
INSERT INTO sales VALUES (1, 50.00), (2, 40.00);
Our supposition is confirmed. After running the query, we now get duplicated rows.
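To reproduce the effect outside Databricks, here is a minimal sketch that uses SQLite (through Python's standard library) as a local stand-in. SQLite has no RELY clause, so the optimizer's rewrite is reproduced by hand: the first query is the one the user wrote, the second is the plain SELECT that effectively runs once the engine trusts the key. The variable names are only illustrative.

```python
import sqlite3

# SQLite is only a local stand-in: it has no RELY clause, so the rewrite
# that Photon applies is imitated by hand with two separate queries.
conn = sqlite3.connect(":memory:")
# No enforced key on purpose, to mimic the unenforced constraint.
conn.execute("CREATE TABLE sales (id INT, amount DECIMAL(5, 2))")

rows = [(1, 50.00), (2, 40.00)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)  # initial load
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)  # duplicated load

# The query as written by the user.
as_written = [r[0] for r in conn.execute("SELECT DISTINCT id FROM sales ORDER BY id")]
# The query that effectively runs when the DISTINCT is optimized away.
as_rewritten = [r[0] for r in conn.execute("SELECT id FROM sales ORDER BY id")]

print("SELECT DISTINCT id:", as_written)
print("SELECT id:", as_rewritten)
```

The two result sets only match as long as the declared key really is unique, which is exactly the promise RELY makes on your behalf.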
The query plan didn't change: it still doesn't contain the DISTINCT operator. Let's change this by dropping the constraint and re-creating it with NORELY:
ALTER TABLE sales DROP PRIMARY KEY;
ALTER TABLE sales ADD PRIMARY KEY (id) NORELY;
Now the SELECT DISTINCT is kept as written, and the physical execution plan contains a shuffle-based deduplication.
When to use it?
As you can see, the keys are useful but dangerous if you don't control the quality of your data. That's why you shouldn't use them blindly, just to feel closer to the pre-lakehouse world where PRIMARY KEY and FOREIGN KEY constraints were enforced by the database.
Instead, you should think in terms of constraints-on-processing: apply the RELY clause only when you, as the data writer, can guarantee correctness. Valid situations include:
- A full dataset load with table overwrite, if the dataset contains no duplicates and references no non-existent foreign keys.
- Dataset processing with data quality guards that prevent writing duplicated primary keys or referencing non-existent foreign keys.
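A data quality guard of the kind listed above can be sketched as two checks run on a staging table before the data is published. This is a minimal illustration, not a Databricks API: SQLite stands in for the lakehouse so the snippet runs locally, and the table names (`sales_staging`, `sale_items_staging`), the `sale_id` column, and both helper functions are hypothetical. The SQL itself is plain GROUP BY and LEFT JOIN logic, so a similar pair of queries should be portable to other engines.

```python
import sqlite3


def find_duplicate_keys(conn, table, key):
    """Return the key values that appear more than once in `table`."""
    # Identifiers are interpolated directly: acceptable only for trusted names.
    query = (
        f"SELECT {key} FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return [row[0] for row in conn.execute(query)]


def find_orphan_keys(conn, child, fk, parent, pk):
    """Return the foreign key values in `child` without a match in `parent`."""
    query = (
        f"SELECT DISTINCT c.{fk} FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL ORDER BY c.{fk}"
    )
    return [row[0] for row in conn.execute(query)]


# Hypothetical staging tables holding a batch with one duplicate and one orphan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_staging (id INT, amount DECIMAL(5, 2))")
conn.execute("CREATE TABLE sale_items_staging (sale_id INT, product TEXT)")
conn.executemany(
    "INSERT INTO sales_staging VALUES (?, ?)", [(1, 50.0), (2, 40.0), (2, 40.0)]
)
conn.executemany(
    "INSERT INTO sale_items_staging VALUES (?, ?)", [(1, "book"), (3, "pen")]
)

duplicates = find_duplicate_keys(conn, "sales_staging", "id")
orphans = find_orphan_keys(conn, "sale_items_staging", "sale_id", "sales_staging", "id")

if duplicates or orphans:
    print(f"Batch rejected: duplicate keys {duplicates}, orphan keys {orphans}")
else:
    print("Batch is safe to publish")
```

Running the checks against staging rather than the final table keeps the RELY promise intact: invalid batches never reach the table that the optimizer trusts.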
To sum up, the RELY clause is definitely a great addition to well-governed datasets. But at the same time, it can be a debugging nightmare if you don't control the writing process and let messy data in. That said, if you let messy data in, you are already in trouble even without RELY!

