Databricks tips

I'm the author of Data Engineering Design Patterns (O'Reilly), a Databricks MVP, and a freelance data engineer specializing in Apache Spark and Databricks. I help teams move from working pipelines to resilient architectures.
I'm currently accepting new projects for Jun 2026. Whether you need a 2-day architectural audit, a hands-on lead for a complex data engineering problem, or a workshop, let's discuss your project.

Databricks Asset Bundles and lack of files synchronization after workspace files removal

You may encounter a file synchronization issue after manually deleting files from your Databricks Asset Bundle's workspace directory (e.g. /Workspace/Users/your_user/.bundle/your_project/dev/files). The is...

Continue Reading →

How to debug queries on Databricks for your PowerBI dashboard created with the import mode?

The import mode loads all the data into PowerBI, which makes debugging more challenging. To make debugging easier, you can ask to temporarily switch from the import mode to the direct query mode and m...

Continue Reading →

How to define a single-node cluster in Databricks Asset Bundles?

The definition for a single-node cluster can look like this:

    new_cluster:
      spark_version: 15.4.x-scala2.12
      node_type_id: i4i.large
      autotermination_minutes: 5
      runtime_engine: STANDARD...

Continue Reading →

How to define permissions for the jobs in Databricks Asset Bundles?

You can define the permissions with the...permissions block, as follows: # source: https://docs.databricks.com/aws/en/dev-tools/bundles/permissions#define-specific-resource-permissions # ... ...

Continue Reading →

How to disable triggers on dev environment with Databricks Asset Bundles?

If you use Databricks Asset Bundles (DAB) in development mode (mode: development), the deployed workflows will be prefixed with the environment name and your user name, as in the next picture: Howeve...

Continue Reading →

How to generate a Databricks Asset Bundle easily?

The easiest way to generate the YAML for the Databricks Asset Bundles is to use the UI. First, you need to create your workflow manually, like below with an if-else task: Next, click on the me...

Continue Reading →

How to get columns of one or multiple tables?

To get the columns for one or multiple tables, you can combine the SHOW TABLES and DESCRIBE EXTENDED commands. The SHOW TABLES command lists all tables within a schema while the DESCRIBE EXTENDED command ...

Continue Reading →
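
The combination of SHOW TABLES and DESCRIBE EXTENDED from the teaser above could be sketched in PySpark as follows. This is only an illustration, not the code from the full post: the list_columns helper and its arguments are hypothetical names, spark is assumed to be an active SparkSession (as in a Databricks notebook), and the column names tableName, isTemporary, col_name and data_type are the ones these commands usually return.

```python
def list_columns(spark, schema):
    """Return {table_name: [(column_name, data_type), ...]} for a schema.

    `spark` is assumed to be an active SparkSession; `schema` is a
    hypothetical schema name supplied by the caller.
    """
    columns = {}
    for table in spark.sql(f"SHOW TABLES IN {schema}").collect():
        # Temporary views are listed too but do not belong to the schema.
        if table["isTemporary"]:
            continue
        name = table["tableName"]
        described = spark.sql(f"DESCRIBE EXTENDED {schema}.{name}").collect()
        table_columns = []
        for row in described:
            col_name = row["col_name"]
            # After the columns, DESCRIBE EXTENDED emits an empty row or
            # '#'-prefixed section headers followed by table metadata.
            if not col_name or col_name.startswith("#"):
                break
            table_columns.append((col_name, row["data_type"]))
        columns[name] = table_columns
    return columns
```

In a notebook, a call such as list_columns(spark, "my_schema") would return one entry per table, where my_schema is a placeholder.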

How to read Kinesis data from a timestamp on Databricks?

You want to start a streaming job on Databricks from a particular point in time. For that you need to use the AT_TIMESTAMP position and define the JSON with the timestamp to process from: spark.read...

Continue Reading →
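
As a rough sketch of the JSON mentioned in the teaser above, the snippet below builds the AT_TIMESTAMP position as a string. It is not the code from the full post. The helper names are hypothetical, and the option names (streamName, region, initialPosition), the at_timestamp and format keys, and the timestamp pattern are assumptions based on the Databricks Kinesis connector documentation, so check them against the version you run.

```python
import json
from datetime import datetime, timezone

# Assumed Java-style pattern accepted by the connector; verify in the docs.
TIMESTAMP_PATTERN = "MM/dd/yyyy HH:mm:ss ZZZ"


def at_timestamp_position(start: datetime) -> str:
    """Build the JSON string describing an AT_TIMESTAMP starting position.

    `start` must be timezone-aware; it is converted to UTC first.
    """
    if start.tzinfo is None:
        raise ValueError("start must be a timezone-aware datetime")
    utc_start = start.astimezone(timezone.utc)
    return json.dumps({
        "at_timestamp": utc_start.strftime("%m/%d/%Y %H:%M:%S") + " UTC",
        "format": TIMESTAMP_PATTERN,
    })


def kinesis_options(stream_name: str, region: str, start: datetime) -> dict:
    """Collect the reader options (assumed names) for the Kinesis source."""
    return {
        "streamName": stream_name,
        "region": region,
        "initialPosition": at_timestamp_position(start),
    }
```

The resulting dictionary could then be passed to the reader, for example spark.readStream.format("kinesis").options(**kinesis_options("my-stream", "eu-west-1", start)).load(), where the stream name and region are placeholders.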

How to retrieve a value for a variable in Databricks Asset Bundles?

If you want to load a workspace-scoped variable into your DAB, you can leverage the lookup keyword. Here is an example of a variable resolving a webhook notification called slack-notification: ...

Continue Reading →

You encounter an 'Invalid cross-device link' error on Databricks?

The error happens when you use Path.rename to move a file between incompatible storage layers, for example from a local file system to a Databricks volume. In that case, Databricks will fail with an e...

Continue Reading →
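
One common way around the cross-device error described above is to copy and delete instead of renaming, which shutil.move from the Python standard library does when a plain rename is not possible. The sketch below is an illustration under that assumption and may differ from the fix in the full post; move_across_filesystems is a hypothetical helper name, and the paths in the usage note are placeholders.

```python
import shutil
from pathlib import Path


def move_across_filesystems(source, destination):
    """Move a file even when source and destination are on different
    file systems.

    Path.rename relies on an atomic rename, which fails across devices.
    shutil.move tries a rename first and falls back to copy-then-delete.
    """
    destination = Path(destination)
    # Create the target directory if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(source), str(destination)))
```

For example, move_across_filesystems("/tmp/report.csv", "/Volumes/my_catalog/my_schema/my_volume/report.csv") would copy the local file to the volume and then remove the local copy. The move is no longer atomic, so a failure midway can leave a partial copy at the destination.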