I'm the author of Data Engineering Design Patterns (O'Reilly),
a Databricks MVP, and
a freelance data engineer specializing in Apache Spark and Databricks.
I help teams move from working pipelines to resilient architectures.
I'm currently accepting new projects for June 2026. Whether you need a 2-day architectural audit, a hands-on lead for a
complex data engineering problem, or a workshop,
let's discuss your project here.
You may encounter a file synchronization issue after manually deleting files from your Databricks Asset Bundle's workspace (e.g. /Workspace/Users/your_user/.bundle/your_project/dev/files). The is...
The import mode loads all the data into Power BI, which makes debugging more challenging. To make debugging easier, you can temporarily switch from import mode to direct query mode and m...
The definition for a single-node cluster can look like this: new_cluster: spark_version: 15.4.x-scala2.12 node_type_id: i4i.large autotermination_minutes: 5 runtime_engine: STANDARD...
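To expand on that excerpt, here is a sketch of a complete single-node job cluster in a bundle. The job and task names are hypothetical; the `spark_conf` and `custom_tags` entries follow the classic single-node recipe (`num_workers: 0`, `singleNode` profile, `SingleNode` resource class), which may vary with your Databricks version:

```yaml
# Hypothetical job resource; only new_cluster mirrors the excerpt above.
resources:
  jobs:
    my_single_node_job:
      name: my-single-node-job
      tasks:
        - task_key: main
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i4i.large
            runtime_engine: STANDARD
            num_workers: 0
            spark_conf:
              spark.databricks.cluster.profile: singleNode
              spark.master: "local[*]"
            custom_tags:
              ResourceClass: SingleNode
```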
You can define the permissions with the...permissions block, as follows: # source: https://docs.databricks.com/aws/en/dev-tools/bundles/permissions#define-specific-resource-permissions # ... ...
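As a sketch of that block, assuming a hypothetical job name, group, and user (the permission levels shown are the ones documented for jobs):

```yaml
resources:
  jobs:
    my_job:  # hypothetical job name
      # ...
      permissions:
        - level: CAN_VIEW
          group_name: data-analysts
        - level: CAN_MANAGE_RUN
          user_name: someone@example.com
```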
If you use Databricks Asset Bundles (DAB) in development mode (mode: development), the deployed workflows will be prefixed by the environment name and your users, like in the next picture: Howeve...
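A minimal sketch of the target definition that triggers this prefixing behavior; the target name and workspace host are placeholders, and the `presets` block (which can override the generated name prefix) is an assumption based on the bundle settings reference:

```yaml
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://my-workspace.cloud.databricks.com  # placeholder host
    # presets:
    #   name_prefix: ""  # assumption: overrides the default "[dev user]" prefix
```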
The easiest way to generate the YAML for the Databricks Asset Bundles is to use the UI. First, you need to create your workflow manually, like below with an if-else task: Next, click on the me...
To get the columns for one or multiple tables, you can combine the SHOW TABLES and DESCRIBE EXTENDED commands. The SHOW TABLES command lists all tables within a schema, while the DESCRIBE EXTENDED command ...
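The combination can be sketched as two statements; the catalog, schema, and table names below are placeholders:

```sql
-- List all tables in a schema (placeholder names).
SHOW TABLES IN my_catalog.my_schema;

-- Describe one of the returned tables to get its columns and metadata.
DESCRIBE EXTENDED my_catalog.my_schema.my_table;
```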
You want to start a streaming job on Databricks from a particular point in time. For that, you need to use the AT_TIMESTAMP position and define the JSON with the timestamp to process from: spark.read...
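A sketch of how that position JSON can be built and passed, assuming the Databricks Kinesis source and its `initialPosition` option; the stream name, timestamp value, and format string are illustrative:

```python
import json

# Build the initial-position payload: start reading from a point in time.
# The "at_timestamp"/"format" keys are an assumption based on the
# Databricks Kinesis source options; check your runtime's documentation.
initial_position = json.dumps({
    "at_timestamp": "2024/06/01 00:00:00 UTC",
    "format": "yyyy/MM/dd HH:mm:ss ZZZ",
})

# On Databricks, the payload would be passed to the stream reader, e.g.:
# (spark.readStream.format("kinesis")
#     .option("streamName", "my-stream")          # hypothetical stream name
#     .option("initialPosition", initial_position)
#     .load())
print(initial_position)
```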
If you are looking to load a workspace-scoped variable into your DAB, you can leverage the lookup keyword. Here is an example of a variable resolving a webhook notification called slack-notification: ...
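A sketch of what that lookup can look like, assuming a hypothetical variable name and job; `notification_destination` is one of the documented lookup types:

```yaml
variables:
  slack_webhook:
    description: Resolved at deploy time from the workspace
    lookup:
      notification_destination: "slack-notification"

resources:
  jobs:
    my_job:  # hypothetical job name
      # ...
      webhook_notifications:
        on_failure:
          - id: ${var.slack_webhook}
```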
The error happens when you use Path.rename to move a file between incompatible storage layers, for example from a local file system to a Databricks volume. In that case, Databricks will fail with an e...
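A minimal sketch of the workaround: `Path.rename` relies on `os.rename`, which cannot cross file-system boundaries, while `shutil.move` falls back to copy-and-delete when the destination lives on another storage layer. The example below uses temporary directories, so it runs anywhere; on Databricks the destination would be a volume path instead:

```python
import shutil
import tempfile
from pathlib import Path

# Path.rename uses os.rename, which raises OSError (EXDEV) across
# file systems. shutil.move copies then deletes in that case, so it
# also works for cross-layer targets such as a Databricks volume.
src_dir = Path(tempfile.mkdtemp())
dst_dir = Path(tempfile.mkdtemp())

src = src_dir / "report.csv"
src.write_text("id,value\n1,42\n")

# Instead of src.rename(...), which may fail across storage layers:
dst = Path(shutil.move(str(src), str(dst_dir / "report.csv")))
print(dst.exists(), src.exists())  # → True False
```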