Make and Makefiles have long been used to define project tasks, even in Python projects. But Python has a native alternative that we are going to discover in this blog post.
Konieczny
This Python-native tool you can use to run tasks is Poe the Poet (Poe). In a nutshell, it's a task runner that centralizes everything you can run for your project, exactly like a Makefile does. However, its feature list goes well beyond a simple task execution wrapper. Among other things, Poe supports:
- Different task types, from commands and shell scripts to Python functions and expressions.
- A clear, dictionary-like declaration mode with native support for external arguments.
- A nice help interface that lists all available commands.
- Task chaining, where one task can be composed of many others.
- Directed Acyclic Graphs (DAGs) to compose more advanced execution workflows than simple chains of tasks.
- Declaring tasks in plain Python, so that you can create a Python package and share it across different projects.
- Virtual environment detection if used with uv or Poetry; you don't need to precede your calls with uv run or poetry run.
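To illustrate the plain-Python declaration, a task can also point at a Python function instead of a command. A minimal, hypothetical sketch (the `my_pkg.tasks` module and the `clean` function are made-up names, not part of this post's project):

```toml
# Hypothetical "script" task: Poe imports the module and calls the function.
# my_pkg.tasks:clean is a placeholder reference for illustration only.
[tool.poe.tasks.clean]
help = "Removes build artifacts via a Python function."
script = "my_pkg.tasks:clean"
```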
Need an example? Here is a build task preceded by lint and unit test execution steps:
[tool.poe.tasks.lint]
help = "Checks code style with ruff."
shell = """
uv run ruff check --fix .
uv run ruff format --check .
"""

[tool.poe.tasks.test]
help = "Runs unit tests."
cmd = "uv run pytest"

[tool.poe.tasks._build_internal]
help = "Creates a wheel of the project."
shell = """
rm -rf ./dist
uv build --sdist --wheel
"""

[tool.poe.tasks.build]
help = "Builds the project after executing linter and unit tests."
sequence = ["lint", "test", "_build_internal"]
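Conceptually, a sequence task like build above runs its members in order and stops at the first failure. A rough Python sketch of that behavior (the run_sequence function and the fake task registry are illustrative, not Poe's actual implementation):

```python
def run_sequence(task_names, registry):
    """Run tasks in order; stop and return the first non-zero exit code."""
    for name in task_names:
        exit_code = registry[name]()
        if exit_code != 0:
            return exit_code  # abort the chain, as Poe does on failure
    return 0

# Fake task registry standing in for lint/test/_build_internal.
executed = []
registry = {
    "lint": lambda: (executed.append("lint"), 0)[1],
    "test": lambda: (executed.append("test"), 1)[1],  # simulate a test failure
    "_build_internal": lambda: (executed.append("_build_internal"), 0)[1],
}

result = run_sequence(["lint", "test", "_build_internal"], registry)
# result is 1 and the build step never ran, mirroring a failed `poe build`
```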
📌 Underscore-prefixed tasks
You certainly noticed the _ in the _build_internal task. In Poe, underscore-prefixed tasks are considered private, i.e. they won't show up in the poe --help output but remain available for use by other tasks.
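The private-task convention is easy to picture: the help listing simply filters out underscore-prefixed names. A rough sketch of that idea, not Poe's actual code:

```python
# Task names from the example above.
tasks = ["lint", "test", "_build_internal", "build"]

# Only non-underscore tasks would appear in a `poe --help`-style listing...
public_tasks = [name for name in tasks if not name.startswith("_")]

# ...but private ones stay in the registry, callable from sequences like `build`.
```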
Databricks and Poe The Poet
Now the question is, how to combine Poe with a Databricks Asset Bundles (DAB) project? The nice thing with DAB is the flexibility of the build command. Many of you have certainly been using this simple build command, which is completely fine if it does the job:
artifacts:
default:
type: whl
build: uv build --wheel
path: .
But what if your build task is more complex? The first strategy, which is by the way the one recommended in the documentation (TODO find the doc link for that), is to have a separate build step and assume your wheel already exists before deploying the bundle. It works, but there is a catch: whenever you want to deploy the bundle, you need to run the build step first. Keeping the artifacts defined in the bundle simplifies the developer experience, since deploying the bundle becomes a single command.
Otherwise, you can leverage task chaining and execute a Poe task to prepare your bundle environment. Here is an example of the pyproject.toml with Poe dependency and the tasks composing the build process:
# ...
[dependency-groups]
dev = [
"poethepoet==0.40.0"
]
[tool.poe.tasks.download_file]
help = "Downloads a text file from the workspace."
cmd = "databricks workspace export --format SOURCE --profile personal_free_edition --file ./dist/downloaded_file.txt /Workspace/Shared/file_to_download.txt"
[tool.poe.tasks._build_internal]
shell = """
rm -rf ./dist
uv build --sdist --wheel
"""
[tool.poe.tasks.build]
help = "Builds the wheel"
sequence = [
"_build_internal", "download_file"
]
And here is the bundle file configured to handle the Poe task as the build step:
artifacts:
python_artifact:
type: whl
build: poe build
You can see how it works in the next video (TODO: publish in Databricks playlist):
Poe helps keep complex tasks simple, including build tasks requiring external dependencies such as other wheels stored on Databricks (cf. Semantic versioning with a Databricks volume-based package). You can also consider it an API standardizing the build process in Python: if you are familiar with Poe tasks in one project, it should be relatively easy to understand advanced build processes in other projects using Poe.
