Poe The Poet as a handy extension for Databricks Asset Bundles

Versions: Databricks Runtime 17.3 LTS

Make and Makefiles have been around for a while to facilitate task definitions, even in Python projects. But Python has a native alternative that we are going to discover in this blog post.


This Python-native tool you can use to run tasks is Poe The Poet (Poe). In a nutshell, it's a task runner that centralizes - exactly like a Makefile does - everything you can run for your project. However, the feature list goes well beyond a simple task execution wrapper. Among others, Poe supports:

- shell and command tasks defined directly in pyproject.toml
- task documentation through help texts, surfaced by poe --help
- task composition, where one task runs a sequence of other tasks
- private tasks, hidden from the task listing but usable by other tasks

Need an example? Here is a build command preceded by lint and unit test execution steps:

[tool.poe.tasks.lint]
help = "Checks code style with ruff."
shell = """
uv run ruff check --fix .
uv run ruff format --check .
"""

[tool.poe.tasks.test]
help = "Runs unit tests"
cmd = "uv run pytest"

[tool.poe.tasks._build_internal]
help = "Creates a wheel of the project."
shell = """
rm -rf ./dist
uv build --sdist --wheel
"""

[tool.poe.tasks.build]
help = "Builds the project after executing linter and unit tests."
sequence = ["lint", "test", "_build_internal"]

📌 Underscore-prefixed tasks

You certainly noticed the _ in the _build_internal task. In Poe, underscore-prefixed tasks are considered private, i.e. they won't show up in the poe --help output but remain available for use by other tasks.
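To see this in action, here is how the tasks from the snippet above behave from the command line (the exact --help layout may vary between Poe versions):

poe --help  # lists lint, test and build, but hides _build_internal
poe build   # runs the whole chain: lint, then test, then _build_internal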

Databricks and Poe The Poet

Now the question is, how to combine Poe with a Databricks Asset Bundles (DAB) project? The nice thing with DAB is the flexibility of the build command. Certainly, a lot of you have been using this simple build command, which is completely fine if it does the job:

artifacts:
  default:
    type: whl
    build: uv build --wheel
    path: .

But what if your build task is more complex? The first strategy, which is by the way the one recommended in the documentation (TODO: find the doc link for that), is to have a separate build step and assume your wheel exists before deploying the bundle. It works indeed, but there is a glitch: whenever you want to deploy the bundle, you need to run the build step first. Having the artifacts defined in the bundle simplifies the developer experience since you can simply deploy the bundle.
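To illustrate, this separate-build strategy boils down to two commands executed one after the other; a minimal sketch, assuming your bundle defines a dev target:

# step 1: build the wheel yourself, outside the bundle
uv build --wheel

# step 2: deploy the bundle, assuming ./dist already contains the artifact
databricks bundle deploy --target dev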

Alternatively, you can leverage Poe's task chaining and execute a Poe task that prepares your bundle environment. Here is an example pyproject.toml with the Poe dependency and the tasks composing the build process:

# ...
[dependency-groups]
dev = [
    "poethepoet==0.40.0"
]

[tool.poe.tasks.download_file]
help ="Downloads a text file from the workspace."
cmd = "databricks workspace export --format SOURCE --profile personal_free_edition --file ./dist/downloaded_file.txt /Workspace/Shared/file_to_download.txt"

[tool.poe.tasks._build_internal]
shell = """
rm -rf ./dist
uv build --sdist --wheel
"""

[tool.poe.tasks.build]
help = "Builds the wheel"
sequence = [
    "_build_internal", "download_file"
]
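
Before wiring the chain into the bundle, you can validate it locally; a quick sketch, assuming the Databricks CLI is already configured with the personal_free_edition profile used by the download_file task:

# installs the project with its dev dependency group, including poethepoet
uv sync

# runs the full chain: cleans ./dist, builds the wheel, downloads the file
uv run poe build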

And here is the bundle file configured to handle the Poe task as the build step:

artifacts:
  python_artifact:
    type: whl
    build: poe build
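
With the artifact configured this way, a single command triggers the whole Poe chain during the deployment; a minimal sketch, reusing the same CLI profile as the download_file task and assuming poe is on the PATH of the shell running the deploy:

# the deploy invokes `poe build` as the artifact build step
databricks bundle deploy --profile personal_free_edition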

You can see how it works in the next video (TODO: publish in Databricks playlist):

Poe helps keep complex tasks simple, including build tasks requiring external dependencies such as other wheels stored on Databricks (cf. Semantic versioning with a Databricks volume-based package). You can also consider it an API standardizing the build process in Python projects. If you are familiar with Poe tasks in one project, it should be relatively easy to understand advanced build processes in other projects using Poe.
