I'm the author of Data Engineering Design Patterns (O'Reilly),
a Databricks MVP, and
a freelance data engineer specializing in Apache Spark and Databricks.
I help teams move from working pipelines to resilient architectures.
I'm currently accepting new projects for June 2026. Whether you need a 2-day architectural audit, a hands-on lead for a
complex data engineering problem, or a workshop,
let's discuss your project here.
While you try to install a Python dependency, you encounter this error: ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: /home/… One of the reasons c...
If for whatever reason you need a function returning a dictionary from a defaultdict, you can simply convert it this way: from collections import defaultdict default_int_dict = defaultdict(int) ...
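The excerpt is cut off before the conversion itself, so the following is only a minimal sketch. It assumes the goal is a plain dict without the default factory behavior, and the function name to_plain_dict is hypothetical.

```python
from collections import defaultdict


def to_plain_dict(source: defaultdict) -> dict:
    # dict() copies the key/value pairs and drops the default factory.
    return dict(source)


default_int_dict = defaultdict(int)
default_int_dict["a"] += 1  # a missing key starts from int() == 0

plain_dict = to_plain_dict(default_int_dict)
print(type(plain_dict).__name__)  # dict
print(plain_dict)  # {'a': 1}
```

Unlike the defaultdict, the converted dictionary raises a KeyError when a missing key is read.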
The goal is to transform a dictionary into a list of tuples where the key will be the first part of the tuple and the value the second part. To do this in Python we can use comprehensions: data = {...
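A minimal sketch of that comprehension, assuming a small sample dictionary (the keys and values below are made up for illustration):

```python
# Sample input; the values are assumptions, not taken from the original tip.
data = {"a": 1, "b": 2, "c": 3}

# Each (key, value) pair from items() becomes one tuple in the list.
pairs = [(key, value) for key, value in data.items()]
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]
```

When no filtering or transformation is needed, list(data.items()) gives the same result.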
One of the native ways to cache results in Python is the @lru_cache decorator. It caches function calls by their parameters, and it's the key to the solution proposed in this tip. Since the cache is parame...
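The tip's full solution is not visible in this excerpt, so the sketch below only illustrates the basic behavior it builds on: @lru_cache stores one result per distinct set of arguments. The function slow_square and the call counter are hypothetical.

```python
from functools import lru_cache

call_count = 0  # counts how many times the function body really runs


@lru_cache(maxsize=32)
def slow_square(number: int) -> int:
    global call_count
    call_count += 1
    return number * number


slow_square(4)  # computed
slow_square(4)  # served from the cache, same argument
slow_square(5)  # computed, new argument

print(call_count)  # 2
print(slow_square.cache_info().hits)  # 1
```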
Poetry installs the dependencies from the lock file. If it's not synchronized with pyproject.toml, poetry install can use outdated dependencies. To ensure both files are synchronized, you ...
For a long time, to access a random element in an array, I used something like: import random letters = ['a', 'b', 'c', 'd'] random_letter = letters[random.randint(0, len(letters)-1)] assert r...
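The excerpt stops before showing the alternative. One option from the standard library, assumed here to be an acceptable replacement for the index arithmetic, is random.choice:

```python
import random

letters = ['a', 'b', 'c', 'd']

# random.choice picks one element directly, without computing an index.
random_letter = random.choice(letters)
assert random_letter in letters
```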
Python has an interesting built-in called slice. It creates a slice object describing the set of indices between start and stop, taken every step elements: letters = ["a", "b", "c", "d", "e", "f", "g"]...
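A short sketch of a reusable slice object, reusing the letters list from the excerpt; the start, stop and step values are arbitrary choices for illustration:

```python
letters = ["a", "b", "c", "d", "e", "f", "g"]

# start=1, stop=6 (exclusive), step=2 selects indices 1, 3 and 5.
every_second = slice(1, 6, 2)

print(letters[every_second])  # ['b', 'd', 'f']
print(letters[every_second] == letters[1:6:2])  # True
print(every_second.indices(len(letters)))  # (1, 6, 2)
```

The object form is handy when the same slice has to be named or applied to several sequences.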
There are multiple ways to achieve that but my favorite is the following: dict_1 = {'a': 10, 'b': 20, 'c': 30} dict_2 = {'c': 40, 'd': 50, 'e': 60} merged_dict = {dict_key: dict_1.get(dict_key,...
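The excerpt is cut off in the middle of the comprehension, so the combining rule is not visible. The sketch below is one possible completion and assumes, purely for illustration, that values of keys present in both dictionaries are summed.

```python
dict_1 = {'a': 10, 'b': 20, 'c': 30}
dict_2 = {'c': 40, 'd': 50, 'e': 60}

# Assumption: shared keys are summed; a key missing on one side counts as 0.
merged_dict = {
    dict_key: dict_1.get(dict_key, 0) + dict_2.get(dict_key, 0)
    for dict_key in dict_1.keys() | dict_2.keys()
}

print(sorted(merged_dict.items()))
# [('a', 10), ('b', 20), ('c', 70), ('d', 50), ('e', 60)]
```

The union of two key views is a set, so the key order of merged_dict is not guaranteed; hence the sorted() call before printing.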
You can read a multiline file in different ways. Some of them are less verbose than others, and in this tip I will focus on a one-line solution materializing the whole input at once. But let's begi...
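The one-liner itself is not part of the excerpt. A possible version, assuming pathlib is acceptable and using a temporary file with made-up content, could look like this:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir:
    # Hypothetical sample file created only for this example.
    sample_file = Path(temp_dir) / "sample.txt"
    sample_file.write_text("line 1\nline 2\nline 3\n")

    # The one-liner: read everything at once and split on line breaks.
    lines = sample_file.read_text().splitlines()

print(lines)  # ['line 1', 'line 2', 'line 3']
```

Because the whole file is loaded into memory, this approach suits small inputs rather than very large files.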
Let's suppose that we have a list like input_list = [1, 2, 3, 4, 5, 6] and we want to retrieve from it 3 variables, a=1, b=[2, 3, 4, 5] and c=6. A primitive solution to that problem could be: a, b,...
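The excerpt ends before any alternative to the primitive solution. One concise option, sketched here under the assumption that extended unpacking is what the tip leads to, uses a starred target:

```python
input_list = [1, 2, 3, 4, 5, 6]

# The starred name collects everything between the first and last element.
a, *b, c = input_list

print(a, b, c)  # 1 [2, 3, 4, 5] 6
```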
defaultdict helps to reduce boilerplate code in Python. Using it in normal circumstances is very easy: from collections import defaultdict default_int_dict = defaultdict(int) default_int_dict...
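As an illustration of the "normal circumstances" mentioned above, here is a minimal counting sketch; the variable letters_count and the input word are made up for the example.

```python
from collections import defaultdict

letters_count = defaultdict(int)

for letter in "banana":
    # No "if letter not in ..." check is needed: missing keys start at 0.
    letters_count[letter] += 1

print(dict(letters_count))  # {'b': 1, 'a': 3, 'n': 2}
```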
Mutable default arguments are a common trap for Python newcomers. Let's take the example of a function accumulating numbers in a list: def add_numbers(new_number, numbers = []): numbers...
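The function body is truncated in the excerpt, so the sketch below assumes it appends the number and returns the list. It shows the trap first and then the usual None-based workaround; the name add_numbers_shared is hypothetical.

```python
def add_numbers_shared(new_number, numbers=[]):
    # The default list is created once, when the function is defined,
    # and then shared by every call that omits the argument.
    numbers.append(new_number)
    return numbers


first = add_numbers_shared(1)
second = add_numbers_shared(2)
print(second)  # [1, 2]
print(first is second)  # True


def add_numbers(new_number, numbers=None):
    # A fresh list is created on every call that omits the argument.
    if numbers is None:
        numbers = []
    numbers.append(new_number)
    return numbers


print(add_numbers(1))  # [1]
print(add_numbers(2))  # [2]
```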
Scala has a convenient method to get a default value if one parameter is missing. Python doesn't provide this feature as a native function. It can be implemented with a ternary expression, though. ...
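A minimal sketch of the ternary approach, assuming "missing" means None; the helper name get_or_else is hypothetical and only mirrors the Scala idea:

```python
def get_or_else(value, default):
    # Only None counts as missing; falsy values such as 0 or "" are kept.
    return value if value is not None else default


print(get_or_else(None, "fallback"))  # fallback
print(get_or_else("configured", "fallback"))  # configured
print(get_or_else(0, 10))  # 0
```

The shorter value or default form would replace 0, empty strings and empty lists as well, which is why the explicit None check is used here.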
The simplest way to understand the difference is to consider them in different categories. The @staticmethod decorator is in that context just a function wrapped by a class body. It could have lived o...
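The difference can be sketched with a small class; Temperature, LabTemperature and their methods are hypothetical names chosen for this example. The static method receives no implicit argument, while the class method receives the class it was called on.

```python
class Temperature:
    def __init__(self, degrees: float):
        self.degrees = degrees

    @staticmethod
    def fahrenheit_to_celsius(fahrenheit: float) -> float:
        # A plain function that happens to live in the class namespace.
        return (fahrenheit - 32) * 5 / 9

    @classmethod
    def from_fahrenheit(cls, fahrenheit: float) -> "Temperature":
        # cls is the class the method was called on, so subclasses
        # get instances of their own type.
        return cls(cls.fahrenheit_to_celsius(fahrenheit))


class LabTemperature(Temperature):
    pass


boiling = Temperature.from_fahrenheit(212)
print(boiling.degrees)  # 100.0
print(type(LabTemperature.from_fahrenheit(212)).__name__)  # LabTemperature
```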