Feature stores - introduction

Feature stores are a more and more common topic in the data landscape and they have been in my backlog for several months already. Finally, I ended up writing the blog post and I really appreciated the learning experience! Even though it's a blog post from a data engineer perspective, so maybe without a deep data science deep dive.

4-day workshop · In-person or online

What would it take for you to trust your Databricks pipelines in production?

A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that — unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.

Unit, data & integration tests
Medallion architecture & Lakeflow SDP
Max 10 participants · production-ready templates
See the full curriculum → €7,000 flat fee · cohort of up to 10
Bartosz Konieczny
Bartosz
Konieczny

Store for the features, but not only!

At first glance, understanding the need for a feature store may not be obvious. After all, the data (features) for training and prediction is always prepared beforehand. That's maybe true but becomes problematic at scale. Features, so these "little" attributes used by the models during training and inference steps drive the accuracy of the result. Things become complicated when they have to be shared among teams or used in offline and online predictions. The question for these scenarios is how to guarantee the use of the same set of features? And that's where the feature store shines. It provides a central and a single source of truth for the features. The definition is quite reductive because a feature store is much more than that! After all, to keep a single source of truth, you could create a "data-science" bucket and put all your feature data inside.

That's why the single source of truth is only one of the following set of features:

Put another way, the feature store in a picture could look like that:

And here we are! But it's not the end because next week you'll see a feature store in action. See you then!

Data Engineering Design Patterns

Looking for a book that defines and solves most common data engineering problems? I wrote one on that topic! You can read it online on the O'Reilly platform, or get a print copy on Amazon.

I also help solve your data engineering problems contact@waitingforcode.com đź“©