Time series - general notes

Temporal data is somewhat particular. It can be generated very frequently, for instance every 500 ms or less. It's therefore important to store it efficiently and to allow quick and flexible reads. It's also important to know the specifics of time series, a popular case of temporal data.

This post is called "general notes" because it presents time series from a bird's-eye view. The first section covers general information about time series. The next one focuses on their data characteristics, while the last one describes the technical solutions used to store them.

General information

Time series belong to the family of temporal data, i.e. values related in some way to the concept of time (either a point in time or an interval). Unsurprisingly, the main ordering domain of this data category is time (e.g. event time).

A time series is a subset of temporal data. It's represented as a sequence of data points recorded over a time interval. The points can be recorded at regular or irregular intervals. This regularity has a big influence on the amount of recorded data. Regular recording every second will generate much more data than irregular recording, such as saving ATM transactions.
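To make the volume difference concrete, here is a minimal back-of-the-envelope sketch. The helper `points_per_day` and the ATM transaction count are assumptions made up for illustration; they don't come from any real measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(interval_seconds: float) -> int:
    """Number of data points a regularly sampled series produces in one day."""
    return int(SECONDS_PER_DAY / interval_seconds)


# Regular series: one point every second, and one every 500 ms.
regular_1s = points_per_day(1)
regular_500ms = points_per_day(0.5)

# Irregular series: hypothetical number of transactions one ATM handles per day.
assumed_atm_transactions = 300

print(regular_1s)     # 86400
print(regular_500ms)  # 172800
print(regular_1s // assumed_atm_transactions)  # 288
```

Under these assumptions a single sensor sampled every second produces a few hundred times more points per day than the ATM, and that gap grows with every additional source.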

The recording regularity brings us to the first property describing time series: granularity. It defines the scale of the stored data and thus determines the level of information we can extract from it. The information contained in time series is valuable: it makes it easy to observe trends (e.g. a website's real-time audience) and to correlate the measurements with potentially impacting events.
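Granularity can also be changed after the fact by aggregating fine-grained points into coarser buckets. The sketch below illustrates the idea with the standard library only; the `downsample` function, the sample temperature readings and their dates are hypothetical, and a real time series database would typically do this on the storage side.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-size time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        epoch = int(ts.timestamp())
        # Align the timestamp to the start of its bucket.
        bucket_start = epoch - epoch % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), mean(values))
        for start, values in sorted(buckets.items())
    ]


# Hypothetical temperature readings taken every 30 seconds.
readings = [
    (datetime(2018, 1, 1, 10, 0, 0, tzinfo=timezone.utc), 20.0),
    (datetime(2018, 1, 1, 10, 0, 30, tzinfo=timezone.utc), 22.0),
    (datetime(2018, 1, 1, 10, 1, 0, tzinfo=timezone.utc), 21.0),
]

# Move from a 30-second granularity to a 1-minute granularity.
per_minute = downsample(readings, 60)
for bucket_start, avg in per_minute:
    print(bucket_start.isoformat(), avg)
```

The coarser series is cheaper to store and query, but the individual 30-second readings can no longer be recovered from it, which is the trade-off granularity imposes.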

Among the operations we can perform on time series, many have a lot in common with classical RDBMS querying, such as:

Data characteristics

The time series data can be described by the following concepts:

To give a more precise context, our temperature measurement example can look like the table below:

The nature of time series data is also specific. It can be summarized by the following points:

The architectures for time series

In "Comparison of Time Series Databases", Andreas Bader classifies time series databases by the technologies used to store the data. Among them we can distinguish 3 technical approaches:

The "Comparison of Time Series Databases" also gives some insight into evaluating which architectures are well suited for time series data:

This post summarized the basic information about time series. As we saw in the first part, they're a subset of temporal data, i.e. data ordered by event time. This type of data is characterized by regular or irregular generation. A database storing time series must be resilient and easily scalable since the volume of time series can grow very quickly. Moreover, it should meet the requirement for long-term storage by supporting data removal and aggregation. As shown in the last section, different approaches exist to deal with that. The first ones are based on third-party NoSQL stores such as Cassandra or HBase. The others use a classical RDBMS, while the last category doesn't require any specific storage.