Recent posts about Gnocchi mentioned the concept of archive policy several times. However, as one of the main building blocks of the system, it merits its own explanation.
As you can deduce, this post is about the archive policy in Gnocchi. The first section presents its theoretical aspects. The second one shows some internal details through learning tests.
Archive policy defined
The previous posts already taught us a little about the archive policy. It was presented as a specification for the computation of the aggregations: it defines what kind of aggregation is computed, at which granularity and for how many points. More concretely, we can describe an archive policy by the following properties:
- name - helps to uniquely identify a given archive policy. Gnocchi doesn't limit the number of policies, so a well-chosen name helps to keep a complex system organized.
- aggregation methods - no surprise, as the name suggests, it's the list of all aggregation methods associated to the policy. We can distinguish among them: mean, sum, last, max, min, std, median, first, count and also X-percentile (X included in the range [1, 99]). Each of them has a corresponding rate-based version. This kind of aggregation computes the rate of change between consecutive points just before applying the aggregation. For instance, let's take a 2-second timespan for the rate:sum aggregate, where the first window is composed of the values [10, 30] and the second one of [100, 110]. The aggregated value for the first window will be 20 (30 - 10) and for the second one 80 ((100 - 30) + (110 - 100), i.e. 110 - 30).
- back window - defines how old the processed measures can be, expressed in periods of the largest granularity. For instance, if it's set to 2 and the largest granularity is 1 hour, we are able to process the data from now - 2 hours to now. The boundary is always relative to the largest granularity among the definitions attached to the given archive policy, e.g. for 2 granularities, one of 1 hour and another of 10 minutes, and the last processed point registered at 10:20, the acceptable past time will be computed in hours and will be equal to 8:00 (10:00 - 2 hours) and not 8:20 (10:20 - 2 hours) - see the sketch after this list.
- definitions - are used together with the aggregations to define what must be computed and at which frequency. Each definition is described by the following properties:
- granularity - defines how the points will be grouped. For instance, with a granularity of 30 seconds, the points will be grouped into buckets starting at :00 and :30 seconds.
- number of points - tells how many aggregated points will be retained.
- timespan - specifies how long the aggregated points will be retained.
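To make the grouping by granularity and the back window boundary more concrete, here is a minimal sketch with numpy timestamps. The helper names are made up for this post and are not part of Gnocchi's API:

```python
import numpy

def truncate_to_granularity(timestamp, granularity):
    # group a point into its bucket by truncating the timestamp to the granularity,
    # e.g. 10:20:47 with a 30-second granularity falls into the 10:20:30 bucket
    seconds = timestamp.astype('datetime64[s]').astype(int)
    step = granularity.astype('timedelta64[s]').astype(int)
    return numpy.datetime64((seconds // step) * step, 's')

def back_window_cutoff(last_processed, largest_granularity, back_window):
    # the oldest accepted timestamp is relative to the last processed point
    # truncated to the largest granularity, not to the raw timestamp itself
    truncated = truncate_to_granularity(last_processed, largest_granularity)
    return truncated - back_window * largest_granularity

last_point = numpy.datetime64('2018-04-27T10:20:00')
print(truncate_to_granularity(last_point, numpy.timedelta64(30, 's')))  # 2018-04-27T10:20:00
print(back_window_cutoff(last_point, numpy.timedelta64(1, 'h'), 2))     # 2018-04-27T08:00:00
```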
It's important to emphasize the consistency of the above 3 properties. If all 3 are defined, then the timespan must be equal to the granularity multiplied by the number of points.
Another important point is that a missing property can be deduced from the other two:
- granularity = timespan/number of points
- number of points = timespan/granularity
- timespan = granularity * number of points
The number of points property is also important for another reason: it lets us estimate the size of the stored time series. Each aggregated point takes in the worst case scenario 8.04 bytes and at best 0.05 bytes. By multiplying that by the number of points we can easily estimate the storage costs upfront.
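A quick illustration of both the deduction formulas and the storage estimate (the 0.05 and 8.04 bytes per point are the figures quoted above; the helpers exist only for this post):

```python
def points_for(timespan_seconds, granularity_seconds):
    # number of points = timespan / granularity
    return int(timespan_seconds / granularity_seconds)

def storage_estimate(points, best_bytes_per_point=0.05, worst_bytes_per_point=8.04):
    # returns the (best case, worst case) size in bytes for a single aggregate
    return points * best_bytes_per_point, points * worst_bytes_per_point

# e.g. 1 day kept at a 30-second granularity
points = points_for(timespan_seconds=86400, granularity_seconds=30)
best, worst = storage_estimate(points)
print(points, best, worst)  # 2880 points, 144.0 bytes at best, ~23155 bytes at worst
```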
Gnocchi comes with 4 default policies: low, medium, high and bool. They go from the highest granularity (1 second) to the lowest (5 minutes). Custom policies can be created though, by calling the /archive_policy endpoint with the POST HTTP method and an appropriate JSON payload describing the added policy.
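The default and custom policies can also be consulted through the REST API. Assuming a local installation with the same basic-auth setup as in the tests of the next section, the listing looks like this:

```bash
# list all archive policies known to the index, including the default ones
curl http://admin:admin@localhost:8041/v1/archive_policy

# get the details of a single policy, e.g. the default "low" one
curl http://admin:admin@localhost:8041/v1/archive_policy/low
```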
The archive policy is stored in the index storage and is associated to one or more metrics. Naturally it's involved in the data processing step generating the aggregated time series (see the post about the Carbonara storage format). Thus, the archive policy provides the important context defining what data will be generated and how.
It's possible to modify an already existing archive policy. However, only changes to the definitions part are possible, and not all of them: we can't add new definitions or modify the granularity of existing ones. When we try to change the aggregation methods or back window parameters, we'll receive an error message about invalid input: "Invalid input: extra keys not allowed @ data['aggregation_methods']" or "Invalid input: extra keys not allowed @ data['back_window']". Moreover, the changes don't apply to already existing data.
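For illustration, such an update could look like the sketch below. It assumes the PATCH verb on the policy's endpoint and uses a placeholder policy name, so adapt both to your setup:

```bash
# sketch of a definitions-only update; "my_policy" is a placeholder name
# sending any other key triggers the "extra keys not allowed" errors quoted above
curl -d '{
  "definition": [
    { "granularity": "30s", "timespan": "60 days" }
  ]
}' -H "Content-Type: application/json" \
  -X PATCH http://admin:admin@localhost:8041/v1/archive_policy/my_policy
```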
Archive policy internals
Let's see now some of the internal details of the archive policy. It's represented by the gnocchi.archive_policy.ArchivePolicy class exposing all 4 properties described in the previous section. The definitions in their turn are represented as instances of gnocchi.archive_policy.ArchivePolicyItem and, similarly, they provide public access to the properties presented previously:
```python
# a learning test defined inside a unittest-style test class; numpy and the
# Gnocchi archive_policy / aggregation modules are imported at the module level
def should_create_default_policy_with_different_definitions(self):
    granularity_30_sec = numpy.timedelta64(30, 's')
    granularity_1_sec = numpy.timedelta64(1, 's')
    timespan_1_day = numpy.timedelta64(1, 'D')
    timespan_31_days = numpy.timedelta64(31, 'D')
    long_term_definition = archive_policy.ArchivePolicyItem(granularity_30_sec,
                                                            timespan=timespan_31_days)
    short_term_definition = archive_policy.ArchivePolicyItem(granularity_1_sec,
                                                             timespan=timespan_1_day)
    two_term_policy = archive_policy.ArchivePolicy('2-term policy', 0,
                                                   [short_term_definition, long_term_definition],
                                                   ["sum", "count"])

    self.assertEqual('2-term policy', two_term_policy.name)
    self.assertEqual(0, two_term_policy.back_window)
    # the max block size corresponds to the largest granularity of the definitions
    self.assertEqual('30 seconds', str(two_term_policy.max_block_size))
    # every (aggregation method, definition) pair produces one aggregation entry
    self.assertIn(aggregation.Aggregation("count", granularity_1_sec, numpy.timedelta64(86400, 's')),
                  two_term_policy.aggregations)
    self.assertIn(aggregation.Aggregation("count", granularity_30_sec, numpy.timedelta64(2678400, 's')),
                  two_term_policy.aggregations)
    self.assertIn(aggregation.Aggregation("sum", granularity_1_sec, numpy.timedelta64(86400, 's')),
                  two_term_policy.aggregations)
    self.assertIn(aggregation.Aggregation("sum", granularity_30_sec, numpy.timedelta64(2678400, 's')),
                  two_term_policy.aggregations)
```
Another interesting point to see in action is the rate-based aggregation, which computes the rate of change between consecutive point values before aggregating them, rather than a rate between the results of the normal aggregation:
```bash
# create an archive policy
curl -d '{
  "aggregation_methods": ["sum", "rate:sum"],
  "back_window": 0,
  "definition": [
    { "granularity": "2s", "timespan": "7 day" }
  ],
  "name": "rate_rate_sum_policy_2s"
}' -H "Content-Type: application/json" -X POST http://admin:admin@localhost:8041/v1/archive_policy

# now the metric using it
curl -d '{
  "archive_policy_name": "rate_rate_sum_policy_2s",
  "name": "rate_rate_sum_metric_2s"
}' -H "Content-Type: application/json" -X POST http://admin:admin@localhost:8041/v1/metric
# note somewhere the id: 77ed6901-5db3-4334-a8d4-4d8156170cb9

# add some testing measures
curl -d '[
  {"timestamp": "2018-04-27T11:00:00", "value": 1.0},
  {"timestamp": "2018-04-27T11:00:01", "value": 2.0},
  {"timestamp": "2018-04-27T11:00:02", "value": 3.0},
  {"timestamp": "2018-04-27T11:00:03", "value": 4.0},
  {"timestamp": "2018-04-27T11:00:04", "value": 5.0},
  {"timestamp": "2018-04-27T11:00:05", "value": 7.0},
  {"timestamp": "2018-04-27T11:00:06", "value": 8.0},
  {"timestamp": "2018-04-27T11:00:07", "value": 11.0},
  {"timestamp": "2018-04-27T11:00:08", "value": 12.0},
  {"timestamp": "2018-04-27T11:00:09", "value": 16.0}
]' -H "Content-Type: application/json" -X POST http://admin:admin@localhost:8041/v1/metric/77ed6901-5db3-4334-a8d4-4d8156170cb9/measures

# and execute the rate:sum and sum queries
curl http://admin:admin@localhost:8041/v1/metric/77ed6901-5db3-4334-a8d4-4d8156170cb9/measures?aggregation=rate:sum
curl http://admin:admin@localhost:8041/v1/metric/77ed6901-5db3-4334-a8d4-4d8156170cb9/measures?aggregation=sum
```
The sum query returns the expected values:
```
[
  // 1 + 2
  ["2018-04-27T11:00:00+00:00", 2.0, 3.0],
  // 3 + 4
  ["2018-04-27T11:00:02+00:00", 2.0, 7.0],
  // 5 + 7
  ["2018-04-27T11:00:04+00:00", 2.0, 12.0],
  // 8 + 11
  ["2018-04-27T11:00:06+00:00", 2.0, 19.0],
  // 12 + 16
  ["2018-04-27T11:00:08+00:00", 2.0, 28.0]
]
```
But rate:sum sums the point-to-point differences, which boils down to the difference between the last point of a window and the last point of the previous one:
```
[
  // 2 - 1, first window
  ["2018-04-27T11:00:00+00:00", 2.0, 1.0],
  // 4 - 2
  ["2018-04-27T11:00:02+00:00", 2.0, 2.0],
  // 7 - 4
  ["2018-04-27T11:00:04+00:00", 2.0, 3.0],
  // 11 - 7
  ["2018-04-27T11:00:06+00:00", 2.0, 4.0],
  // 16 - 11
  ["2018-04-27T11:00:08+00:00", 2.0, 5.0]
]
```
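To convince ourselves that these numbers come from summing consecutive differences, we can reproduce them with a few lines of numpy (a standalone sketch, independent from Gnocchi's internal implementation):

```python
import numpy

values = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 8.0, 11.0, 12.0, 16.0])

# rate of change between consecutive points; the very first point has no predecessor
rates = numpy.diff(values)  # [1, 1, 1, 1, 2, 1, 3, 1, 4]

# sum the rates per 2-second window (2 points per window at 1 measure per second);
# the first window only contributes one rate because of the missing predecessor
windows = [rates[:1], rates[1:3], rates[3:5], rates[5:7], rates[7:9]]
print([float(w.sum()) for w in windows])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```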
The archive policy is a core concept in Gnocchi. It's linked to the metric definition and thus automatically impacts the time series storage. It defines not only what aggregates are precomputed but also how many of them are kept and for how long. The archive policy is characterized by 4 different properties: name, aggregation methods, back window and definitions. They were detailed in the first section of the post. The second section presented some tests showing how the archive policy's attributes are involved in the data processing in Gnocchi.