The basic Big O measures (O(1), O(n), O(log n)) are understandable quite easily. However they're not the single ones - especially because of Scala collections. Two variations of them, a little bit less unequivocal, exist - amortized and effective constant time.
New ebook 🔥
Learn 84 ways to solve common data engineering problems with cloud services.
This post is divided in 2 sections. The first part describes the amortized constant time. The second part presents another variance, the effective constant time.
Amortized constant time
The amortized constant time complexity comes from amortized analysis. This kind of analysis is used when we want to evaluate the total complexity of a sequence of operations. The amortized sequence complexity represents the average cost of given operation in the analyzed sequence. This evaluation over the sequence leads to the situation when the complexity of several costly operations can be deported to the less costly operations, leading to the nearly constant execution time. The general formula used to compute amortized cost per operation looks like: T(n)/n where T(n) is the total cost over a worst case sequence of n operations.
It's important to emphasize that the "constant" doesn't mean necessarily O(1). The "constant" means here the not changing complexity, even for the worst cases. In Big O notation the amortized constant time is suffixed with a sign of "+". For instance, an amortized constant time of O(log n) is written as O(log n)+.
A great example illustrating the amortized constant time complexity is adding new element to an array list (or dynamic array). This data structure stores the elements in a dynamically resizible array. It also has the concept of capacity defining the size of this dynamic array. Every time when adding new item makes this capacity reached, the elements accumulated up to this moment are copied to a new bigger array (e.g. twice big as the first array) and only after this copy the new item is really added.
As you can see, the add operation takes a constant time (O(1)) almost all the time. Only sometimes, more exactly when the backed array must be resized, it takes O(n). Thus the worst case for n+1 operations is O(n). Then the amortized complexity is O(n)/n+1 = O(1). Please notice that the result was computed with the aggregate method which is one of 3 available methods to get the amortized complexity (other 2 are: accounting and potential).
Effective constant time
The effective constant time complexity is similar to the amortized one but it's founded on assumptions about given data structure rather than on operation properties (e.g. array resizing). Among these assumptions we can distinguish for instance the distribution of hash keys or maximum length of a vector.
A good example showing the effective constant time and the maximum length assumption is the vector implementation in Scala. The vector uses internally an ordered tree (trie) with the branching factor of 32 (= each node can have up to 32 children). It gives the access time complexity of O(log32(n)) (or general complexity of O(logb(n))). In this case the branching factor is considered as an assumption because bigger it is, smaller access time complexity is. For instance for b=32 and n=1000000, the complexity will be about 3.986 and for b=64 it will be 3.322.
Scala's vector also explains clearly why it's considered as effectively constant time. The differences between accessing 1000000th and 2000000th elements are so small (3.986 vs 4.186) that they can be ignored. So even if technically accessing 1000000th and 200000th elements doesn't give a constant time, it's considered as effective constant time access.
Amortized and effectively constant time complexities enlarge the standard Big O measures. Both are based on some external specificities. The amortized measure includes both costly and less costly operations but takes into account their average. So simply speaking, it doesn't matter about costly operations if they're rare enough to be ignored. The effective constant time is rather based on data structure assumptions that are considered not significant enough to have any impact on the complexity.