Garbage collection types

Automatic garbage collection removes the overhead of allocating and deallocating memory. Even if it's not a magic solution to all problems, it's helpful. And because there are different automatic GC types, it's important to know them.

Looking for a better data engineering position and skills?

You have been working as a data engineer but feel stuck? You don't have any new challenges and are still writing the same jobs all over again? You have now different options. You can try to look for a new job, now or later, or learn from the others! "Become a Better Data Engineer" initiative is one of these places where you can find online learning resources where the theory meets the practice. They will help you prepare maybe for the next job, or at least, improve your current skillset without looking for something else.

👉 I'm interested in improving my data engineering skillset

See you there, Bartosz

This article focuses on general idea of automatic garbage collection. It will introduce key ideas of several popular garbage collection. Among them we can find: mark-sweep, copying, generational and mark-compact garbage collection. The article serves as the basis for further articles presenting garbage collections implemented in Java.

Mark-sweep garbage collection

The logic of this garbage collection algorithm is contained in its name - mark-sweep. As we can deduce, it's based on 2 steps: marking and sweeping. The marking step consists on traversing the graph of object stored in the memory. Each time when it encounters a reachable object (ie. object still used in the program), the garbage collector marks it as being "in use".

The sweeping phase is focused more on freeing the memory. All objects which weren't marked as "in use" in the previous step are removed from the memory. It also resets the state of marked objects to make possible their collection in one of further GC passes.

One of the main drawbacks of this algorithm is the necessity to stop program's thread when GC occurs.

Copying garbage collection

Copying garbage collection is based on the principle of divide memory heap on 2 spaces, very often called: new and old. All new objects are placed in old space. When this space is full, copying GC starts to play. First, it copies all reachable objects (still "in use") to the space called new. By doing so, it also compacts both spaces.

The compacting is very important for the interpreter allocating new objects. Thanks to it, it has a very large and contiguous chunks of free memory. Now, putting new objects there is easy - they are placed in one of extremities (end or beginning).

Objects remaining in old space aren't used anymore by the program. Accordingly they can be freely removed from the memory. By doing so, interpreters has once again 2 spaces available: one completely free, another one in use. As you can suppose, after copying objects, the space names change too - previous "old" space becomes "new" space and previous "new" space becomes "old". Naturally, objects copy process is triggered once again when old space fills up.

An advantage of copying garbage collection is the fact that the fragmentation isn't made in separate operation. More, it's made automatically when objects are moved from one space to another. However, it needs more memory to be able to keep two equal spaces.

Generational garbage collector

Copying garbage collection can work better with generational garbage collection. In this type of garbage collection the memory is divided to more fine-grained spaces than in the case of copying garbage collection. This new spaces are identified by object ages. An "age" can be, for example, the number of times when given object survived to garbage collection.

Generational garbage collection divides memory on spaced called generations. Depending on applied configuration, it can for example divide it into 4 different generations, called for example: G1, G2, G3 and G4. When garbage collection starts, it moves object with certain age and still needed by the program from G1 to G2. Remaining objects (not anymore "in use" state) are removed from the generation. This cycle applies for all generations.

Generation garbage collection is based on the supposition that a big majority of created objects are short-living - potentially they won't never move from G1 to G2.

Mark-compact garbage collector

This kind of garbage collection can be thought as the mix of mark-sweep and copying garbage collection. Its first phase is also based on marking still reachable objects. In additionally, new memory addresses are defined for these objects.

The next step is called compacting. It consists on grouping together lived objects by moving them to empty spaces created in the memory because of removal of unreachable objects. As in the case of copying garbage collection, compacted space (either the begin or the end of memory is occupied) accelerates new objects allocation.

We can see that despite the existence of 4 different garbage collection, they can work together, as it's the case of mark-compact algorithm. Some of them, such as copying and generational, work on divided memory. The others are destined to treat a single memory space, with less or more time needed to pause the program and make objects reachability analysis.

If you liked it, you should read:

📚 Newsletter Get new posts, recommended reading and other exclusive information every week. SPAM free - no 3rd party ads, only the information about waitingforcode!