Garbage collection types
This article focuses on general idea of automatic garbage collection. It will introduce key ideas of several popular garbage collection. Among them we can find: mark-sweep, copying, generational and mark-compact garbage collection. The article serves as the basis for further articles presenting garbage collections implemented in Java.
Mark-sweep garbage collection
The logic of this garbage collection algorithm is contained in its name - mark-sweep. As we can deduce, it's based on 2 steps: marking and sweeping. The marking step consists on traversing the graph of object stored in the memory. Each time when it encounters a reachable object (ie. object still used in the program), the garbage collector marks it as being "in use".
The sweeping phase is focused more on freeing the memory. All objects which weren't marked as "in use" in the previous step are removed from the memory. It also resets the state of marked objects to make possible their collection in one of further GC passes.
One of the main drawbacks of this algorithm is the necessity to stop program's thread when GC occurs.
Copying garbage collection
Copying garbage collection is based on the principle of divide memory heap on 2 spaces, very often called: new and old. All new objects are placed in old space. When this space is full, copying GC starts to play. First, it copies all reachable objects (still "in use") to the space called new. By doing so, it also compacts both spaces.
The compacting is very important for the interpreter allocating new objects. Thanks to it, it has a very large and contiguous chunks of free memory. Now, putting new objects there is easy - they are placed in one of extremities (end or beginning).
Objects remaining in old space aren't used anymore by the program. Accordingly they can be freely removed from the memory. By doing so, interpreters has once again 2 spaces available: one completely free, another one in use. As you can suppose, after copying objects, the space names change too - previous "old" space becomes "new" space and previous "new" space becomes "old". Naturally, objects copy process is triggered once again when old space fills up.
An advantage of copying garbage collection is the fact that the fragmentation isn't made in separate operation. More, it's made automatically when objects are moved from one space to another. However, it needs more memory to be able to keep two equal spaces.
Generational garbage collector
Copying garbage collection can work better with generational garbage collection. In this type of garbage collection the memory is divided to more fine-grained spaces than in the case of copying garbage collection. This new spaces are identified by object ages. An "age" can be, for example, the number of times when given object survived to garbage collection.
Generational garbage collection divides memory on spaced called generations. Depending on applied configuration, it can for example divide it into 4 different generations, called for example: G1, G2, G3 and G4. When garbage collection starts, it moves object with certain age and still needed by the program from G1 to G2. Remaining objects (not anymore "in use" state) are removed from the generation. This cycle applies for all generations.
Generation garbage collection is based on the supposition that a big majority of created objects are short-living - potentially they won't never move from G1 to G2.
Mark-compact garbage collector
This kind of garbage collection can be thought as the mix of mark-sweep and copying garbage collection. Its first phase is also based on marking still reachable objects. In additionally, new memory addresses are defined for these objects.
The next step is called compacting. It consists on grouping together lived objects by moving them to empty spaces created in the memory because of removal of unreachable objects. As in the case of copying garbage collection, compacted space (either the begin or the end of memory is occupied) accelerates new objects allocation.
We can see that despite the existence of 4 different garbage collection, they can work together, as it's the case of mark-compact algorithm. Some of them, such as copying and generational, work on divided memory. The others are destined to treat a single memory space, with less or more time needed to pause the program and make objects reachability analysis.
If you liked it, you should read: Tips to discover internals of an Open Source framework internals - Apache Spark use case Let it crash model Amortized and effective constant time