How to organize garbage collection on the web and how Orinoco GC works

Memory clearing is a mandatory process for any application, including those written in JS. If we neglect this process, the memory will be filled with objects (mostly “dead” ones), and at some point, there will simply be no place for new data. In this article, we will analyze the basic principles of clearing memory in JS, as well as consider several garbage collection algorithms and determine which one is more efficient.

What garbage is

Garbage is any object that cannot be reached by reference from the “root”. Garbage includes all “dead objects” that have lost contact with the root ones.

Moreover, “dead” objects can often be linked with each other and form chains of considerable dimensions. Still, it doesn’t make them “live,” and that’s why a reference counting approach to garbage collection doesn’t work. But which ones work then? Let’s start at the beginning.

Approaches to garbage collection

Let’s consider several approaches, starting with the simplest and fastest and ending with the most effective but slowest ones.

Mark and Sweep

This is the easiest garbage search and deletion algorithm. It works only in a situation where the memory contains garbage; if it is only filled with live objects, Mark and Sweep doesn’t do anything.

The main drawback of Mark and Sweep is that it fragments memory. As a result, it can clear a lot of space, but still, a new large object can’t be stored in RAM.

Mark-Sweep-Compact

The Mark-Sweep-Compact algorithm has that very necessary defragmentation function. After clearing, it shifts all the remaining objects and creates a single chunk of free memory space.

In terms of the result, Mark-Sweep-Compact is optimal. However, the process of shifting objects is rather resource-consuming; besides, Mark-Sweep-Compact needs to go through the memory two to three times, which takes a lot of extra time.

SemiSpace

SemiSpace (copying collector) speeds up the process of clearing and defragmenting somewhat. SemiSpace finds live objects, copies them to space allocated in advance, and deletes the rest.

This algorithm is simpler and faster than Mark-Sweep-Compact but requires additional space for copying live objects. Additionally, the process of copying is also resource-consuming.

The weak generational hypothesis

According to the generational hypothesis, young objects are more likely to die than older ones. A newly created object is more likely to become garbage. That’s why it is rational for a garbage collector to check such objects more often.

Let’s consider the generational approach using the Java garbage collection algorithm. It works as follows:

  1. All memory is divided into young and old generations.
  2. The young generation is divided into the following slots: Eden, Survivor Space 1 and 2 (S0 and S1).
  3. All new objects go into Eden.
  4. During each check, the algorithm transfers live objects from Eden and one of the S-slots to the other S-slot and then clears the checked slots fully.
  5. The algorithm checks S-slots one after another: if during the previous check Eden and S0 slots were checked, and live objects were copied to S1, during the next check, Eden and S1 will be checked, and live objects will be copied to S0.
  6. Objects that can live long enough are moving to the old generation. During this time, they manage to go through many checks and will continue to remain alive with a high degree of probability.

A similar principle is applied when running the Orinoco GC garbage collector in the V8 JS engine.

Orinoco GC

Orinoco GC segments memory and moves objects depending on their age too, but this algorithm has its nuances:

  1. The memory is divided into young and old, but the young memory includes only two slots: Nursery and Intermediate.
  2. A memory check is carried out in several threads at once. When moving a live object from Nursery to Intermediate, the stream leaves a forward pointer. This is done so that another thread that comes to the same object, but along a different chain of references, does not try to move it again.

The Orinoco GC algorithm works as follows:

  1. All new objects go to Nursery, and then, after checking the Scavenger, the surviving ones move to Intermediate; the rest are deleted.
  2. After compacting the memory and passing the next Scavenger check, live objects go to the old memory.
  3. The old memory is checked less often; the check is carried out by the main GC.

Two garbage collectors work in Orinoco: a secondary one (Scavenger), which checks the young memory, and the main one, which checks the entire array. The Compact phase is optional, and it is used at the prerogative of the main collector. Since moving objects remains a resource-consuming process, in order to minimize such actions, Orinoco uses free lists. These are lists of free space in memory, which allow defining a suitable free space for a new object. In cases where the memory is too fragmented and there is no suitable place for the object, the algorithm enters the Compact phase.

JS developers don’t have access to the GC; it is an implementation detail. Although JS cannot call the collector directly, V8 provides access to the environment, which the engine is embedded into. GC can set tasks that the environment performs in its free time.