Garbage Collectors (HotSpot VM)

Java offers several garbage collectors to choose from. For the HotSpot VM, the available collectors include:

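  • Serial GC (the default on client-class machines; server-class machines typically default to the Parallel GC or, from JDK 9, G1)
  • Parallel GC (throughput)
  • CMS, the Concurrent Mark and Sweep GC (low latency)
  • G1 GC (next-generation replacement for CMS)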

Determining which garbage collector your application is using is not necessarily straightforward; it depends on the JVM version, various settings, and your machine's hardware. You can add the following command-line parameter to your java invocation to print the selected GC to the console:

java -XX:+PrintCommandLineFlags -version
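The same question can also be asked from inside a running program. The following is a minimal sketch, not the only way to do it, that uses the standard java.lang.management API to list the collectors the JVM reports. The class name ActiveCollectors is made up for this example, and the exact names printed depend on the JVM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for this post; not part of the JDK.
class ActiveCollectors {
    // Returns the names the running JVM reports for its garbage collectors.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

HotSpot usually reports one name for the young-generation collector and one for the old-generation collector, so the pair of names tells you which GC was selected.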

Terminology

Eden – Where new objects are allocated.
Survivor Spaces – Where objects are moved and aged if they survive a minor GC.
Young Generation – Eden + Survivor Spaces
Old Generation – Where aged objects are promoted if they survive a threshold number of GCs.
Stop-The-World – An event that requires all application threads to pause during its execution.
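To see these areas in a live JVM, the sketch below lists the heap memory pools through the standard java.lang.management API. The class name HeapPools is invented for this example. Pool names vary by collector (for instance, names containing "Eden Space", "Survivor Space", or "Old Gen" are typical on HotSpot), so treat the output as JVM-specific.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for this post; not part of the JDK.
class HeapPools {
    // Names of the memory pools that belong to the Java heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedKb = (usage == null) ? 0 : usage.getUsed() / 1024;
            System.out.println(pool.getName() + ": " + usedKb + " KB used");
        }
    }
}
```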

The Serial GC

The serial garbage collector is the most basic one. Both its minor and major collections are stop-the-world events, and it does all of its work on a single thread. After each major collection, the Serial GC compacts the old generation toward the bottom of the heap to prevent fragmentation. This makes new allocations in the old generation (when objects are promoted) very cheap: the allocator takes the top-of-the-heap pointer and moves it forward by the size of the new object, since everything beyond that pointer is known to be free space. The Serial GC is a good fit for client-side applications that can tolerate substantial pauses, and for situations where few processors are available, since a single GC thread avoids contention with application threads.
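The pointer-bumping allocation described above can be sketched in a few lines. This is a toy model over integer offsets, not HotSpot's implementation; the class BumpPointerSpace and its methods are invented for illustration.

```java
// Toy model of bump-pointer allocation in a compacted space.
// Hypothetical class for illustration; not HotSpot code.
class BumpPointerSpace {
    private final int capacity;
    private int top = 0; // everything at or beyond 'top' is free

    BumpPointerSpace(int capacity) {
        this.capacity = capacity;
    }

    // Returns the start offset of the new object, or -1 if there is no room.
    int allocate(int size) {
        if (size <= 0 || size > capacity - top) {
            return -1;
        }
        int start = top;
        top += size; // "bump" the pointer past the new object
        return start;
    }

    int free() {
        return capacity - top;
    }

    public static void main(String[] args) {
        BumpPointerSpace oldGen = new BumpPointerSpace(64);
        System.out.println(oldGen.allocate(16)); // 0
        System.out.println(oldGen.allocate(24)); // 16
        System.out.println(oldGen.allocate(32)); // -1, only 24 units remain
    }
}
```

Allocation is a bounds check and an addition, which is why compaction pays off at promotion time.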

The Parallel GC

The parallel garbage collector works much like the serial garbage collector. The only significant difference is that it uses multiple threads for both minor and full garbage collections to increase throughput; application threads are still paused while it runs. It is suitable for server machines with plenty of resources, and is primarily used in batch-processing or data-crunching applications that care more about data throughput than pause or response time.
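One rough way to put a number on "throughput" is the fraction of elapsed time that was not spent in garbage collection. The sketch below computes that from the standard java.lang.management API. The class GcOverhead is invented for this example, and the collection times reported by the JVM are approximate, so read the result as an estimate only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class for this post; not part of the JDK.
class GcOverhead {
    // Approximate total milliseconds the JVM reports spending in collections so far.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Fraction of elapsed time that was not spent in GC.
    static double throughput(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 1.0;
        }
        return 1.0 - (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.println("GC time: " + gc + " ms of " + uptime + " ms uptime");
        System.out.println("Estimated throughput: " + throughput(gc, uptime));
    }
}
```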

CMS – Concurrent Mark and Sweep GC

The CMS garbage collector has long been the preferred GC for applications with strict response-time requirements (though the newer G1 GC is intended to replace it). It greatly shortens stop-the-world phases by doing a substantial part of its marking work while application threads are still running. As a result, major GCs pause the application for much less time, but minor ones are a little slower. Its young generation works the same way as in the previous GCs, but its old generation is collected in the following phases:

  • Initial Marking – Stop the World – Identifies objects immediately reachable from outside the old generation.
  • Concurrent Marking – Marks all live objects reachable from the initial set.
  • Pre-Cleaning – Revisits objects modified during concurrent marking, reducing work for remark.
  • Remark – Stop the World – Visits all objects modified during the concurrent phases, ensuring none are missed.
  • Concurrent Sweep – With all live objects now marked, sweeps the heap and de-allocates garbage objects.
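The core mark and sweep idea behind these phases can be sketched on a tiny object graph. This is a deliberately simplified, single-threaded model: it does not attempt the concurrent marking, pre-cleaning, or remark steps that make CMS what it is, and the class MarkSweepSketch and its Node type are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Simplified, single-threaded mark and sweep; hypothetical illustration only.
class MarkSweepSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    // Mark: trace everything reachable from the roots using an explicit work list.
    static void mark(Collection<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.marked) {
                continue; // already visited
            }
            node.marked = true;
            pending.addAll(node.refs);
        }
    }

    // Sweep: remove unmarked nodes from the heap, clear marks on survivors,
    // and return the names of the reclaimed nodes.
    static List<String> sweep(List<Node> heap) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node node = it.next();
            if (node.marked) {
                node.marked = false; // reset for the next cycle
            } else {
                reclaimed.add(node.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b); // a -> b is reachable from the root
        c.refs.add(d); // c <-> d is an unreachable cycle
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));

        mark(Arrays.asList(a));
        System.out.println(sweep(heap)); // [c, d]
    }
}
```

Note that the unreachable cycle is reclaimed, since tracing from the roots never visits it.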

The CMS does not compact its old generation heap during the concurrent sweep; instead it maintains “free lists” to track free memory. This reduces the pause time, but makes each promotion to the old generation take longer, because the allocator has to search the free lists for a suitable chunk instead of simply advancing a pointer. Fragmentation can also occur, causing promotions to fail even when enough total memory is free; in that case, a special stop-the-world compacting phase can be triggered for this GC, resulting in a longer than standard pause.
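The fragmentation problem is easy to reproduce with a toy free list. The sketch below uses a simple first-fit search over free chunks; CMS's real free lists are more elaborate, and the class FreeListSpace is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy first-fit free-list allocator; hypothetical illustration, not CMS code.
class FreeListSpace {
    // One free chunk covering offsets [start, start + size).
    static final class Chunk {
        int start;
        int size;

        Chunk(int start, int size) {
            this.start = start;
            this.size = size;
        }
    }

    private final List<Chunk> free = new ArrayList<>();

    void addFree(int start, int size) {
        free.add(new Chunk(start, size));
    }

    int totalFree() {
        int total = 0;
        for (Chunk c : free) {
            total += c.size;
        }
        return total;
    }

    // First fit: scan for the first chunk that is large enough.
    // Returns the start offset, or -1 if no single chunk can hold the request.
    int allocate(int size) {
        if (size <= 0) {
            return -1;
        }
        for (int i = 0; i < free.size(); i++) {
            Chunk c = free.get(i);
            if (c.size >= size) {
                int start = c.start;
                c.start += size;
                c.size -= size;
                if (c.size == 0) {
                    free.remove(i); // chunk fully consumed
                }
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FreeListSpace oldGen = new FreeListSpace();
        oldGen.addFree(0, 8);
        oldGen.addFree(32, 8);
        oldGen.addFree(64, 8);

        System.out.println(oldGen.totalFree());  // 24
        System.out.println(oldGen.allocate(16)); // -1: 24 units free, but no chunk holds 16
        System.out.println(oldGen.allocate(8));  // 0
    }
}
```

The failed 16-unit request with 24 units free is the situation that forces a compacting collection.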

The CMS requires a bigger heap because new objects are still being created during concurrent marking, and memory is only reclaimed during the sweep phase. It should also be noted that while the CMS is guaranteed to mark all live objects, it may miss some garbage objects if they become garbage during the concurrent phases. These objects will then “float” to the next cycle before being reclaimed. The CMS GC is widely used in web services and similar applications.

G1 GC – Replacement for CMS

According to the Oracle documentation at http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html, the G1 garbage collector is a server-style GC for high-powered machines. It is designed to meet pause-time goals with high probability while achieving higher throughput than the CMS. It is a compacting GC, which avoids CMS’s fragmentation issues.

The G1 collector takes a different approach from the previous HotSpot collectors: it does not partition the heap into fixed young, old, and permanent generations. Instead, the heap is divided into equal-sized regions, each a contiguous range of memory. Sets of regions are given the same roles (Eden, Survivor, Old), but no fixed size is assigned to each role, which increases flexibility. G1 runs a concurrent global marking phase and uses the resulting liveness information to locate the regions that are mostly empty. These regions are targeted first: their live contents are evacuated (copied) in parallel to other regions in the heap, which compacts memory in the process. G1 monitors how long evacuations take so that it can estimate the pause time for collecting future regions, and it uses that estimate to meet the user-provided pause-time goal whenever possible.
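The "emptiest regions first, within a pause budget" idea can be sketched as a simple greedy selection. This is only an illustration of the principle: G1's actual prediction model is considerably richer, and the class GarbageFirstSketch, its Region type, and the msPerLiveKb cost parameter are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Greedy "garbage first" region selection; hypothetical illustration, not G1 code.
class GarbageFirstSketch {
    static final class Region {
        final int id;
        final int liveKb; // live data that would have to be copied out

        Region(int id, int liveKb) {
            this.id = id;
            this.liveKb = liveKb;
        }
    }

    // Picks regions with the least live data first, stopping before the
    // predicted copy time would exceed the pause budget.
    static List<Integer> chooseCollectionSet(List<Region> candidates,
                                             double msPerLiveKb,
                                             double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((Region r) -> r.liveKb));

        List<Integer> chosen = new ArrayList<>();
        double predictedMs = 0;
        for (Region r : sorted) {
            double cost = r.liveKb * msPerLiveKb;
            if (predictedMs + cost > pauseBudgetMs) {
                break; // the next region would blow the budget
            }
            predictedMs += cost;
            chosen.add(r.id);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region(0, 900), new Region(1, 100),
                new Region(2, 400), new Region(3, 50));
        // Assumed cost of 0.01 ms per live KB and a 6 ms pause budget.
        System.out.println(chooseCollectionSet(regions, 0.01, 6.0)); // [3, 1, 2]
    }
}
```

In a real collector the per-kilobyte cost would be derived from the evacuation times observed in earlier collections rather than passed in as a constant.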
