How does the Garbage-First Garbage Collector work?

Solution 1:

The collector splits the heap up into fixed-size regions and tracks the live data in those regions. It keeps a set of pointers — the "remembered set" — into and out of the region. When a GC is deemed necessary, it collects the regions with less live data first (hence, "garbage first"). Often, this can mean collecting an entire region in one step: if the number of pointers into a region is zero, then it doesn't need to do a mark or sweep of that region.

For each region, it tracks various metrics that describe how long it will take to collect them. You can give it a soft real-time constraint about pause times, and it then tries to collect as much garbage as it can in that constrained time.

There is JavaOne talk about G1 and few articles on the topic:

  • http://developers.sun.com/learning/javaoneonline/j1sessn.jsp
  • http://www.fasterj.com/articles/G1.shtml
  • http://www.drdobbs.com/java/219401061
  • http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All

Solution 2:

The G1 is nicely explained also in this new JavaOne 2012 session : G1 Garbage Collector Performance Tuning [youtube], [PDF].

They start with introduction of CMS and G1, their comparison, and then the G1 analysis and tuning is explained.

G1 characteristics

  • Fixed Size Regions - heap split into regions (1Mb - 32MB, ~2000, determined by the VM).
  • Eden, survivor and OldGen represented as a logical set of regions.
  • Live objects are evacuated from one region to another

A typical G1 heap may look like:

A typical G1 heap may look like:

Here is a summary of each G1 phase:

1. Young Collection

1.1 Young Phase - Minor GC

  • Evacuation - Stop-The-World Parallel minor GC where live objects are evacuated from Young Generation either into Survivor regions (tenuring) or OldGen regions (promotion).
  • Accounting - size of eden/survivor space for the next young GC is determined based on stats from each region and based on the pause-time target set by the app. G1 estimates how much time it will need for the next YoungGC.
  • Resizing - G1 can now easily reduce/resize eden/survivor regions.

1.2 Young / Initial Mark

  • GC young initial-mark is an initial marking phase for the OldGen collection that is executed in parallel with the YoungGC collection. The initial mark is a parallel concurrent marking process.

2. Old Gen Collection

2.1 Initial Mark - see 1.2.

2.2 GC remark

  • one stop-the-world pause, concurrenlty marking live objects
  • accounting - for each region, during remark, G1 is tracking liviness of the region(how many objects are live in each region), and references into the region (Remembered Set) - this tells G1 how long is takes to do a collection on this region.
  • Empty Regions reclaimed

2.3. GC pause (mixed)

  • choose regions with low liviness and collect some of them. Hence, we are collecting the "garbage first".
  • the actual collection of these regions is performed at the same time as the next Young GC, so there is no separate pause for the collection of the OldGen. Therefore, the GC pause (mixed) is a mixed collection of YoungGen and a portion of old Gen.
  • At the end of GC pause (mixed) there can be some garbage left in old gen regions, this will be collected later depending of future liviness, pause time target and number of unused regions.

3. Full GC

Note that G1 is intended to avoid Full GC as much as possible. As of Java 7u40, FullGC pauses in G1 are not optimized and are implemented as a single threaded operation. When using G1, try to avoid Full GC - if you see any FullGC pauses, your GC setup probably requires some tuning.

Resources

  • G1: One Garbage Collector To Rule Them All
  • JavaOne 2012 session : G1 Garbage Collector Performance Tuning [youtube], [PDF]
  • Garbage Collection in Java (4) - Garbage First
  • Garbage-First Garbage Collection - the original paper.

Solution 3:

I found Oracle's page on this to be extremely helpful at explaining the concepts in an accessible manner without being too lengthy.