When are objects garbage collected in Python?

When are objects garbage collected in Python? When is the memory released, and does collection impact performance? Can one opt out of or tune the GC algorithm, and if so, how?


Solution 1:

When are objects garbage collected in Python?

There is a lot of detail in the source code for CPython: http://svn.python.org/view/python/trunk/Modules/gcmodule.c?revision=81029&view=markup

Any time a reference count drops to zero, the object is immediately removed.

    /* Python's cyclic gc should never see an incoming refcount
     * of 0: if something decref'ed to 0, it should have been
     * deallocated immediately at that time.
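To see that behaviour in action, here is a small sketch (reference counting is a CPython implementation detail; other implementations may defer finalization):

    import sys

    class Tracked:
        def __del__(self):
            print("deallocated")

    obj = Tracked()
    print(sys.getrefcount(obj))  # typically 2: the 'obj' name plus getrefcount's own argument
    del obj                      # the count drops to 0, so "deallocated" prints immediately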

A full collection is triggered only when the ratio of long-lived pending objects to long-lived total objects is above 25%:

    In addition to the various configurable thresholds, we only trigger a
    full collection if the ratio

        long_lived_pending / long_lived_total

    is above a given value (hardwired to 25%).
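If you want to watch the collector's bookkeeping yourself, the gc module exposes the counters (a small sketch; gc.get_stats() requires Python 3.4+ and the exact numbers depend on interpreter state):

    import gc

    print(gc.get_count())   # (gen0 allocations minus deallocations, gen0 passes, gen1 passes)
    print(gc.get_stats())   # per-generation totals: collections, collected, uncollectable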

When is the memory released?

I was only able to fish out this information.

    /* Clear all free lists
     * All free lists are cleared during the collection of the highest generation.
     * Allocated items in the free list may keep a pymalloc arena occupied.
     * Clearing the free lists may give back memory to the OS earlier.
     */

According to this, Python may keep your object on a free list for recycling even after its refcount drops to zero. I could not find exactly where the free() call is made to give memory back to the operating system, but I imagine it happens during a collection, for objects that are not being kept on a free list.
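A rough way to observe a free list is to watch CPython hand the same memory slot to a new object of the same type (a sketch only; this relies on an implementation detail and is not guaranteed):

    # Floats keep a per-type free list in CPython, so a newly created float often
    # reuses the slot that was just vacated. Do not rely on this behaviour.
    a = 3.14
    addr = id(a)
    del a
    b = 2.71
    print(id(b) == addr)   # frequently True on CPython because the slot was recycled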

Does the collection impact performance?

Any non-trivial garbage collector I have heard of requires both CPU and memory to operate. Therefore, yes, there is always an impact on performance. You'll have to experiment and get to know your garbage collector.

I have run into issues with programs that require real-time responsiveness, since garbage collectors don't give you control over when they run or for how long. Some peculiar cases can cause excessive memory use as well, one example being Python's habit of keeping free lists.
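One way to get a feel for the overhead is to benchmark an allocation-heavy, cycle-creating workload with the cyclic collector enabled and disabled (a sketch; the absolute numbers depend entirely on your machine and workload):

    import gc
    import time

    def churn(n=200_000):
        data = []
        for i in range(n):
            d = {"i": i}
            d["self"] = d          # deliberately create a reference cycle
            data.append(d)
        return data

    for label, setup in (("gc enabled", gc.enable), ("gc disabled", gc.disable)):
        setup()
        start = time.perf_counter()
        churn()
        elapsed = time.perf_counter() - start
        gc.enable()
        gc.collect()               # clean up the cycles before the next run
        print(label, round(elapsed, 3), "s")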

Solution 2:

Here is an excerpt from the language reference

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).

EDIT: About postponing garbage collection: the gc module lets you interact with the garbage collector, e.g. disable it or change the collection frequency, though I have not used it myself. Also note that before Python 3.4 (PEP 442), cycles containing objects with __del__ methods were never collected; modern CPython can finalize and collect them.
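For example, a minimal sketch of opting out and collecting manually (gc.collect() returns the number of unreachable objects it found):

    import gc

    gc.disable()                # turn off automatic cyclic collection
    print(gc.isenabled())       # False; plain reference counting still runs as usual

    class Node:
        pass

    a, b = Node(), Node()
    a.other, b.other = b, a     # a reference cycle: the refcounts never reach zero
    del a, b

    print(gc.collect())         # collect manually; prints the number of unreachable objects found
    gc.enable()                 # restore automatic collection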

Solution 3:

To expand on the previous answers with some more numbers and actionable information:

You can use gc.set_threshold(threshold0[, threshold1[, threshold2]]) to tune when automatic garbage collection kicks in:

The GC classifies objects into three generations depending on how many collection sweeps they have survived. New objects are placed in the youngest generation (generation 0). If an object survives a collection it is moved into the next older generation. Since generation 2 is the oldest generation, objects in that generation remain there after a collection. In order to decide when to run, the collector keeps track of the number of object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts. Initially only generation 0 is examined. If generation 0 has been examined more than threshold1 times since generation 1 has been examined, then generation 1 is examined as well. With the third generation, things are a bit more complicated; see Collecting the oldest generation for more information.
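For instance (a sketch; the particular numbers are arbitrary examples, not recommendations):

    import gc

    gc.set_threshold(50_000, 20, 20)   # run generation-0 collections far less often
    print(gc.get_threshold())          # (50000, 20, 20)
    gc.set_threshold(0)                # a threshold0 of 0 disables automatic collection entirely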

While I could not find the default thresholds in the documentation, looking through the implementation, the default values seem to be (CPython 3.9.1):

  • threshold0: 700
  • threshold1: 10
  • threshold2: 10

I.e. by default, automatic garbage collection kicks in once the number of allocations minus the number of deallocations exceeds 700.
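You can confirm the defaults on your own interpreter (a quick sketch):

    import gc

    print(gc.get_threshold())   # (700, 10, 10) on a stock CPython build
    print(gc.get_count())       # how close each generation currently is to its threshold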