Is it possible to have an actual memory leak in Python because of your code?
I don't have a code example, but I'm curious whether it's possible to write Python code that results in essentially a memory leak.
Solution 1:
It is possible, yes.
It depends on what kind of memory leak you are talking about. Within pure Python code, it's not possible to "forget to free" memory as you can in C, but it is possible to leave a reference hanging somewhere. Some examples of such:
an unhandled traceback object that is keeping an entire stack frame alive, even though the function is no longer running
import sys

while game.running():
    try:
        key_press = handle_input()
    except SomeException:
        etype, evalue, tb = sys.exc_info()
        # Do something with tb like inspecting or printing the traceback
In this silly example of a game loop, we assigned 'tb' to a local. We had good intentions, but this tb contains frame information about the stack of whatever was happening in handle_input, all the way down to whatever it called. Presuming your game continues, this 'tb' is kept alive even during your next call to handle_input, and maybe forever. The docs for exc_info now talk about this potential circular reference issue and recommend simply not assigning tb if you don't absolutely need it. If you just need the traceback text, consider e.g. traceback.format_exc.
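To make the traceback point concrete, here is a minimal sketch that uses a weak reference to watch how long a local inside handle_input survives. Payload, probes, keep_traceback and format_only are hypothetical names made up for this illustration, and the behaviour shown assumes CPython.

```python
import gc
import sys
import traceback
import weakref


class Payload:
    """Hypothetical stand-in for a large object living in a frame's locals."""


probes = []  # weak references, so we can observe lifetimes without extending them


def handle_input():
    payload = Payload()
    probes.append(weakref.ref(payload))
    raise ValueError("bad input")


def keep_traceback():
    try:
        handle_input()
    except ValueError:
        _etype, _evalue, tb = sys.exc_info()
        return tb  # the traceback references every frame on the failing stack


def format_only():
    try:
        handle_input()
    except ValueError:
        return traceback.format_exc()  # plain string, no frames attached


tb = keep_traceback()
held_while_tb_alive = probes[-1]() is not None  # tb -> frame -> payload

del tb
gc.collect()  # the frame and its traceback form a cycle, so let the GC break it
released_after_drop = probes[-1]() is None

text = format_only()
gc.collect()
released_with_format_exc = probes[-1]() is None
```

As long as tb is reachable, the payload created inside handle_input stays alive too. Formatting the traceback to a string keeps the diagnostic information without pinning the frames.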
storing values in a class or global scope instead of instance scope, and not realizing it.
This one can happen in insidious ways, but often happens when you define mutable types in your class scope.
class Money(object):
    name = ''
    symbols = []  # This is the dangerous line here

    def set_name(self, name):
        self.name = name

    def add_symbol(self, symbol):
        self.symbols.append(symbol)
In the above example, say you did
m = Money()
m.set_name('Dollar')
m.add_symbol('$')
You'll probably find this particular bug quickly, but in this case you put a mutable value at class scope, and even though you correctly access it at instance scope, the lookup is actually "falling through" to the class object's __dict__. Every Money instance therefore appends to the same list.
Used in certain contexts, such as holding on to objects, this could cause your application's heap to grow forever, which would be a problem in, say, a production web application that doesn't restart its processes occasionally.
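A small sketch of the difference, assuming the usual fix of creating the list in __init__ so it lands in the instance's own __dict__. LeakyMoney and FixedMoney are hypothetical names used only for this comparison.

```python
class LeakyMoney:
    symbols = []  # one list object, shared by the class and all its instances

    def add_symbol(self, symbol):
        self.symbols.append(symbol)  # mutates the class-level list


class FixedMoney:
    def __init__(self):
        self.symbols = []  # a fresh list stored on each instance

    def add_symbol(self, symbol):
        self.symbols.append(symbol)


dollar, euro = LeakyMoney(), LeakyMoney()
dollar.add_symbol('$')
euro.add_symbol('€')
# Both instances see both symbols, and the list outlives the instances.
shared = LeakyMoney.symbols

fixed_dollar, fixed_euro = FixedMoney(), FixedMoney()
fixed_dollar.add_symbol('$')
fixed_euro.add_symbol('€')
```

With LeakyMoney, the list is only released when the class itself goes away, which for most programs means never.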
Cyclic references in classes which also have a __del__ method.
Ironically, before Python 3.4 the existence of a __del__ made it impossible for the cyclic garbage collector to clean an instance up (such cycles ended up in gc.garbage instead; PEP 442 lifted this restriction in 3.4). Say you had something where you wanted to do a destructor for finalization purposes:
class ClientConnection(...):
    def __del__(self):
        if self.socket is not None:
            self.socket.close()
            self.socket = None
Now this works fine on its own, and you may be led to believe it's being a good steward of OS resources to ensure the socket is 'disposed' of.
However, if ClientConnection kept a reference to, say, User and User kept a reference to the connection, you might be tempted to say that on cleanup, let's have the user de-reference the connection. This is actually the flaw, however: the cyclic GC (prior to Python 3.4) doesn't know the correct order in which to run the finalizers, so it cannot clean the cycle up.
The solution to this is to ensure you do cleanup on, say, disconnect events by calling some sort of close, but name that method something other than __del__.
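One way that explicit cleanup could look, as a rough sketch: a close method that also breaks the cycle, exposed through the context manager protocol so callers can't easily forget it. FakeSocket, the user attribute and this particular User class are assumptions made up for the example, not part of any real library.

```python
class FakeSocket:
    """Hypothetical stand-in for a real socket object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ClientConnection:
    def __init__(self, sock):
        self.socket = sock
        self.user = None

    def close(self):
        # Explicit, deterministic cleanup instead of relying on __del__.
        if self.socket is not None:
            self.socket.close()
            self.socket = None
        self.user = None  # break the connection -> user half of the cycle

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self.close()
        return False  # do not swallow exceptions


class User:
    def __init__(self, connection):
        self.connection = connection
        connection.user = self  # user <-> connection cycle


sock = FakeSocket()
with ClientConnection(sock) as conn:
    user = User(conn)
```

Once the with block exits, the socket is closed and the cycle is already broken, so nothing depends on when (or whether) the garbage collector gets around to these objects.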
poorly implemented C extensions, or not properly using C libraries as they are supposed to be.
In Python, you trust the garbage collector to throw away things you aren't using. But if you use a C extension that wraps a C library, the majority of the time you are responsible for making sure you explicitly close or de-allocate resources. Mostly this is documented, but a Python programmer who is used to not having to do this explicit de-allocation might throw away the handle to that library (by returning from a function, for example) without knowing that resources are still being held.
Scopes which contain closures which contain a whole lot more than you could've anticipated
class User:
    def set_profile(self, profile):
        def on_completed(result):
            if result.success:
                self.profile = profile

        self._db.execute(
            change={'profile': profile},
            on_complete=on_completed
        )
In this contrived example, we appear to be using some sort of 'async' call that will call us back at on_completed when the DB call is done (the implementation could've been promises; it ends up with the same outcome).
What you may not realize is that the on_completed closure binds a reference to self in order to execute the self.profile assignment. Now, perhaps the DB client keeps track of active queries and pointers to the closures to call when they're done (since it's async), and say it crashes for whatever reason. If the DB client doesn't correctly clean up its callbacks, the DB client now has a reference to on_completed, which has a reference to User, which keeps a _db: you've now created a circular reference that may never get collected.
(Even without a circular reference, the fact that closures bind locals and even instances sometimes may cause values you thought were collected to be living for a long time, which could include sockets, clients, large buffers, and entire trees of things)
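Here is a rough sketch of that retention, assuming a DB client that simply never forgets its callbacks. FakeDB and its pending list are hypothetical and only stand in for whatever bookkeeping a real async client does.

```python
import gc
import weakref


class FakeDB:
    """Hypothetical async client that stores callbacks and never drops them."""

    def __init__(self):
        self.pending = []

    def execute(self, change, on_complete):
        self.pending.append(on_complete)  # callback is kept until cleared


class User:
    def __init__(self, db):
        self._db = db
        self.profile = None

    def set_profile(self, profile):
        def on_completed(result):
            if result.success:
                self.profile = profile  # the closure captures self and profile

        self._db.execute(change={'profile': profile}, on_complete=on_completed)


db = FakeDB()
user = User(db)
user.set_profile('admin')

probe = weakref.ref(user)  # observe the User without keeping it alive
del user
gc.collect()
alive_while_callback_pending = probe() is not None

db.pending.clear()  # the client finally lets go of the callback
gc.collect()
released_after_clear = probe() is None
```

Even though the program dropped its only direct reference to the user, the stored callback kept it (and everything it references) alive until the client released the callback.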
Default parameters which are mutable types
import time

def foo(a=[]):
    a.append(time.time())
    return a
This is a contrived example, but one could be led to believe that the default value of a being an empty list means each call appends to a fresh list, when it is in fact a reference to the same list every time. This again might cause unbounded growth without you knowing that you did that.
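A quick sketch of the behaviour next to the common None-sentinel workaround. The names leaky and safe are made up for this comparison.

```python
import time


def leaky(a=[]):
    # The default list is created once, when the function is defined.
    a.append(time.time())
    return a


def safe(a=None):
    # Use None as a sentinel and build a new list on every call.
    if a is None:
        a = []
    a.append(time.time())
    return a


first = leaky()
second = leaky()
same_list = first is second      # both calls returned the one default list
grown_length = len(second)       # it now holds an entry per call

fresh_one = safe()
fresh_two = safe()
```

The default list in leaky lives on the function object (in leaky.__defaults__), so it grows for as long as the function exists.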
Solution 2:
The classic definition of a memory leak is memory that was used once, and now is not, but has not been reclaimed. That is nearly impossible with pure Python code. But as Antoine points out, you can easily have the effect of consuming all your memory inadvertently by allowing data structures to grow without bound, even if you don't need to keep all of the data around.
With C extensions, of course, you are back in unmanaged territory, and anything is possible.
Solution 3:
Of course you can. The typical example of a memory leak is if you build a cache that you never flush manually and that has no automatic eviction policy.
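As a small illustration, here is a hand-rolled dict cache that never evicts next to functools.lru_cache with a size limit. The squaring function and the maxsize of 128 are arbitrary choices for the sketch.

```python
from functools import lru_cache

_unbounded_cache = {}


def square_unbounded(n):
    # Every distinct argument adds an entry that is never removed.
    if n not in _unbounded_cache:
        _unbounded_cache[n] = n * n
    return _unbounded_cache[n]


@lru_cache(maxsize=128)
def square_bounded(n):
    # lru_cache evicts the least recently used entry once 128 are stored.
    return n * n


for i in range(1000):
    square_unbounded(i)
    square_bounded(i)

unbounded_size = len(_unbounded_cache)
bounded_size = square_bounded.cache_info().currsize
```

After a thousand distinct calls the dict holds a thousand entries, while the bounded cache stays at its limit.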