Python (and Python C API): __new__ versus __init__
The question I'm about to ask seems to be a duplicate of "Python's use of __new__ and __init__?", but regardless, it's still unclear to me exactly what the practical difference between __new__ and __init__ is.
Before you rush to tell me that __new__ is for creating objects and __init__ is for initializing objects, let me be clear: I get that. In fact, that distinction is quite natural to me, since I have experience in C++ where we have placement new, which similarly separates object allocation from initialization.
The Python C API tutorial explains it like this:
The new member is responsible for creating (as opposed to initializing) objects of the type. It is exposed in Python as the __new__() method. ... One reason to implement a new method is to assure the initial values of instance variables.
So, yeah, I get what __new__ does, but despite this, I still don't understand why it's useful in Python. The example given says that __new__ might be useful if you want to "assure the initial values of instance variables". Well, isn't that exactly what __init__ will do?
In the C API tutorial, an example is shown where a new Type (called a "Noddy") is created, and the Type's __new__ function is defined. The Noddy type contains a string member called first, and this string member is initialized to an empty string like so:
static PyObject * Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    .....
    self->first = PyString_FromString("");
    if (self->first == NULL)
    {
        Py_DECREF(self);
        return NULL;
    }
    .....
}
Note that without the __new__ method defined here, we'd have to use PyType_GenericNew, which simply initializes all of the instance variable members to NULL. So the only benefit of the __new__ method is that the instance variable will start out as an empty string, as opposed to NULL. But why is this ever useful, since if we cared about making sure our instance variables are initialized to some default value, we could have just done that in the __init__ method?
The difference mainly arises with mutable vs immutable types.
__new__ accepts a type as the first argument, and (usually) returns a new instance of that type. Thus it is suitable for use with both mutable and immutable types.
__init__ accepts an instance as the first argument and modifies the attributes of that instance. This is inappropriate for an immutable type, as it would allow the object to be modified after creation by calling obj.__init__(*args).
Compare the behaviour of tuple and list:
>>> x = (1, 2)
>>> x
(1, 2)
>>> x.__init__([3, 4])
>>> x # tuple.__init__ does nothing
(1, 2)
>>> y = [1, 2]
>>> y
[1, 2]
>>> y.__init__([3, 4])
>>> y # list.__init__ reinitialises the object
[3, 4]
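The two-step protocol described above can be sketched in plain Python. This is only a rough approximation, assuming Python 3 on CPython; construct is a made-up helper name for illustration, not something the interpreter exposes.

```python
def construct(cls, *args, **kwargs):
    """Made-up helper: roughly what calling cls(*args, **kwargs) does."""
    # Step 1: __new__ receives the class and returns an object.
    obj = cls.__new__(cls, *args, **kwargs)
    # Step 2: __init__ runs only if that object is an instance of cls.
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj


# tuple does all of its work in __new__ ...
t_after_new = tuple.__new__(tuple, [1, 2])
# ... whereas list.__new__ hands back an empty list for __init__ to fill.
l_after_new = list.__new__(list, [1, 2])

t = construct(tuple, [1, 2])
l = construct(list, [1, 2])
```

Comparing t_after_new with l_after_new shows where each type does its real work: the tuple is already complete after step 1, while the list only gets its contents in step 2.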
As to why they're separate (aside from simple historical reasons): __new__ methods require a bunch of boilerplate to get right (the initial object creation, and then remembering to return the object at the end). __init__ methods, by contrast, are dead simple, since you just set whatever attributes you need to set.
Aside from __init__ methods being easier to write, and the mutable vs immutable distinction noted above, the separation can also be exploited to make calling the parent class __init__ in subclasses optional by setting up any absolutely required instance invariants in __new__. This is generally a dubious practice though; it's usually clearer to just call the parent class __init__ methods as necessary.
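To make the "invariants in __new__" idea concrete, here is a minimal sketch, assuming Python 3. The class names Base and Forgetful and the _items attribute are invented for illustration.

```python
class Base:
    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # Invariant established here, so it holds even when a subclass
        # never calls Base.__init__.
        self._items = []
        return self

    def __init__(self, *items):
        self._items.extend(items)

    def count(self):
        return len(self._items)


class Forgetful(Base):
    def __init__(self, label):
        # Deliberately does not call super().__init__().
        self.label = label


f = Forgetful("demo")
b = Base(1, 2)
```

Here f.count() still works because _items was created in __new__; had it been created in Base.__init__ instead, the call would fail with an AttributeError.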
There are probably other uses for __new__, but there's one really obvious one: you can't usefully subclass an immutable type (that is, change how its value gets built) without using __new__. So for example, say you wanted to create a subclass of tuple that can contain only integral values from 0 up to (but not including) size.
class ModularTuple(tuple):
    def __new__(cls, tup, size=100):
        tup = (int(x) % size for x in tup)
        return super(ModularTuple, cls).__new__(cls, tup)
You simply can't do this with __init__: by the time it runs, the tuple's contents are already fixed, so if you tried to modify self in __init__, the interpreter would complain (with a TypeError) that you're trying to modify an immutable object.
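For contrast, here is a sketch of what the failed __init__ attempt looks like next to the working __new__ version, assuming Python 3. InitOnlyTuple is a made-up counter-example; ModularTuple is repeated only so the snippet runs on its own.

```python
class InitOnlyTuple(tuple):
    """Made-up counter-example: tries to normalize values in __init__."""
    def __init__(self, tup, size=100):
        for i, x in enumerate(tup):
            # The tuple already exists by now, so item assignment fails.
            self[i] = int(x) % size


class ModularTuple(tuple):
    def __new__(cls, tup, size=100):
        tup = (int(x) % size for x in tup)
        return super(ModularTuple, cls).__new__(cls, tup)


try:
    InitOnlyTuple([105, 7])
    error = None
except TypeError as exc:
    error = exc  # item assignment is not supported on tuples

m = ModularTuple([105, 7, -1])
```

After running this, error holds the TypeError raised inside __init__, while m is a ModularTuple whose values were reduced modulo size before the tuple was created.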
__new__() can return objects of types other than the class it's bound to. __init__() only initializes an existing instance of the class.
>>> class C(object):
...     def __new__(cls):
...         return 5
...
>>> c = C()
>>> print type(c)
<type 'int'>
>>> print c
5
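One consequence worth spelling out: when __new__ returns something that isn't an instance of the class, __init__ is not called at all. A small sketch, assuming Python 3; the class names Skips and Runs and the init_calls counter are invented for illustration.

```python
class Skips:
    init_calls = 0

    def __new__(cls):
        return 5  # not an instance of Skips

    def __init__(self):
        Skips.init_calls += 1


class Runs:
    init_calls = 0

    def __new__(cls):
        return super().__new__(cls)  # a real instance of Runs

    def __init__(self):
        Runs.init_calls += 1


a = Skips()
b = Runs()
```

Skips.init_calls stays at zero because the int returned by __new__ is handed straight back to the caller, whereas Runs.init_calls is incremented once.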