Why does Python's __import__ require fromlist?
In Python, if you want to programmatically import a module, you can do:
module = __import__('module_name')
If you want to import a submodule, you would think it would be a simple matter of:
module = __import__('module_name.submodule')
Of course, this doesn't work; you just get module_name again. You have to do:
module = __import__('module_name.submodule', fromlist=['blah'])
Why? The actual value of fromlist doesn't seem to matter at all, as long as it's non-empty. What is the point of requiring an argument and then ignoring its values?
Most stuff in Python seems to be done for good reason, but for the life of me, I can't come up with any reasonable explanation for this behavior to exist.
In fact, the behaviour of __import__() is entirely because of the implementation of the import statement, which calls __import__(). There are basically five slightly different ways __import__() can be called by import (with two main categories):
import pkg
import pkg.mod
from pkg import mod, mod2
from pkg.mod import func, func2
from pkg.mod import submod
In the first and the second case, the import statement should assign the "left-most" module object to the "left-most" name: pkg. After import pkg.mod you can do pkg.mod.func() because the import statement introduced the local name pkg, which is a module object that has a mod attribute. So, the __import__() function has to return the "left-most" module object so it can be assigned to pkg. Those two import statements thus translate into:
pkg = __import__('pkg')
pkg = __import__('pkg.mod')
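The left-most behaviour can be checked against a standard-library package. This is a minimal sketch that uses json and its json.decoder submodule as stand-ins for pkg and pkg.mod (my choice of example, not part of the original answer):

```python
import sys

# With no fromlist, a dotted name still returns the left-most package.
top = __import__("json.decoder")

print(top.__name__)                # json
print(top.decoder.__name__)        # json.decoder
print(top is sys.modules["json"])  # True
```

The submodule is imported as a side effect and is reachable as an attribute of the returned package, which is what makes pkg.mod.func() work after import pkg.mod.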
In the third, fourth and fifth case, the import statement has to do more work: it has to assign to (potentially) multiple names, which it has to get from the module object. The __import__() function can only return one object, and there's no real reason to make it retrieve each of those names from the module object (and it would make the implementation a lot more complicated). So the simple approach would be something like (for the third case):
tmp = __import__('pkg')
mod = tmp.mod
mod2 = tmp.mod2
However, that won't work if pkg is a package and mod or mod2 are modules in that package that are not already imported, as they are in the third and fifth case. The __import__() function needs to know that mod and mod2 are names that the import statement will want to have accessible, so that it can see if they are modules and try to import them too. So the call is closer to:
tmp = __import__('pkg', fromlist=['mod', 'mod2'])
mod = tmp.mod
mod2 = tmp.mod2
which causes __import__() to try to load pkg.mod and pkg.mod2 as well as pkg (but if mod or mod2 doesn't exist, it's not an error in the __import__() call; producing an error is left to the import statement). But that still isn't the right thing for the fourth and fifth example, because if the call were:
tmp = __import__('pkg.mod', fromlist=['submod'])
submod = tmp.submod
then tmp would end up being pkg, as before, and not the pkg.mod module you want to get the submod attribute from. The implementation could have decided to make it so the import statement does extra work, splitting the package name on . like the __import__() function already does and traversing the names, but this would have meant duplicating some of the effort. So, instead, the implementation made __import__() return the right-most module instead of the left-most one if and only if fromlist is passed and not empty.
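Both effects of fromlist described above can be observed with a throwaway package. The sketch below assumes Python 3 and builds a hypothetical package named demo_pkg (with empty mod.py and mod2.py) in a temporary directory; the package name and the bogus no_such_name entry are illustrative only:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical throwaway package: demo_pkg/{__init__,mod,mod2}.py, all empty.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
for filename in ("__init__.py", "mod.py", "mod2.py"):
    open(os.path.join(pkg_dir, filename), "w").close()
sys.path.insert(0, root)
importlib.invalidate_caches()

# Without fromlist, only the package itself is loaded.
plain = __import__("demo_pkg")
loaded_before = hasattr(plain, "mod")

# With fromlist, the listed submodules are imported too; a name that does
# not exist is silently skipped rather than raising here.
pkg = __import__("demo_pkg", fromlist=["mod", "mod2", "no_such_name"])
loaded_after = hasattr(pkg, "mod") and hasattr(pkg, "mod2")

# For a dotted name, a non-empty fromlist switches the return value from
# the left-most package to the right-most module.
left_most = __import__("demo_pkg.mod")
right_most = __import__("demo_pkg.mod", fromlist=["anything"])

print(loaded_before, loaded_after)              # False True
print(left_most.__name__, right_most.__name__)  # demo_pkg demo_pkg.mod
```

The last call is the situation from the question: demo_pkg.mod is a plain module, so there is nothing for the fromlist entries to load, and only their presence matters.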
(The import pkg as p and from pkg import mod as m syntax doesn't change anything about this story except which local names get assigned to. The __import__() function sees nothing different when as is used; it all remains in the import statement implementation.)
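One way to see which arguments each form of the import statement passes is to wrap builtins.__import__ temporarily and record the calls. This is a rough sketch for CPython 3; the recorded_call helper is hypothetical, and json.decoder again stands in for pkg.mod:

```python
import builtins
import json.decoder  # preload so the recorder sees no nested imports


def recorded_call(statement):
    """Run one import statement and return the (name, fromlist) it passed."""
    calls = []
    original = builtins.__import__

    def recorder(name, globals=None, locals=None, fromlist=(), level=0):
        calls.append((name, fromlist))
        return original(name, globals, locals, fromlist, level)

    builtins.__import__ = recorder
    try:
        exec(statement, {})
    finally:
        # Always restore the real hook, even if the statement fails.
        builtins.__import__ = original
    return calls[0]


print(recorded_call("import json.decoder"))       # ('json.decoder', None)
print(recorded_call("import json.decoder as d"))  # ('json.decoder', None)
print(recorded_call("from json import decoder"))  # ('json', ('decoder',))
```

The plain and as forms produce the same call, and only the from form supplies a fromlist.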
The answer above still felt odd to me, so I tried the code samples below.
First, build the following file structure:
tmpdir
|-- A
    |-- __init__.py
    |-- B.py
    |-- C.py
Now A is a package, and B and C are modules.
Second, run the sample code in ipython:
In [2]: kk = __import__('A',fromlist=['B'])
In [3]: dir(kk)
Out[3]:
['B',
'__builtins__',
'__doc__',
'__file__',
'__name__',
'__package__',
'__path__']
It seems like fromlist works as expected. But things become weird when we try to do the same thing on a module. Suppose we have a module called C.py with the following code in it:
handlers = {}
def hello():
    print("hello")
test_list = []
So now we try to do the same thing on it.
In [1]: ls
C.py
In [2]: kk = __import__('C')
In [3]: dir(kk)
Out[3]:
['__builtins__',
'__doc__',
'__file__',
'__name__',
'__package__',
'handlers',
'hello',
'test_list']
So if we only want to import test_list, does it work?
In [1]: kk = __import__('C',fromlist=['test_list'])
In [2]: dir(kk)
Out[2]:
['__builtins__',
'__doc__',
'__file__',
'__name__',
'__package__',
'handlers',
'hello',
'test_list']
As the result shows, when we use fromlist on a module rather than a package, the fromlist param doesn't help at all: importing a module executes its whole body, so every top-level name gets defined. Once the module is imported, there is no way to leave the other names out.