Local import statements in Python

I think putting the import statement as close to the fragment that uses it helps readability by making its dependencies more clear. Will Python cache this? Should I care? Is this a bad idea?

def Process():
    import StringIO
    file_handle=StringIO.StringIO('hello world')
    #do more stuff

for i in xrange(10): Process()

A little more justification: it's for methods which use arcane bits of the library, but when I refactor the method into another file, I don't realize I missed the external dependency until I get a runtime error.


Solution 1:

The other answers evince a mild confusion as to how import really works.

This statement:

import foo

is roughly equivalent to this statement:

foo = __import__('foo', globals(), locals(), [], -1)

That is, it creates a variable in the current scope with the same name as the requested module, and assigns it the result of calling __import__() with that module name and a boatload of default arguments.

The __import__() function handles conceptually converts a string ('foo') into a module object. Modules are cached in sys.modules, and that's the first place __import__() looks--if sys.modules has an entry for 'foo', that's what __import__('foo') will return, whatever it is. It really doesn't care about the type. You can see this in action yourself; try running the following code:

import sys
sys.modules['boop'] = (1, 2, 3)
import boop
print boop

Leaving aside stylistic concerns for the moment, having an import statement inside a function works how you'd want. If the module has never been imported before, it gets imported and cached in sys.modules. It then assigns the module to the local variable with that name. It does not not not modify any module-level state. It does possibly modify some global state (adding a new entry to sys.modules).

That said, I almost never use import inside a function. If importing the module creates a noticeable slowdown in your program—like it performs a long computation in its static initialization, or it's simply a massive module—and your program rarely actually needs the module for anything, it's perfectly fine to have the import only inside the functions in which it's used. (If this was distasteful, Guido would jump in his time machine and change Python to prevent us from doing it.) But as a rule, I and the general Python community put all our import statements at the top of the module in module scope.

Solution 2:

Please see PEP 8:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

Please note that this is purely a stylistic choice as Python will treat all import statements the same regardless of where they are declared in the source file. Still I would recommend that you follow common practice as this will make your code more readable to others.

Solution 3:

Style aside, it is true that an imported module will only be imported once (unless reload is called on said module). However, each call to import Foo will have implicitly check to see if that module is already loaded (by checking sys.modules).

Consider also the "disassembly" of two otherwise equal functions where one tries to import a module and the other doesn't:

>>> def Foo():
...     import random
...     return random.randint(1,100)
... 
>>> dis.dis(Foo)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (random)
              9 STORE_FAST               0 (random)

  3          12 LOAD_FAST                0 (random)
             15 LOAD_ATTR                1 (randint)
             18 LOAD_CONST               2 (1)
             21 LOAD_CONST               3 (100)
             24 CALL_FUNCTION            2
             27 RETURN_VALUE        
>>> def Bar():
...     return random.randint(1,100)
... 
>>> dis.dis(Bar)
  2           0 LOAD_GLOBAL              0 (random)
              3 LOAD_ATTR                1 (randint)
              6 LOAD_CONST               1 (1)
              9 LOAD_CONST               2 (100)
             12 CALL_FUNCTION            2
             15 RETURN_VALUE        

I'm not sure how much more the bytecode gets translated for the virtual machine, but if this was an important inner loop to your program, you'd certainly want to put some weight on the Bar approach over the Foo approach.

A quick and dirty timeit test does show a modest speed improvement when using Bar:

$ python -m timeit -s "from a import Foo,Bar" -n 200000 "Foo()"
200000 loops, best of 3: 10.3 usec per loop
$ python -m timeit -s "from a import Foo,Bar" -n 200000 "Bar()"
200000 loops, best of 3: 6.45 usec per loop