How to obfuscate Python code effectively?

I am looking for how to hide my Python source code.

print "Hello World!" 

How can I encode this example so that it isn't human-readable? I've been told to use base64 but I'm not sure how.


Solution 1:

This is only a limited, first-level obfuscation solution, but it is built-in: Python has a compiler to byte-code:

python -OO -m py_compile <your program.py>

produces a .pyo file that contains byte-code, and where docstrings are removed, etc. You can rename the .pyo file with a .py extension, and python <your program.py> runs like your program but does not contain your source code.

PS: the "limited" level of obfuscation that you get is such that one can recover the code (with some of the variable names, but without comments and docstrings). See the first comment, for how to do it. However, in some cases, this level of obfuscation might be deemed sufficient.

PPS: If your program imports modules obfuscated like this, then you need to rename them with a .pyc suffix instead (I'm not sure this won't break one day), or you can work with the .pyo and run them with python -O ….pyo (the imports should work). This will allow Python to find your modules (otherwise, Python looks for .py modules).

Solution 2:

so that it isn't human-readable?

i mean all the file is encoded !! when you open it you can't understand anything .. ! that what i want

As maximum, you can compile your sources into bytecode and then distribute only bytecode. But even this is reversible. Bytecode can be decompiled into semi-readable sources.

Base64 is trivial to decode for anyone, so it cannot serve as actual protection and will 'hide' sources only from complete PC illiterates. Moreover, if you plan to actually run that code by any means, you would have to include decoder right into the script (or another script in your distribution, which would needed to be run by legitimate user), and that would immediately give away your encoding/encryption.

Obfuscation techniques usually involve comments/docs stripping, name mangling, trash code insertion, and so on, so even if you decompile bytecode, you get not very readable sources. But they will be Python sources nevertheless and Python is not good at becoming unreadable mess.

If you absolutely need to protect some functionality, I'd suggest going with compiled languages, like C or C++, compiling and distributing .so/.dll, and then using Python bindings to protected code.