How can I create the minimum size executable with pyinstaller?

I am on Windows 10, I have anaconda installed but I want to create an executable independently in a new, clean minimal environment using python 3.5. So I did some tests:

TEST1: I created a python script test1.py in the folder testenv with only:

print('Hello World')

Then I created the environment, installed pyinstaller and created the executable

D:\testenv> python -m venv venv_test
...
D:\testenv\venv_test\Scripts>activate.bat
...
(venv_test) D:\testenv>pip install pyinstaller
(venv_test) D:\testenv>pyinstaller --clean -F test1.py

And it creates my test1.exe of about 6 Mb

TEST 2: I modified test1.py as follows:

import pandas as pd
print('Hello World')  

I installed pandas in the environment and created the new executable:

(venv_test) D:\testenv>pip install pandas
(venv_test) D:\testenv>pyinstaller --clean -F test1.py

Ant it creates my test1.exe which is now of 230 Mb!!!

if I run the command

(venv_test) D:\testenv>python -V
Python 3.5.2 :: Anaconda custom (64-bit)

when I am running pyinstaller I get some messages I do not understand, for example:

INFO: site: retargeting to fake-dir 'c:\\users\\username\\appdata\\local\\continuum\\anaconda3\\lib\\site-packages\\PyInstaller\\fake-modules'

Also I am getting messages about matplotlib and other modules that have nothing to do with my code, for example:

INFO:   Matplotlib backend "pdf": added
INFO:   Matplotlib backend "pgf": added
INFO:   Matplotlib backend "ps": added
INFO:   Matplotlib backend "svg": added

I know there are some related questions: Reducing size of pyinstaller exe, size of executable using pyinstaller and numpy but I could not solve the problem and I am afraid I am doing something wrong with respect to anaconda.

So my questions are: what am I doing wrong? can I reduce the size of my executable?


Solution 1:

I accepted the answer above but I post here what I did step by step for complete beginners like me who easily get lost.

Before I begin I post my complete test1.py example script with all the modules I actually need. My apologies if it is a bit more complex than the original question but maybe this can help someone.

test1.py looks like this:

import matplotlib 
matplotlib.use('Agg') 
import matplotlib.pyplot as plt
import matplotlib.image as image
import numpy as np
import os.path
import pandas as pd
import re   

from matplotlib.ticker import AutoMinorLocator 
from netCDF4 import Dataset
from time import time
from scipy.spatial import distance
from simpledbf import Dbf5
from sys import argv

print('Hello World')

I added matplotlib.use('Agg') (as my actual code is creating figures) Generating a PNG with matplotlib when DISPLAY is undefined

1) Install a new version of python independently from anaconda.

downloaded python from: https://www.python.org/downloads/ installed selecting 'add python to path' and deselecting install launcher for all users (I don't have admin rights) check that I am using the same version from CMD, just writing python I get: Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information.

2) Create and activate the environment, from CMD

D:\> mkdir py36envtest
...
D:\py36envtest>python -m venv venv_py36
...
D:\py36envtest\venv_py36\Scripts>activate.bat

3) Install in the environment all the modules needed in the script

Making sure they are compatible to the python version with the command: (from Matplotlib not recognized as a module when importing in Python)

(venv_py36) D:\py36envtest> python -m pip install nameofmodule

NB: in my case I also had to add the option --proxy https://00.000.000.00:0000

for the example I used development version of py installer:

(venv_py36) D:\py36envtest> python -m pip install https://github.com/pyinstaller/pyinstaller/archive/develop.tar.gz

and the modules: pandas, matplolib, simpledbf, scipy, netCDF4. At the end my environment looks like this.

(venv_py36) D:\py36envtest> pip freeze
altgraph==0.15
cycler==0.10.0
future==0.16.0
macholib==1.9
matplotlib==2.1.2
netCDF4==1.3.1
numpy==1.14.0
pandas==0.22.0
pefile==2017.11.5
PyInstaller==3.4.dev0+5f9190544
pyparsing==2.2.0
pypiwin32==220
python-dateutil==2.6.1
pytz==2017.3
scipy==1.0.0
simpledbf==0.2.6
six==1.11.0
style==1.1.0
update==0.0.1

4) Create/modify the .spec file (when you run pyinstaller it creates a .spec file, you can rename).

Initially I got a lot of ImportError: DLL load failed (especially for scipy) and missing module error which I solved thanks to these posts:
What is the recommended way to persist (pickle) custom sklearn pipelines?
and the comment to this answer: Pyinstaller with scipy.signal ImportError: DLL load failed

My inputtest1.spec finally looks like this:

# -*- mode: python -*-
options = [ ('v', None, 'OPTION')]
block_cipher = None


a = Analysis(['test1.py'],
             pathex=['D:\\py36envtest', 'D:\\py36envtest\\venv_py36\\Lib\\site-packages\\scipy\\extra-dll' ],
             binaries=[],
             datas=[],
             hiddenimports=['scipy._lib.messagestream',
                            'pandas._libs.tslibs.timedeltas'],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher)
pyz = PYZ(a.pure, a.zipped_data,
             cipher=block_cipher)
exe = EXE(pyz,
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,
          name='test1',
          debug=False,
          strip=False,
          upx=True,
          runtime_tmpdir=None,
          console=True )

5) Finally make the executable with the command

(venv_py36) D:\py36envtest>pyinstaller -F --clean inputtest1.spec

my test1.exe is 47.6 Mb, the .exe of the same script created from an anaconda virtual environment is 229 Mb.

I am happy (and if there are more suggestions they are welcome)

Solution 2:

The problem is that you should not be using a virtual environment and especially not anaconda. Please download default python 32 bit and use only necessary modules. Then follow the steps provided in the links, this should definitely fix it.

Although you created a virtual env, are you sure your spec file is not linking to old Anaconda entries?

If all this fails, then submit a bug as this is very strange.