Python 32-bit memory limits on 64-bit Windows

I'm getting a memory issue I can't seem to understand.

I'm on a Windows 7 64-bit machine with 8GB of memory, running a 32-bit Python program.

The program reads 5,118 zipped NumPy files (npz). Windows reports that the files take up 1.98 GB on disk.

Each npz file contains two pieces of data: 'arr_0' is of type np.float32 and 'arr_1' is of type np.uint8

The Python script reads each file, appends its data to two lists, and then closes the file.

Around file 4284/5118 the program throws a MemoryError.

However, the task manager says that the memory usage of python.exe *32 when the error occurs is 1,854,848K ~= 1.8GB, which is much less than my 8 GB limit, or the supposed 4GB limit of a 32-bit program.

In the program I catch the memory error and it reports: Each list has length 4285. The first list contains a total of 1,928,588,480 bits of float32 data ~= 229.9 MB. The second list contains 12,342,966,272 bits of uint8 data ~= 1,471.3MB.

So, everything seems to be checking out, except for the part where I get a memory error. I absolutely have more memory, and the file it crashes on is ~800KB, so it's not failing on reading a huge file.

Also, the file isn't corrupted. I can read it just fine, if I don't use up all that memory beforehand.

To make things more confusing, all of this seems to work fine on my Linux machine (although it does have 16GB of memory as opposed to 8GB on my Windows machine), but still, it doesn't seem to be the machine's RAM that is causing this issue.

Why is Python throwing a memory error, when I expect that it should be able to allocate another 2GB of data?


Solution 1:

I don't know why you think your process should be able to access 4GB. According to Memory Limits for Windows Releases at MSDN, on 64-bit Windows 7, a default 32-bit process gets 2GB.* Which is exactly where it's running out.
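
As a rough check, here's the arithmetic using only the figures reported in the question (the constant and function names are mine, purely for illustration). Task Manager's number isn't exactly the same thing as address-space usage, so treat this as a sketch rather than an exact accounting:

```python
# Figures copied from the question; all names here are illustrative.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

USER_SPACE_LIMIT = 2 * GIB          # default for a 32-bit process on Windows
task_manager_kib = 1854848          # "1,854,848K" reported by Task Manager
list_payload_mib = 229.9 + 1471.3   # the two lists, as reported in the question


def headroom_mib(used_bytes, limit_bytes=USER_SPACE_LIMIT):
    """Return how many MiB remain below the limit."""
    return (limit_bytes - used_bytes) / MIB


if __name__ == "__main__":
    used = task_manager_kib * KIB
    print("array data in lists: %.1f MiB" % list_payload_mib)
    print("process usage:       %.1f MiB" % (used / MIB))
    print("left under 2 GiB:    %.1f MiB" % headroom_mib(used))
```

Whatever is left also has to be contiguous to be useful for one more array, so an allocation can fail a bit before the process reaches the full 2GB.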

So, is there a way around this?

Well, you could make a custom build of 32-bit Python that uses the IMAGE_FILE_LARGE_ADDRESS_AWARE flag, and rebuild numpy and all of your other extension modules. I can't promise that all of the relevant code really is safe to run with the large-address-aware flag; there's a good chance it is, but unless someone's already done it and tested it, "a good chance" is the best anyone is likely to know.
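
If you want to see whether a given python.exe already has that flag, a minimal sketch is to read the Characteristics field out of the PE header yourself. The helper name is hypothetical; the header offsets and the 0x0020 bit value come from the PE/COFF format:

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(pe_bytes):
    """Return True if a PE image's header has the large-address-aware bit set."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # e_lfanew: 4-byte little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file (missing PE signature)")
    # The 20-byte COFF header follows the 4-byte signature; Characteristics
    # is its last field, 18 bytes in.
    (characteristics,) = struct.unpack_from("<H", pe_bytes, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


if __name__ == "__main__":
    import sys
    # Checks the running interpreter; on non-Windows systems this is not a
    # PE file, so the ValueError branch is taken.
    with open(sys.executable, "rb") as f:
        header = f.read(4096)
    try:
        print("large address aware:", is_large_address_aware(header))
    except ValueError as exc:
        print("cannot check:", exc)
```

This only tells you what the executable asks for. It says nothing about whether the extension modules loaded into it are safe with addresses above 2GB.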

Or, more obviously, just use 64-bit Python instead.
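
If you're not sure which build you're actually running, a quick check from inside the interpreter looks something like this (the function name is just for illustration):

```python
import struct
import sys


def python_bitness():
    """Return the pointer width, in bits, of the running interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("%d-bit Python" % python_bitness())
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```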


The amount of physical RAM is completely irrelevant. You seem to think that you have an "8GB limit" with 8GB of RAM, but that's not how it works. Your system takes all of your RAM plus whatever swap space it needs and divides it up between apps; an app may be able to get 20GB of virtual memory without getting a memory error even on an 8GB machine. And meanwhile, a 32-bit app has no way of accessing more than 4GB, and the OS will use up some of that address space (half of it by default, on Windows), so you can only get 2GB even on an 8GB machine that's not running anything else. (Not that it's possible to ever be "not running anything else" on a modern OS, but you know what I mean.)
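
The 4GB and 2GB figures fall straight out of the pointer size, not out of anything about your RAM. A tiny sketch of that arithmetic (function names are illustrative, and the "half" split is the Windows default described above):

```python
GIB = 1024 ** 3


def address_space_bytes(pointer_bits):
    """Total number of bytes addressable with pointers of the given width."""
    return 2 ** pointer_bits


def default_user_space_32bit_windows():
    """User-mode share of a 32-bit address space when the OS keeps half."""
    return address_space_bytes(32) // 2


if __name__ == "__main__":
    print("32-bit address space: %.1f GiB" % (address_space_bytes(32) / GIB))
    print("default user half:    %.1f GiB" % (default_user_space_32bit_windows() / GIB))
```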


So, why does this work on your Linux box?

Because your Linux box is configured to give 32-bit processes 3.5GB of virtual address space, or 3.99GB, or… Well, I can't tell you the exact number, but every distro I've seen for many years has been configured for at least 3.25GB.


* Also note that you don't even really get that full 2GB for your data; your program itself needs some of it. Most of what the OS and its drivers make accessible to your code sits in the other half, but some bits sit in your half, along with every DLL you load and any space they need, and various other things. It doesn't add up to too much, but it's not zero.