PowerShell - System.OutOfMemoryException

Solution 1:

Reading large files into memory simply to split them, while easy, will never be the most efficient method, and you will run into memory limits somewhere.

This is even more apparent here because Get-Content works on strings — and, as you mention in the comments, you are dealing with binary files.

.NET (and, therefore, PowerShell) stores all strings in memory as UTF-16 code units. This means each code unit takes up 2 bytes in memory.

It happens that a single .NET string can only store (2^31 - 1) code units, since the length of a string is tracked by an Int32 (even on 64-bit versions). Multiply that by 2, and a single .NET string can (theoretically) use about 4 GB.
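Both figures are easy to verify from a prompt (a quick demonstration, not part of the fix):

# each UTF-16 code unit is 2 bytes, so a 3-character ASCII string is 6 bytes of character data
[System.Text.Encoding]::Unicode.GetByteCount("abc")   # 6

# maximum code units per string (the length is an Int32), times 2 bytes each, is ~4 GB
[long][int]::MaxValue * 2 / 1GB                       # ~4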

Get-Content will store every line in its own string. If you have a single line with > 2 billion characters... that's likely why you're getting that error despite having "enough" memory.

Alternatively, it could be because of .NET's 2 GB limit on any single object, which applies unless larger sizes are explicitly enabled via the gcAllowVeryLargeObjects setting (is it enabled for PowerShell?). Your 4 GB OOM could also be because there are two copies/buffers kept around as Get-Content tries to find a line break to split on.

The solution, of course, is to work with bytes and not characters (strings).


If you want to avoid third-party programs, the best way to do this is to drop down to the .NET methods. This is most easily done with a full language like C# (which can be embedded into PowerShell), but it is possible to do purely in PS.
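For example, the chunked copy at the heart of a file splitter could be written in C# and compiled on the fly with Add-Type - a minimal sketch, where CopyHelper and BlockCopy are illustrative names rather than any existing API:

Add-Type -TypeDefinition @"
using System.IO;
public static class CopyHelper
{
    // copy up to 'count' bytes from one open stream to another, 4 KB at a time
    public static void BlockCopy(Stream from, Stream to, long count)
    {
        byte[] buffer = new byte[4096];
        int read;
        while (count > 0 &&
               (read = from.Read(buffer, 0, (int)System.Math.Min(buffer.Length, count))) > 0)
        {
            to.Write(buffer, 0, read);
            count -= read;
        }
    }
}
"@

# usage: [CopyHelper]::BlockCopy($inStream, $outStream, 100MB)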

The idea is you want to work with byte arrays, not text streams. There are two ways to do this:

  • Use [System.IO.File]::ReadAllBytes and [System.IO.File]::WriteAllBytes. This is pretty easy, and better than strings (no encoding conversion, no 2x memory usage), but will still run into issues with very large files - say you wanted to process 100 GB files? (A minimal sketch follows this list.)

  • Use file streams and read/write in smaller chunks. This requires a fair bit more maths since you need to keep track of your position, but you avoid ever holding the entire file in memory. This will likely also be the fastest approach: the cost of allocating very large objects will probably outweigh the cost of multiple reads. (The script at the end of this answer takes this approach.)
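A minimal sketch of the first approach, assuming the whole file fits in memory ("foo" and the 100MB split size are placeholders):

# keep .NET's current directory in sync with PowerShell's (see the note in the script below)
[Environment]::CurrentDirectory = Get-Location

$allBytes = [System.IO.File]::ReadAllBytes("foo")
$splitSize = 100MB
$fileCount = 0

for ($offset = 0; $offset -lt $allBytes.Length; $offset += $splitSize) {
    $fileCount++
    # the last chunk is usually smaller than the split size
    $chunkSize = [Math]::Min($splitSize, $allBytes.Length - $offset)
    $chunk = New-Object byte[] $chunkSize
    [Array]::Copy($allBytes, $offset, $chunk, 0, $chunkSize)
    [System.IO.File]::WriteAllBytes("foo_$fileCount", $chunk)
}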

With the second (streaming) approach, you read chunks of a reasonable size (4 KB is a common and safe default, matching typical disk cluster and memory page sizes) and copy them to the output file one chunk at a time, rather than reading the entire file into memory and splitting it. You may wish to tune the size upwards, e.g. 8 KB, 16 KB, 32 KB, etc., if you need to squeeze every last drop of performance out - but you'd need to benchmark to find the optimum, as some larger sizes are actually slower.
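To find the optimum on your hardware you could time a read pass with each candidate size, e.g. with Measure-Command - a rough sketch ("foo" is a placeholder, and the OS file cache will skew repeated runs):

[Environment]::CurrentDirectory = Get-Location

foreach ($size in 4KB, 8KB, 16KB, 32KB, 64KB) {
    $buffer = New-Object byte[] $size
    $stream = [System.IO.File]::OpenRead("foo")

    # read the whole file and discard it, timing the loop
    $elapsed = Measure-Command {
        while ($stream.Read($buffer, 0, $buffer.Length)) { }
    }

    $stream.Dispose()
    "{0,6} bytes: {1}" -f $size, $elapsed
}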

An example script follows. For reusability it should be converted into a cmdlet or at least a PS function, but this is enough to serve as a working example.

$fileName = "foo"
$splitSize = 100MB

# need to sync .NET CurrentDirectory with PowerShell CurrentDirectory
# https://stackoverflow.com/questions/18862716/current-directory-from-a-dll-invoked-from-powershell-wrong
[Environment]::CurrentDirectory = Get-Location
# 4k is a fairly typical and 'safe' chunk size
# partial chunks are handled below
$bytes = New-Object byte[] 4096

$inFile = [System.IO.File]::OpenRead($fileName)

# track which output file we're up to
$fileCount = 0

# better to use functions but a flag is easier in a simple script
$finished = $false

while (!$finished) {
    $fileCount++
    $bytesToRead = $splitSize
    # track whether anything was actually written to this split
    $bytesWritten = 0

    # like [System.IO.File]::OpenWrite, except FileMode.CreateNew fails if the file already exists rather than overwriting it
    $outFile = New-Object System.IO.FileStream "${fileName}_$fileCount",CreateNew,Write,None

    while ($bytesToRead) {
        # read up to 4k at a time, but no more than the remaining bytes in this split
        $bytesRead = $inFile.Read($bytes, 0, [Math]::Min($bytes.Length, $bytesToRead))

        # 0 bytes read means we've reached the end of the input file
        if (!$bytesRead) {
            $finished = $true
            break
        }

        $bytesToRead -= $bytesRead
        $bytesWritten += $bytesRead

        $outFile.Write($bytes, 0, $bytesRead)
    }

    # dispose closes the stream and releases locks
    $outFile.Dispose()

    # when the input length is an exact multiple of the split size, the final
    # iteration opens a new split but reads 0 bytes; remove that empty file
    if (!$bytesWritten) {
        Remove-Item "${fileName}_$fileCount"
    }
}

$inFile.Dispose()
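
For completeness, a sketch of re-joining the pieces, assuming the foo_<n> names produced above:

[Environment]::CurrentDirectory = Get-Location
$joined = [System.IO.File]::Create("foo.joined")

Get-ChildItem "foo_*" |
    Where-Object { $_.Name -match '^foo_\d+$' } |
    # Substring(4) strips the "foo_" prefix so the pieces sort numerically
    Sort-Object { [int]$_.Name.Substring(4) } |
    ForEach-Object {
        $part = $_.OpenRead()
        $part.CopyTo($joined)
        $part.Dispose()
    }

$joined.Dispose()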