Out of memory when encoding file to base64

Using Base64 from Apache commons

public byte[] encode(File file) throws FileNotFoundException, IOException {
        byte[] encoded;
        try (FileInputStream fin = new FileInputStream(file)) {
            byte fileContent[] = new byte[(int) file.length()];
            fin.read(fileContent);
            encoded = Base64.encodeBase64(fileContent);
        }
        return encoded;   
}


Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
    at org.apache.commons.codec.binary.BaseNCodec.encode(BaseNCodec.java:342)
    at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:657)
    at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:622)
    at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:604)

I'm making small app for mobile device.


You cannot just load the whole file into memory, like here:

byte fileContent[] = new byte[(int) file.length()];
fin.read(fileContent);

Instead load the file chunk by chunk and encode it in parts. Base64 is a simple encoding, it is enough to load 3 bytes and encode them at a time (this will produce 4 bytes after encoding). For performance reasons consider loading multiples of 3 bytes, e.g. 3000 bytes - should be just fine. Also consider buffering input file.

An example:

byte fileContent[] = new byte[3000];
try (FileInputStream fin = new FileInputStream(file)) {
    while(fin.read(fileContent) >= 0) {
         Base64.encodeBase64(fileContent);
    }
}

Note that you cannot simply append results of Base64.encodeBase64() to encoded bbyte array. Actually, it is not loading the file but encoding it to Base64 causing the out-of-memory problem. This is understandable because Base64 version is bigger (and you already have a file occupying a lot of memory).

Consider changing your method to:

public void encode(File file, OutputStream base64OutputStream)

and sending Base64-encoded data directly to the base64OutputStream rather than returning it.

UPDATE: Thanks to @StephenC I developed much easier version:

public void encode(File file, OutputStream base64OutputStream) {
  InputStream is = new FileInputStream(file);
  OutputStream out = new Base64OutputStream(base64OutputStream)
  IOUtils.copy(is, out);
  is.close();
  out.close();
}

It uses Base64OutputStream that translates input to Base64 on-the-fly and IOUtils class from Apache Commons IO.

Note: you must close the FileInputStream and Base64OutputStream explicitly to print = if required but buffering is handled by IOUtils.copy().


Either the file is too big, or your heap is too small, or you've got a memory leak.

  • If this only happens with really big files, put something into your code to check the file size and reject files that are unreasonably big.

  • If this happens with small files, increase your heap size by using the -Xmx command line option when you launch the JVM. (If this is in a web container or some other framework, check the documentation on how to do it.)

  • If the file recurs, especially with small files, the chances are that you've got a memory leak.


The other point that should be made is that your current approach entails holding two complete copies of the file in memory. You should be able to reduce the memory usage, though you'll typically need a stream-based Base64 encoder to do this. (It depends on which flavor of the base64 encoding you are using ...)

This page describes a stream-based Base64 encoder / decoder library, and includes lnks to some alternatives.