Unable to extract OneDrive ZIP file on Ubuntu 18.10

I downloaded a 45 GB directory from OneDrive, which was saved on my Ubuntu VM as a single zip file.

When I try to extract it via the GUI, I get an error saying the archive is empty.

When I try to use the unzip command via terminal, it gives me an error:

warning [Archive.zip]: 43855246100 extra bytes at beginning or within zipfile (attempting to process anyway)

error [Archive.zip]: start of central directory not found; zipfile corrupt.

(please check that you have transferred or created the zipfile in the appropriate BINARY mode and that you have compiled UnZip properly)

https://stackoverflow.com/questions/27151176/zip-files-corrupt-over-4-gigabytes-no-warnings-or-errors-did-i-lose-my-data/31084012

From the above link I learned that unzip can fail on archives larger than 4 GB.

So I tried several other tools: tar xvf, jar xf, and 7z x.

  1. For tar xvf, I get

tar: This does not look like a tar archive

tar: Skipping to next header

  2. For 7z x, the archive gets extracted, but with the following errors:

Headers Error

Unconfirmed start of archive

WARNINGS

There are data after the end of archive

Also, there were supposed to be some .rar files inside the original (downloaded) zip file, which were missing from the extracted location.

  3. For jar xf, I get the error:

Error in JAR file! (not compressed but data desc.)

When I try to run zip -T Archive.zip, I get

Could not find Archive.z01

Hit c (change path to where this split file is)

q (abort archive - quit)

or ENTER (try reading this split again):

Is it possible OneDrive gave me a corrupted archive?


Solution 1:

I know this question was posted over a year ago, but after running into what looks like the exact same issue I might be able to provide some answers. The most likely cause of this interoperability problem is OneDrive's implementation of the ZIP64 extension, and more precisely the value of the "total number of disks" field in the "zip64 end of central dir locator". In the OneDrive files this value is set to 0 (zero), whereas most reader tools expect a value of 1.

For more information here's a detailed write-up I posted on my blog:

https://www.bitsgalore.org/2020/03/11/does-microsoft-onedrive-export-large-ZIP-files-that-are-corrupt

If you're comfortable with a bit of hex editing, you can provisionally "fix" affected files by changing the first byte of the "total number of disks" field from 0 to 1 in a hex editor; see my blog post for details.
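For anyone who would rather not open a 45 GB file in a hex editor, here is a minimal Python sketch of the same patch. It assumes the field layout given in the ZIP specification (a 20-byte locator with signature PK\x06\x07 sitting directly before the end of central directory record, with "total number of disks" in its last 4 bytes). The function names are my own, not taken from the blog post or the Perl script, and it only looks at the last signature match in the file tail, so treat it as an illustration and run it on a copy.

```python
import struct

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # zip64 end of central dir locator
LOCATOR_SIZE = 20                  # fixed size of the locator
EOCD_MIN_SIZE = 22                 # EOCD record without a comment
MAX_COMMENT = 0xFFFF               # longest possible archive comment


def find_total_disks_offset(path):
    """Return the file offset of 'total number of disks', or None."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        # Only the tail of the file is needed, even for huge archives.
        tail_len = min(size, EOCD_MIN_SIZE + MAX_COMMENT + LOCATOR_SIZE)
        f.seek(size - tail_len)
        tail = f.read(tail_len)
    eocd = tail.rfind(EOCD_SIG)
    if eocd < LOCATOR_SIZE:
        return None  # no EOCD found, or no room for a locator before it
    loc = eocd - LOCATOR_SIZE
    if tail[loc:loc + 4] != ZIP64_LOCATOR_SIG:
        return None  # not a ZIP64 archive (or an unexpected layout)
    # The field is the last 4 bytes of the locator (offset 16 within it).
    return size - tail_len + loc + 16


def fix_total_disks(path):
    """Set 'total number of disks' to 1 if it is 0. True if patched."""
    offset = find_total_disks_offset(path)
    if offset is None:
        return False
    with open(path, "r+b") as f:
        f.seek(offset)
        (total_disks,) = struct.unpack("<I", f.read(4))
        if total_disks != 0:
            return False  # already non-zero, leave the file alone
        f.seek(offset)
        f.write(struct.pack("<I", 1))
    return True
```

Calling fix_total_disks("Archive.zip") modifies the file in place and returns False without writing anything if no locator is found or the field is already non-zero.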

UPDATE: partially prompted by my blog post, someone has written a Perl script that fixes these files automatically. See this link for more details:

https://unix.stackexchange.com/a/590034

Direct link to the script:

https://github.com/pmqs/Fix-OneDrive-Zip

In addition, this support page from Microsoft says (3rd 'Notes' section from top):

Downloads are subject to the following limits: individual file size limit: 10GB; total zip file size limit: 20GB; total number of files limit: 10,000.

So for a 45 GB directory you might hit these size limits as well.