Unable to extract OneDrive ZIP file on Ubuntu 18.10
I downloaded a 45 GB directory from OneDrive, which was saved on my Ubuntu VM as a single zip file.
When I try to extract it via the GUI, I get an error saying the archive is empty.
When I try to use the unzip command in a terminal, it gives me this error:
warning [Archive.zip]: 43855246100 extra bytes at beginning or within zipfile (attempting to process anyway)
error [Archive.zip]: start of central directory not found; zipfile corrupt.
(please check that you have transferred or created the zipfile in the appropriate BINARY mode and that you have compiled UnZip properly)
https://stackoverflow.com/questions/27151176/zip-files-corrupt-over-4-gigabytes-no-warnings-or-errors-did-i-lose-my-data/31084012
From the above link I learned that unzip can fail on archives larger than 4 GB.
So I tried several other tools: tar xvf, jar xf, and 7z x.
- For tar xvf, I get:
tar: This does not look like a tar archive
tar: Skipping to next header
- For 7z x, the archive gets extracted, but with the following errors:
Headers Error
Unconfirmed start of archive
WARNINGS
There are data after the end of archive
Also, the original (downloaded) zip file was supposed to contain some .rar files, and these were missing from the extracted output.
- For jar xf, I get the error:
Error in JAR file! (not compressed but data desc.)
When I try to run zip -T Archive.zip, I get:
Could not find Archive.z01
Hit c (change path to where this split file is)
q (abort archive - quit)
or ENTER (try reading this split again):
Is it possible OneDrive gave me a corrupted archive?
Solution 1:
I know this question was posted over a year ago, but after running into what looks like the exact same issue, I might be able to provide some answers. The most likely cause of this interoperability problem is OneDrive's implementation of the ZIP64 extension, and more precisely the value of the "total number of disks" field in the "zip64 end of central dir locator". In the OneDrive files this value is set to 0 (zero), whereas most reader tools expect a value of 1.
For more information, here's a detailed write-up I posted on my blog:
https://www.bitsgalore.org/2020/03/11/does-microsoft-onedrive-export-large-ZIP-files-that-are-corrupt
If you're comfortable with a bit of hex editing, you can provisionally "fix" affected files by changing the first byte of the "total number of disks" field from 0x00 to 0x01 in a hex editor. Again, see my blog post for details.
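For anyone who would rather script the hex edit described above, here is a minimal Python sketch. It is not the method used by any linked tool, and the function names are hypothetical. It assumes the layout I understand from the ZIP specification: the 20-byte "zip64 end of central dir locator" (signature PK\x06\x07) sits directly before the end of central directory record (signature PK\x05\x06), and "total number of disks" is the little-endian 32-bit value in its last 4 bytes. It also assumes the last PK\x05\x06 in the file really is the end of central directory record, which could be wrong if the archive comment happens to contain those bytes.

```python
import struct

EOCD_SIG = b"PK\x05\x06"          # end of central directory record
Z64_LOCATOR_SIG = b"PK\x06\x07"   # zip64 end of central dir locator
Z64_LOCATOR_SIZE = 20             # 4 (sig) + 4 (disk no.) + 8 (offset) + 4 (total disks)


def find_total_disks_field(path):
    """Return (file_offset, value) of 'total number of disks', or None if not found."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        # The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes,
        # so only the tail of the file needs to be read, even for a 45 GB archive.
        tail_len = min(size, 22 + 65535 + Z64_LOCATOR_SIZE)
        f.seek(size - tail_len)
        tail = f.read(tail_len)
    eocd = tail.rfind(EOCD_SIG)
    if eocd < Z64_LOCATOR_SIZE:
        return None  # no EOCD found, or no room for a locator before it
    loc = eocd - Z64_LOCATOR_SIZE
    if tail[loc:loc + 4] != Z64_LOCATOR_SIG:
        return None  # not a ZIP64 archive (or an unexpected layout)
    (total_disks,) = struct.unpack_from("<I", tail, loc + 16)
    return size - tail_len + loc + 16, total_disks


def fix_total_disks(path):
    """Overwrite 'total number of disks' with 1 if it is 0. Returns True if patched."""
    found = find_total_disks_field(path)
    if found is None or found[1] != 0:
        return False
    offset, _ = found
    with open(path, "r+b") as f:  # patches the file in place
        f.seek(offset)
        f.write(struct.pack("<I", 1))
    return True
```

Because fix_total_disks modifies the file in place, it is safer to run it on a copy of the archive, for example fix_total_disks("Archive-copy.zip"), and then retry the extraction with your usual tool.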
UPDATE: partially prompted by my blog post, someone has written a Perl script that fixes these files automatically. See this link for more details:
https://unix.stackexchange.com/a/590034
Direct link to the script:
https://github.com/pmqs/Fix-OneDrive-Zip
In addition, this support page from Microsoft says (3rd 'Notes' section from top):
Downloads are subject to the following limits: individual file size limit: 10GB; total zip file size limit: 20GB; total number of files limit: 10,000.
So for a 45 GB directory you might hit these size limits as well.