Why is extracting this tgz throwing an error on my Mac but not on Linux?

I'm experiencing a rather odd problem, and I can't figure out what's going on. I have a tgz file, scip-3.2.0.tgz, that throws an error when I attempt to unpack it. The error only occurs on OS X (I'm on 10.10.4); I can extract the file without error on a Linux box running CentOS 6.6. It occurs both with the command-line tar command and with Archive Utility. I emailed the SCIP mailing list, and I have the same SHA-1 hash as another user (e085a4a3591eddf945dcb365d97d2512c267e374), so there wasn't a download error. They aren't sure what's going on.
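
For reference, a hash like the one above can be checked with something along these lines. This is only a minimal sketch: sha1_of is a hypothetical helper name, and it assumes that either sha1sum (GNU coreutils) or shasum (typically available on OS X) is on the PATH.

```shell
# Print the SHA-1 hex digest of the file given as "$1".
# Assumption: sha1sum or shasum is installed; sha1_of is a made-up name.
sha1_of() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum "$1" | awk '{print $1}'
    else
        shasum -a 1 "$1" | awk '{print $1}'
    fi
}

# Example comparison against the digest quoted in the question:
# expected=e085a4a3591eddf945dcb365d97d2512c267e374
# [ "$(sha1_of scip-3.2.0.tgz)" = "$expected" ] && echo "hash matches"
```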

Here's the error I get when I try to unpack using the archive utility:

[Image: Archive Utility error]

In case the image ever gets broken, the text in the image says this:

Unable to expand "scip-3.2.0.tgz" into "Desktop".
(Error 1 - Operation not permitted.)

And when I try to unpack via the command line, this is the output I get. It's the last line (tar: Error exit delayed from previous errors.) that concerns me. I don't see what's causing it. The archive appears to extract without problem, but I don't trust it with that error being thrown.

Does anyone know what's causing this?

[edit]
Looking a little closer at the output, line 1108 contains the error:

x scip-3.2.0/applications/Coloring/Makefile: Can't create 'scip-3.2.0/applications/Coloring/Makefile'

Solution 1:

This should help identify what is going on in Johnny's answer, as well as answer the question of why this works on Linux but not Mac.

The problem lies in the fact that Mac OS X uses bsdtar, whereas most Linux systems use gnutar.
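
If you're not sure which implementation the tar on your PATH is, a quick check along these lines may help. It's a rough sketch: tar_flavor is a hypothetical helper name, and it assumes the first line of tar --version names the implementation, as it does in the outputs I've seen ("bsdtar 2.8.3 - libarchive 2.8.3" for bsdtar, "tar (GNU tar) 1.28" for gnutar).

```shell
# Guess which tar implementation is first on PATH.
# Assumption: the first line of `tar --version` contains either
# "bsdtar" or "GNU tar"; anything else is reported as "unknown".
tar_flavor() {
    case "$(tar --version 2>/dev/null | head -n 1)" in
        *bsdtar*)    echo bsdtar ;;
        *"GNU tar"*) echo gnutar ;;
        *)           echo unknown ;;
    esac
}

tar_flavor
```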

You can install gnutar on a Mac with Homebrew, using brew install gnu-tar, which will symlink gnutar into /usr/local/bin as gtar.

If you install gnutar, then you can reproduce the problem using the steps in Johnny's answer.

$ brew install gnu-tar
==> Downloading https://homebrew.bintray.com/bottles/gnu-tar-1.28.yosemite.bottle.2.tar.gz
######################################################################## 100.0%
==> Pouring gnu-tar-1.28.yosemite.bottle.2.tar.gz
==> Caveats
gnu-tar has been installed as "gtar".

If you really need to use it as "tar", you can add a "gnubin" directory
to your PATH from your bashrc like:

    PATH="/usr/local/opt/gnu-tar/libexec/gnubin:$PATH"
==> Summary
🍺  /usr/local/Cellar/gnu-tar/1.28: 13 files, 1.6M
$ mkdir test
$ touch test/a test/b
$ gtar -zcvf test.tar.gz test test/a # make the archive with gnutar
test/
test/a
test/b
test/a
$ gtar -ztvf test.tar.gz
drwxr-xr-x adamliter/staff   0 2015-07-28 22:41 test/
-rw-r--r-- adamliter/staff   0 2015-07-28 22:41 test/a
-rw-r--r-- adamliter/staff   0 2015-07-28 22:41 test/b
hrw-r--r-- adamliter/staff   0 2015-07-28 22:41 test/a link to test/a
$ rm -r test
$ tar -xvf test.tar.gz # try to unpack the archive with bsdtar
x test/
x test/a
x test/b
x test/a: Can't create 'test/a'
tar: Error exit delayed from previous errors.
$ echo $?
1

So gnutar evidently archives duplicates in a way that causes bsdtar to choke. The relevant detail is that gtar -ztvf test.tar.gz lists the second instance of test/a as a link to test/a (note the leading h in hrw-r--r--). As Johnny points out in the comments, gnutar stores duplicates as hard links rather than as the actual file, and this can be disabled with --hard-dereference.

That is, you could do the following:

$ mkdir test
$ touch test/a test/b
$ gtar -zcvf test.tar.gz test test/a --hard-dereference
test/
test/a
test/b
test/a
$ gtar -ztvf test.tar.gz test
drwxr-xr-x adamliter/staff   0 2015-07-28 23:49 test/
-rw-r--r-- adamliter/staff   0 2015-07-28 23:49 test/a
-rw-r--r-- adamliter/staff   0 2015-07-28 23:49 test/b
-rw-r--r-- adamliter/staff   0 2015-07-28 23:49 test/a # note that this is no longer a link
$ rm -r test
$ tar -xvf test.tar.gz # unpack with bsdtar
x test/
x test/a
x test/b
x test/a
$ echo $?
0
$ ls test/
a b

However, in this case, you obviously don't control the creation of the tarball, so --hard-dereference is not an option. Luckily, based on the OP's answer, it seems that this problem has been fixed by upstream.

Nonetheless, if anybody else runs into this problem in the future and is in need of a quick fix or has an unresponsive upstream maintainer, there is a workaround.

Once you identify what the duplicate file is, you can use the --fast-read option of bsdtar (note that this option is only part of bsdtar, not gnutar):

 -q (--fast-read)
         (x and t mode only) Extract or list only the first archive entry that matches each pattern or filename operand.  Exit as soon as each specified pattern or filename has been matched.  By default, the archive is always read to the very end, since there can be multiple entries with the same name and, by convention, later entries overwrite earlier entries.  This option is provided as a performance optimization.

So, in my toy example (which follows the one in Johnny's answer), the duplicate file is test/a. Thus, you could avoid this problem by doing the following:

# this set of commands picks up from the first set of commands
# i.e., the following assumes a tarball that was *not* made with
# the --hard-dereference option, although this will work just as well
# with one that was
$ tar -xvqf test.tar.gz test/a # unarchive the first instance of test/a
x test/a
$ tar -xvf test.tar.gz --exclude test/a # unarchive everything except test/a
x test/
x test/b
$ echo $?
0
$ ls test/
a b
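
The workaround above assumes you already know which entry is duplicated. One possible way to find out is to list the member names and look for repeats. This is a minimal sketch: tar_duplicates is a hypothetical helper name, it assumes a gzipped tarball, and it won't cope with file names that contain newlines. It only uses the -t, -z and -f options, which both bsdtar and gnutar accept.

```shell
# List every member name that appears more than once in a gzipped tarball.
# "$1" is the path to the archive; tar_duplicates is a made-up name.
tar_duplicates() {
    tar -tzf "$1" | sort | uniq -d
}

# Example: tar_duplicates test.tar.gz
```

For the toy archive built earlier, this should report test/a, which is the name you would then pass to the -q and --exclude steps.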

Note, moreover, that gnutar is perfectly happy to unpack an archive with duplicates that was created by itself, even when the --hard-dereference option was not used:

$ rm -r test
$ gtar -xvf test.tar.gz
test/
test/a
test/b
test/a
$ echo $?
0
$ ls test/
a b

So this answers your question of why an error is thrown on Mac but not Linux. (Most) Linux distros ship with gnutar, and since the tarball was presumably packaged with gnutar, there will be no error when unpacking with gnutar, but there will be an error when unpacking with bsdtar.


For further reading and reference, one might want to look at What are the differences between bsdtar and GNU tar? on Unix.SE.

Solution 2:

The existence of a duplicate file in the archive should not make it invalid or unable to be extracted on OSX, as by default, tar overwrites duplicates.

So, I'm a little confused by the behavior in your Gist. OS X tar allows duplicate files in an archive. This is a throwback to its original purpose as a tape archive utility: files can be appended to the end of the tape archive, and when the archive is restored, the newest version of a file overwrites the older version(s).

It's only when the "-k" option is present that tar should warn about preexisting files.

Here I created an archive with a duplicate file then extracted it with no problem. It wasn't until I added the -k option that it warned me about the duplicate file:

Macbook> tar --version
bsdtar 2.8.3 - libarchive 2.8.3
Macbook> mkdir test
Macbook> touch test/a test/b
Macbook> tar -zcvf test.tar.gz test test/a
a test
a test/a
a test/b
a test/a
Macbook> tar -ztvf test.tar.gz
drwxr-xr-x  0 user group       0 Jul 28 10:42 test/
-rw-r--r--  0 user group       0 Jul 28 10:42 test/a
-rw-r--r--  0 user group       0 Jul 28 10:42 test/b
-rw-r--r--  0 user group       0 Jul 28 10:42 test/a
Macbook> rm -r test
Macbook> tar -xvf test.tar.gz
x test/
x test/a
x test/b
x test/a
Macbook> echo $?
0
Macbook> rm -r test
Macbook> tar -k -xvf test.tar.gz
x test/
x test/a
x test/b
x test/a: Already exists
tar: Error exit delayed from previous errors.
Macbook> echo $?
1
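
The empty files above can't show which copy ends up on disk, so here is a rough sketch that makes the "newest entry wins" behavior visible. The function name and the demo/ file names are made up for illustration, and it uses an uncompressed archive because -r appends to an existing archive.

```shell
# Show that a later archive entry overwrites an earlier one on extraction.
# Runs in a scratch directory; all names here are made up for the example.
newest_wins_demo() {
    workdir="$(mktemp -d)" || return 1
    (
        cd "$workdir" || exit 1
        mkdir demo
        echo "old" > demo/a
        tar -cf demo.tar demo      # first copy of demo/a
        echo "new" > demo/a
        tar -rf demo.tar demo/a    # append a second demo/a to the archive
        rm -r demo
        tar -xf demo.tar           # extract both entries in archive order
        cat demo/a                 # contents of whichever copy survived
    )
}

newest_wins_demo
```

With default options this should print new, since the appended entry is extracted last.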

A simple umask problem doesn't seem to be the culprit either. I tried changing my umask to 0777, and I could still extract the archive:

Macbook> tar -xvf test.tar
x test/
x test/a
x test/b
x test/a
Macbook> ls -l test
ls: test: Permission denied
Macbook> sudo ls -l test
total 0
----------  1 someuser  wheel  0 Jul 28 13:48 a
----------  1 someuser  wheel  0 Jul 28 13:48 b

I thought I could reproduce the problem by deliberately appending an unwritable directory to the archive, but that didn't work; tar didn't update the permissions on the directory when it extracted the archive:

Macbook> mkdir -p testdir1/test testdir2/test
Macbook> touch testdir1/test/{a,b} testdir2/test/a
Macbook> chmod -w testdir2/test
Macbook> touch testdir2/test/b
touch: testdir2/test/b: Permission denied
Macbook> find testdir* -ls  | awk '{print $3, $11}'
drwxrwx--- testdir1
drwxrwx--- testdir1/test
-rw-rw---- testdir1/test/a
-rw-rw---- testdir1/test/b
drwxrwx--- testdir2
dr-xr-x--- testdir2/test
-rw-rw---- testdir2/test/a
Macbook> cd testdir1
Macbook> tar -cvf ../test.tar test/*
a test/a
a test/b
Macbook> cd ../testdir2
Macbook> tar -rvf ../test.tar test
a test
a test/a
Macbook> cd ..
Macbook> tar -tvf ./test.tar
-rw-rw----  0 username groupname       0 Jul 28 15:40 test/a
-rw-rw----  0 username groupname       0 Jul 28 15:40 test/b
-rw-rw----  0 username groupname       0 Jul 28 15:40 test/a
dr-xr-x---  0 username groupname       0 Jul 28 15:40 test/
-rw-rw----  0 username groupname       0 Jul 28 15:40 test/a
Macbook> tar -xvf test.tar
x test/a
x test/b
x test/a
x test/
x test/a
Macbook> 

I also tried changing the permissions on test/a to 000, appending it to the archive, then appending another test/a, but that one worked fine too:

drwxrwx---  0 username groupname       0 Jul 28 15:40 test/
-rw-rw----  0 username groupname       0 Jul 28 15:40 test/a
-rw-rw----  0 username groupname       0 Jul 28 15:40 test/b
dr-xr-x---  0 username groupname       0 Jul 28 15:40 test/
----------  0 username groupname       0 Jul 28 15:40 test/a
-rw-rw----  0 username groupname       0 Jul 28 15:40 test/a

So I'd really like to see the original archive to find out what in it could have caused this problem.

If a file and a directory share the same name, tar does have a problem extracting, but it gives a pretty clear error message:

Macbook> tar -xvf test.tar
x test/
x test/dir1/
x test/dir1/a
x test/
x test/dir1: Can't remove already-existing dir
tar: Error exit delayed from previous errors.

If the conflict happens the other way around, i.e. a file comes first and a directory with the same name comes later, tar just removes the file and creates the directory:

Macbook> tar -xvf test.tar
x test/
x test/dir1
x test/
x test/dir1/
x test/dir1/a

Solution 3:

Turns out the OS X tar utility was the correct one! There was indeed an error in the archive. This email thread discusses it in more detail, but the problem is that there is a duplicate file in the archive. The SCIP guys are fixing the archive as I type this.

[edit]
The newly updated scip-3.2.0.tgz is now extracting just fine! The SHA-1 hash of the new tgz is 5b4e8283f4a5bf9e50f9a62d4320d6f5f50c8476.

[edit 2]
It's not that there's an error in the archive. It's simply that bsdtar, which ships with OS X, handles duplicate files differently than gnutar, which ships with Linux. @Adam Liter's answer here provides a thorough explanation of what's happening.

Solution 4:

There's an alternative, free, lightweight archiver for Mac OS X called Keka. I mostly use it to unpack 7zip archives, but it can also unpack other formats such as .rar, .tar, and .gz. It worked for the OP's specific tar file as well, although I only tried it after @Geoff mentioned the team was working on repairing the file.