Undo tar file extraction mess

I just untar'd an archive that produced a mess of files into my tidy directory. For example:

user@comp:~/tidy$ tar xvf myarchive.tar
file1
file2
dir1/
dir1/file1
dir1/subdir1/
dir1/subdir1/file1
dir2/
dir2/file1
...

I was expecting that the tar file would have been organized in a single folder (i.e., myarchive/), but it wasn't! Now I have some 190 files and directories that have digitally barfed in what was an organized directory. These untar'd files need to be cleaned up.

Is there any way to "undo" this and delete the files and directories that were extracted from this archive?


Thanks for the excellent answers below. In summary, this works in two steps: (1) delete the files, then (2) delete the empty directory structure in reverse packing order (so that the innermost directories are deleted first):

tar tf myarchive.tar | xargs -d'\n' rm
tar tf myarchive.tar | tac | xargs -d'\n' rmdir

Safer yet, preview a dry run first by putting echo in front of the command that xargs runs (for example, xargs -d'\n' echo rm).
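
A minimal sketch of such a dry run, done in a throwaway directory so nothing real is touched. It assumes GNU xargs (for -d) and GNU tac; the scratch archive and its file names are made up for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Build a small archive with no top-level folder, like the one in the question.
mkdir -p src/dir1/subdir1
echo a > src/file1
echo b > src/dir1/file1
echo c > src/dir1/subdir1/file1
(cd src && tar cf ../myarchive.tar file1 dir1)

# Dry run: echo prints the command line that xargs would otherwise execute.
preview_rm=$(tar tf myarchive.tar | xargs -d'\n' echo rm)
preview_rmdir=$(tar tf myarchive.tar | tac | xargs -d'\n' echo rmdir)

echo "$preview_rm"
echo "$preview_rmdir"
```

Both lines are only printed; the files under src/ and the archive are left alone. Drop the echo once the printed commands name exactly what you expect.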


Solution 1:

tar tf archive.tar

will list the contents line by line.

This can be piped to xargs directly, but beware: do the deletion very carefully. You don't want to just rm -r everything that tar tf tells you, since it might include directories that were not empty before unpacking!

You could do

tar tf archive.tar | xargs -d'\n' rm -v
tar tf archive.tar | sort -r | xargs -d'\n' rmdir -v

to first remove all files that were in the archive, and then the directories that are left empty.

sort -r is needed to delete the deepest directories first. Otherwise, if dir1 contains only an empty directory dir2, dir1 is left behind after the rmdir pass, because it was not yet empty when rmdir tried it. (glennjackman suggested tac instead of sort -r in the comments to the accepted answer; that also works, since tar normally lists a directory before its contents.)
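
To see the ordering problem in isolation, here is a small sketch in a scratch directory. The directory names forward/ and reverse/ and the hand-written two-line listing are made up for the demo; they stand in for real tar tf output.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Two copies of the same empty nesting: dir1 contains only dir1/dir2.
mkdir -p forward/dir1/dir2 reverse/dir1/dir2
list='dir1/
dir1/dir2/'

# Listing order: dir1/ is tried first and fails because dir2 is still inside.
(cd forward && printf '%s\n' "$list" | while IFS= read -r d; do
    rmdir "$d" 2>/dev/null || true
done)

# Reversed order: dir1/dir2/ goes first, which leaves dir1/ empty and removable.
(cd reverse && printf '%s\n' "$list" | sort -r | while IFS= read -r d; do
    rmdir "$d" 2>/dev/null || true
done)

ls forward   # dir1 is still there
ls reverse   # prints nothing
```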

This will generate a lot of

rm: cannot remove `dir/': Is a directory

and

rmdir: failed to remove `dir/': Directory not empty
rmdir: failed to remove `file': Not a directory

Shut this up with 2>/dev/null if it annoys you, but I'd prefer to keep as much information on the process as possible.

And don't do it until you are sure that you match the right files. And perhaps try rm -i to confirm everything. And have backups, eat your breakfast, brush your teeth, etc.
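
One cheap way to check that you match the right files is to count the archive entries that do not exist where you are about to delete. A sketch under assumptions: the helper count_missing and the directories right/ and wrong/ are made up for the demo, and file names are assumed to contain no newlines.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Scratch archive with three entries: file1, dir1/ and dir1/file1.
mkdir -p src/dir1
echo a > src/file1
echo b > src/dir1/file1
(cd src && tar cf ../archive.tar file1 dir1)

# right/ is where the archive was unpacked; wrong/ is some other directory.
mkdir right wrong
(cd right && tar xf ../archive.tar)

# Count the archive entries that do not exist under the directory given as $1.
count_missing() {
    tar tf archive.tar | while IFS= read -r entry; do
        [ -e "$1/$entry" ] || echo "$entry"
    done | wc -l
}

echo "missing in right/: $(count_missing right)"
echo "missing in wrong/: $(count_missing wrong)"
```

A count of zero does not prove that every match came from the archive (a file that existed before and was overwritten also matches), but a large count is a strong hint that you are in the wrong directory or holding the wrong archive.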

Solution 2:

List the contents of the tar file like so:

tar tzf myarchive.tar.gz

Then, delete those file names by iterating over that list:

while IFS= read -r file; do echo "$file"; done < <(tar tzf myarchive.tar.gz)

This will still just list the files that would be deleted. Replace echo with rm if you're really sure these are the ones you want to remove. And maybe make a backup to be sure.

In a second pass, remove the directories that are left over, deepest first (hence the tac):

while IFS= read -r file; do rmdir "$file"; done < <(tar tzf myarchive.tar.gz | tac)

rmdir only removes empty directories, so a directory that already existed before and still contains other files is not deleted.
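
Here is a sketch of both passes run against a scratch copy, to show that a directory which already held a file of its own survives. It uses plain pipes and sort -r rather than process substitution and tac, so it should also run under a POSIX sh. The names tidy/ and keep.txt are made up for the demo, and it assumes tar lists directories with a trailing slash and that no file name contains a newline.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Build an archive with no top-level folder.
mkdir -p src/dir1/subdir1
echo new > src/dir1/file1
echo new > src/dir1/subdir1/file1
(cd src && tar czf ../myarchive.tar.gz dir1)

# A tidy directory that already had dir1/ with a file of its own.
mkdir -p tidy/dir1
echo mine > tidy/dir1/keep.txt
cd tidy
tar xzf ../myarchive.tar.gz

# Pass 1: remove the files named in the archive, skipping directory entries.
tar tzf ../myarchive.tar.gz | while IFS= read -r f; do
    case "$f" in */) continue ;; esac
    rm -- "$f"
done

# Pass 2: remove directories, deepest first; rmdir refuses non-empty ones.
tar tzf ../myarchive.tar.gz | sort -r | while IFS= read -r d; do
    case "$d" in */) rmdir -- "$d" 2>/dev/null || true ;; esac
done

find . | sort   # only ., ./dir1 and ./dir1/keep.txt remain
```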


Another nice trick by @glennjackman: tac reverses the listing, so the deepest entries come first. Again, remove echo once the output looks right.

tar tf myarchive.tar | tac | xargs -d'\n' echo rm

This could then be followed by the normal rmdir cleanup.

Solution 3:

Here's a possibility that will take the extracted files and move them to a subdirectory, cleaning up your main folder.

    #!/usr/bin/perl -w

    use strict;
    use Getopt::Long;

    my $clean_folder = "clean";
    my $DRY_RUN;
    die "Usage: $0 [--dry] [--clean=dir-name]\n"
        if ( !GetOptions("dry!" => \$DRY_RUN,
                         "clean=s" => \$clean_folder));

    # Protect the 'clean_folder' string from shell substitution
    $clean_folder =~ s/'/'\\''/g;

    # Process the "tar tv" listing and output a shell script.
    print "#!/bin/sh\n" if ( !$DRY_RUN );
    while (<>)
    {
        chomp;

        # Strip out permissions string and the directory entry from the 'tar' list
        my $perms = substr($_, 0, 10);
        my $dirent = substr($_, 48);

        # Drop entries that are in subdirectories
        next if ( $dirent =~ m:/.: );

        # If we're in "dry run" mode, just list the permissions and the directory
        # entries.
        #
        if ( $DRY_RUN )
        {
            print "$perms|$dirent\n";
            next;
        }

        # Emit the shell code to clean up the folder
        $dirent =~ s/'/'\\''/g;
        print "mv -i '$dirent' '$clean_folder'/.\n";
    }

Save this to the file fix-tar.pl and then execute it like this:

$ tar tvf myarchive.tar | perl fix-tar.pl --dry

This will confirm that your tar list is like mine. You should get output like:

-rw-rw-r--|batch
-rw-rw-r--|book-report.png
-rwx------|CaseReports.png
-rw-rw-r--|caseTree.png
-rw-rw-r--|tree.png
drwxrwxr-x|sample/

If that looks good, then run it again like this:

$ mkdir cleanup
$ tar tvf myarchive.tar | perl fix-tar.pl --clean=cleanup > fixup.sh

The fixup.sh script will be the shell commands that will move the top-level files and directories into a "clean" folder (in this instance, the folder called cleanup). Have a peek through this script to confirm that it's all kosher. If it is, you can now clean up your mess with:

$ sh fixup.sh

I prefer this kind of cleanup because it doesn't destroy anything that isn't already destroyed by being overwritten by that initial tar xv.

Note: if that initial dry run output doesn't look right, you should be able to fiddle with the numbers in the two substr function calls until they look proper. The $perms variable is used only for the dry run so really only the $dirent substring needs to be proper.

One other thing: you may need to use the tar option --numeric-owner if the user names and/or group names in the tar listing make the names start in an unpredictable column.
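
If the column offsets stay fiddly, an alternative sketch (not the Perl script above) is to take the top-level names from the plain tar tf listing, which has no permission or owner columns at all. Assumptions: entry names do not start with ./ and contain no newlines, and nothing in the archive is itself called cleanup. The scratch files and toplevel.txt are made up for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Scratch archive with two top-level files and one top-level directory.
mkdir -p src/sample
echo a > src/batch
echo b > src/tree.png
echo c > src/sample/notes
(cd src && tar cf ../myarchive.tar batch tree.png sample)

# Reproduce the mess, then create the folder to move it into.
mkdir tidy
cd tidy
tar xf ../myarchive.tar
mkdir cleanup

# Top-level entries: the first path component of every name, de-duplicated.
tar tf ../myarchive.tar | cut -d/ -f1 | sort -u > ../toplevel.txt

# Move each top-level entry into cleanup/. The list is read on descriptor 3
# so that mv -i can still ask its question on standard input.
while IFS= read -r entry <&3; do
    mv -i -- "$entry" cleanup/
done 3< ../toplevel.txt

ls cleanup
```

Like the Perl version, this only moves things and deletes nothing, so it is easy to back out of.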