Undo tar file extraction mess
I just untar'd an archive that produced a mess of files into my tidy directory. For example:
user@comp:~/tidy$ tar xvf myarchive.tar
file1
file2
dir1/
dir1/file1
dir1/subdir1/
dir1/subdir1/file1
dir2/
dir2/file1
...
I was expecting that the tar file would have been organized in a single folder (i.e., myarchive/), but it wasn't! Now I have some 190 files and directories that have digitally barfed in what was an organized directory. These untar'd files need to be cleaned up.
Is there any way to "undo" this and delete the files and directories that were extracted from this archive?
Thanks for the excellent answers below. In summary, here is what works, in two steps: (1) delete the files, and (2) delete the empty directory structure in reverse packing order (so that inner directories are deleted before the outer ones that contain them):
tar tf myarchive.tar | xargs -d'\n' rm
tar tf myarchive.tar | tac | xargs -d'\n' rmdir
Safer yet, preview the commands as a dry run by putting echo in front of rm and rmdir (that is, right after the xargs options).
Solution 1:
tar tf archive.tar will list the contents line by line.
This can be piped to xargs directly, but beware: do the deletion very carefully. You don't want to just rm -r everything that tar tf tells you, since it might include directories that were not empty before unpacking!
You could do
tar tf archive.tar | xargs -d'\n' rm -v
tar tf archive.tar | sort -r | xargs -d'\n' rmdir -v
to first remove all files that were in the archive, and then the directories that are left empty.
sort -r (glennjackman suggested tac instead of sort -r in the comments to the accepted answer, which also works since tar's output is regular enough) is needed to delete the deepest directories first; otherwise a case where dir1 contains a single empty directory dir2 will leave dir1 after the rmdir pass, since it was not empty before dir2 was removed.
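The two passes can be wrapped into one small shell function. This is only a sketch under some assumptions: GNU xargs (for -d) and tac are available, and the function is run from the directory where the archive was unpacked. The name undo_untar and its --dry-run switch are made up for this illustration.

```shell
# undo_untar ARCHIVE [--dry-run]
# Hypothetical helper: undo an extraction made in the current directory.
undo_untar() {
    archive=$1
    run=
    # With --dry-run, prefix each command with echo so nothing is deleted.
    if [ "${2:-}" = "--dry-run" ]; then run=echo; fi

    # Pass 1: plain rm refuses directories, so only files go away.
    tar tf "$archive" | xargs -d '\n' $run rm --

    # Pass 2: the reversed listing puts children before parents;
    # rmdir refuses anything that is not an empty directory.
    tar tf "$archive" | tac | xargs -d '\n' $run rmdir --
}
```

Running undo_untar myarchive.tar --dry-run prints the rm and rmdir command lines without touching anything; drop the switch to run them for real. The -- guards against file names that begin with a dash.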
This will generate a lot of
rm: cannot remove `dir/': Is a directory
and
rmdir: failed to remove `dir/': Directory not empty
rmdir: failed to remove `file': Not a directory
Shut this up with 2>/dev/null if it annoys you, but I'd prefer to keep as much information on the process as possible.
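And don't do it until you are sure that you match the right files. And perhaps try rm -i to confirm everything (note that GNU xargs runs the command with stdin set to /dev/null when it reads the names from a pipe, so rm -i cannot read your answers that way; xargs -a with a list file leaves stdin alone). And have backups, eat your breakfast, brush your teeth, etc.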
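One way to get per-file confirmation is to save the listing to a temporary file and let xargs read it with -a, which leaves rm's standard input connected to the terminal. A minimal sketch, assuming GNU xargs; confirm_remove is a hypothetical name:

```shell
# confirm_remove ARCHIVE
# Hypothetical helper: delete the archive's files, prompting for each one.
confirm_remove() {
    list=$(mktemp) || return 1
    tar tf "$1" > "$list"
    # -a reads the names from the file, so rm keeps the caller's stdin
    # (normally the terminal) and can read the y/n answers.
    xargs -a "$list" -d '\n' rm -i --
    status=$?
    rm -f -- "$list"
    return "$status"
}
```

Directories in the listing still produce the "Is a directory" complaints, so the rmdir pass is needed afterwards as before.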
Solution 2:
List the contents of the tar file like so:
tar tzf myarchive.tar.gz
Then, delete those file names by iterating over that list:
while IFS= read -r file; do echo "$file"; done < <(tar tzf myarchive.tar.gz)
This will still just list the files that would be deleted. Replace echo with rm if you're really sure these are the ones you want to remove. And maybe make a backup to be sure.
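In a second pass, remove the directories that are left over, deepest first (tac reverses the listing so that subdirectories come before their parents):
while IFS= read -r file; do rmdir "$file"; done < <(tar tzf myarchive.tar.gz | tac)
Because rmdir only removes empty directories, this prevents directories that still contain files (for example, ones that already existed before extraction) from being deleted.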
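Another nice trick by @glennjackman reverses the listing with tac, so that the deepest entries come first. Note that this needs the plain tar tf listing, not the verbose tar tvf one, whose lines carry permissions, owner, size and date in front of each name. Again, remove echo when done.
tar tf myarchive.tar | tac | xargs -d'\n' echo rm
This could then be followed by the normal rmdir cleanup.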
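Since the reversed listing always visits a directory's contents before the directory itself, both passes can also be folded into a single loop that picks rm or rmdir by entry type. A sketch only, assuming tac is available and the current directory is where the archive was unpacked; remove_extracted is a hypothetical name:

```shell
# remove_extracted ARCHIVE
# Hypothetical helper: walk the listing deepest-first and remove each entry
# with the tool that matches its type.
remove_extracted() {
    tar tf "$1" | tac | while IFS= read -r path; do
        if [ -L "$path" ] || [ -f "$path" ]; then
            rm -- "$path"
        elif [ -d "$path" ]; then
            # Fails harmlessly when the directory still holds other files.
            rmdir -- "$path" 2>/dev/null || echo "kept: $path"
        fi
    done
}
```

Entries that are neither files, symlinks nor directories are left alone, and every directory that could not be removed is reported on a "kept:" line instead of as an rmdir error.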
Solution 3:
Here's a possibility that will take the extracted files and move them to a subdirectory, cleaning up your main folder.
#!/usr/bin/perl -w
use strict;
use Getopt::Long;
my $clean_folder = "clean";
my $DRY_RUN;
die "Usage: $0 [--dry] [--clean=dir-name]\n"
    if ( !GetOptions("dry!" => \$DRY_RUN,
                     "clean=s" => \$clean_folder));
# Protect the 'clean_folder' string from shell substitution
$clean_folder =~ s/'/'\\''/g;
# Process the "tar tv" listing and output a shell script.
print "#!/bin/sh\n" if ( !$DRY_RUN );
while (<>)
{
    chomp;
    # Strip out permissions string and the directory entry from the 'tar' list
    my $perms = substr($_, 0, 10);
    my $dirent = substr($_, 48);
    # Drop entries that are in subdirectories
    next if ( $dirent =~ m:/.: );
    # If we're in "dry run" mode, just list the permissions and the directory
    # entries.
    #
    if ( $DRY_RUN )
    {
        print "$perms|$dirent\n";
        next;
    }
    # Emit the shell code to clean up the folder
    $dirent =~ s/'/'\\''/g;
    print "mv -i '$dirent' '$clean_folder'/.\n";
}
Save this to the file fix-tar.pl and then execute it like this:
$ tar tvf myarchive.tar | perl fix-tar.pl --dry
This will confirm that your tar list is like mine. You should get output like:
-rw-rw-r--|batch
-rw-rw-r--|book-report.png
-rwx------|CaseReports.png
-rw-rw-r--|caseTree.png
-rw-rw-r--|tree.png
drwxrwxr-x|sample/
If that looks good, then run it again like this:
$ mkdir cleanup
$ tar tvf myarchive.tar | perl fix-tar.pl --clean=cleanup > fixup.sh
The fixup.sh script will be the shell commands that will move the top-level files and directories into a "clean" folder (in this instance, the folder called cleanup). Have a peek through this script to confirm that it's all kosher. If it is, you can now clean up your mess with:
$ sh fixup.sh
I prefer this kind of cleanup because it doesn't destroy anything that isn't already destroyed by being overwritten by that initial tar xv.
Note: if that initial dry run output doesn't look right, you should be able to fiddle with the numbers in the two substr function calls until they look proper. The $perms variable is used only for the dry run so really only the $dirent substring needs to be proper.
One other thing: you may need to use the tar option --numeric-owner if the user names and/or group names in the tar listing make the names start in an unpredictable column.
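If the column positions prove too fragile, the top-level names can also be taken from the plain tar tf listing, which has no columns at all. The following is a rough shell sketch of that alternative, not a port of the Perl script: list_top_level and move_top_level are hypothetical names, and mv -n (never overwrite) stands in for mv -i because the loop's standard input is the pipe rather than the terminal.

```shell
# list_top_level ARCHIVE
# Hypothetical helper: print each top-level name in the archive once.
list_top_level() {
    tar tf "$1" | sed -e 's|^\./||' -e '/^$/d' | cut -d/ -f1 | sort -u
}

# move_top_level ARCHIVE DEST
# Move the extracted top-level entries from the current directory into DEST.
move_top_level() {
    mkdir -p -- "$2" || return 1
    list_top_level "$1" | while IFS= read -r name; do
        # Skip names that are no longer present.
        if [ -e "$name" ] || [ -L "$name" ]; then
            # -n never overwrites something already in DEST.
            mv -n -- "$name" "$2"/
        fi
    done
}
```

For example, move_top_level myarchive.tar cleanup. As with the generated fixup.sh, whole top-level directories are moved, so a directory that already existed before the extraction goes along with its older contents.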