How to split a tar file into smaller parts at file boundaries?
There is a tool, tarsplitter, which safely splits tar archives. You specify the number of parts you want to split the archive into, and it figures out where the file boundaries are.
https://github.com/AQUAOSOTech/tarsplitter
The smaller output archives won't be exactly the same size, but they will be pretty close, assuming the files in the original archive don't vary much in size.
Example - split the archive "files.tar" into 4 smaller archives:
tarsplitter -p 4 -i files.tar -o /tmp/parts
Creating:
/tmp/parts0.tar
/tmp/parts1.tar
/tmp/parts2.tar
/tmp/parts3.tar
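Since each part is a smaller archive in its own right, the parts can be unpacked one after another into the same directory. A minimal sketch, assuming the /tmp/parts*.tar names from the example above; the extract_parts helper and the /tmp/restored destination are hypothetical names, not part of tarsplitter:

```shell
#!/bin/bash
# Extract every archive given after the destination into that destination.
# extract_parts is a hypothetical helper, not something tarsplitter provides.
extract_parts() {
    local dest=$1 part
    shift
    mkdir -p "$dest" || return 1
    for part in "$@"; do
        tar -xf "$part" -C "$dest" || return 1
    done
}

# Usage (hypothetical destination directory):
#   extract_parts /tmp/restored /tmp/parts*.tar
```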
If recreating the archive is an option, this Bash script should do the trick (it's just one possible approach):
#!/bin/bash
if [ $# != 3 ] ; then
    echo -e "$0 in out max\n"
    echo -e "\tin: input directory"
    echo -e "\tout: output directory"
    echo -e "\tmax: split size threshold in bytes"
    exit 1
fi
IN=$1
OUT=$2
MAX=$3
SEQ=0
TOT=0
# du -b (size in bytes) requires GNU coreutils.
find "$IN" -type f | while read -r i ; do du -bs "$i" ; done | sort -n |
while read -r SIZE NAME ; do
    # Start the next archive if this file would push the total past MAX.
    if [ $TOT != 0 ] && [ $((TOT+SIZE)) -gt "$MAX" ] ; then
        SEQ=$((SEQ+1))
        TOT=0
    fi
    TOT=$((TOT+SIZE))
    TAR=$OUT/$(printf '%08d' $SEQ).tar
    # r appends the file to the current archive.
    tar rf "$TAR" "$NAME"
done
It sorts all the files by size (ascending order) and starts creating the archives; it switches to a new archive when adding the next file would push the current one past the threshold.
NOTE: Make sure that the output directory is empty, because tar rf appends to any archive that already exists there.
USE AT YOUR OWN RISK
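To check the result, it can help to list each archive in the output directory with its size and number of members. A small sketch; report_archives is a hypothetical helper name, and the output directory is whatever was passed to the script as "out":

```shell
#!/bin/bash
# Print one tab-separated line per .tar file in a directory:
# path, size in bytes, and number of members.
# report_archives is a hypothetical helper, not part of the script above.
report_archives() {
    local dir=$1 f bytes members
    for f in "$dir"/*.tar; do
        [ -e "$f" ] || continue            # no archives in the directory
        bytes=$(($(wc -c < "$f")))         # arithmetic strips any padding
        members=$(tar -tf "$f" | wc -l | tr -d ' ')
        printf '%s\tbytes=%s\tmembers=%s\n' "$f" "$bytes" "$members"
    done
}

# Usage:
#   report_archives /path/to/out
```

Because the script sorts by size, the earliest archives hold many small files and the last ones hold only a few large files.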
I don't believe there are any existing tools to do this, but it would be reasonably easy to implement yourself. The tar format is pretty simple, so you'd just need a split that took it into consideration. The basic theory is to read a header, look at the stated length of the incoming file, and determine whether to split now or write out the current file. Read the next header, and repeat.
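The header walk described above can be sketched in Bash. This is only an illustration under stated assumptions: the archive is uncompressed, sizes are stored as octal text in bytes 124-135 of each 512-byte header (the ustar layout), and GNU long-name or pax extension records are not treated specially, so they show up as entries of their own. tar_boundaries is a hypothetical name:

```shell
#!/bin/bash
# Print "offset size name" for each member of an uncompressed tar archive
# by walking its 512-byte headers. Sketch only: extension records and
# base-256 sizes are not handled.
tar_boundaries() {
    local archive=$1 offset=0 total name size_octal size
    total=$(($(wc -c < "$archive")))
    while [ $((offset + 512)) -le "$total" ]; do
        # Name field: first 100 bytes of the header, NUL-padded.
        name=$(dd if="$archive" bs=1 skip="$offset" count=100 2>/dev/null | tr -d '\0')
        # An empty name means a zero block: end of archive.
        [ -z "$name" ] && break
        # Size field: 12 bytes at header offset 124, octal ASCII.
        size_octal=$(dd if="$archive" bs=1 skip=$((offset + 124)) count=12 2>/dev/null | tr -d '\0 ')
        size=$((8#${size_octal:-0}))
        printf '%s %s %s\n' "$offset" "$size" "$name"
        # Next header: past this header and the data, padded to 512 bytes.
        offset=$((offset + 512 + (size + 511) / 512 * 512))
    done
}

# Usage:
#   tar_boundaries files.tar
```

Every printed offset is a point where the archive can be cut without breaking a file. A real splitter would accumulate sizes until the next member would exceed the target, cut at that offset, and finish each part with the two zero-filled 512-byte blocks that mark the end of a tar archive.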
The tarsplitter command offered by @ruffrey looks like an awesome option.
I downloaded it, then did:
brew install golang
to be able to compile it. (Hmm... is it already in Homebrew? Nope.) The command compiled successfully on my Mac running macOS 10.14. I'm currently making a copy of my gigantic archive to run tarsplitter against it. Two thumbs up for the recommendation.
I'm a relative noob when it comes to compiling other people's code, so it would have been helpful if the author had made it clear that it was written in Go rather than C/C++ and needed a new compiler installed. Also, make install doesn't work, as there's no install target in the Makefile, so I just did:
cp build/tarsplitter_mac /usr/local/bin/tarsplitter
Neat that the Go compiler built binaries for Mac, Linux, and Windows.