How can I shorten a file from the command line?

Solution 1:

Assuming you want to extract the first 1 GB of the 150 GB file into a new file:

With head:

head -c 1G infile > outfile

Note that with GNU head the G suffix means 1024^3 bytes; use GB instead to get 1000^3 bytes.
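To see the difference between the two suffix styles without touching a large file, here is a small sketch that uses M and MB instead of G and GB. It assumes GNU head; other implementations may not accept these suffixes.

```shell
# Count the bytes head emits for each suffix style (GNU head assumed).
binary=$(head -c 1M /dev/zero | wc -c)    # M  = 1024 * 1024
decimal=$(head -c 1MB /dev/zero | wc -c)  # MB = 1000 * 1000

echo "1M  -> $binary bytes"
echo "1MB -> $decimal bytes"
```

The same rule applies one step up: G is 1024 * 1024 * 1024 and GB is 1000 * 1000 * 1000.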

Or with dd:

dd if=infile of=outfile bs=1M count=1024
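If you want to convince yourself that the two commands produce the same result, here is a scaled-down sketch (1 MiB out of 4 MiB rather than 1 GB out of 150 GB). The scratch directory and file names are made up for the illustration, and GNU head and dd are assumed for the size suffixes.

```shell
# Work in a scratch directory so nothing real is touched.
tmpdir=$(mktemp -d)

# Hypothetical 4 MiB input file with random content.
head -c 4M /dev/urandom > "$tmpdir/infile"

# Keep only the first 1 MiB, once with head and once with dd.
head -c 1M "$tmpdir/infile" > "$tmpdir/out_head"
dd if="$tmpdir/infile" of="$tmpdir/out_dd" bs=1K count=1024 2>/dev/null

size_head=$(wc -c < "$tmpdir/out_head")
size_dd=$(wc -c < "$tmpdir/out_dd")
echo "head: $size_head bytes, dd: $size_dd bytes"

# cmp -s is silent and only reports through its exit status.
if cmp -s "$tmpdir/out_head" "$tmpdir/out_dd"; then
    same=yes
else
    same=no
fi
echo "identical: $same"
```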

Or as in Wumpus Q. Wumbley's answer, dd can truncate in place.

Solution 2:

To truncate a file to 1 gigabyte, use the truncate command:

truncate -s 1G file.xml
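Since truncate modifies the file in place and the cut-off data is gone afterwards, it may be worth trying it on a throwaway file first. A minimal sketch, assuming GNU coreutils; the file name sample.xml is hypothetical:

```shell
tmpdir=$(mktemp -d)
file="$tmpdir/sample.xml"   # hypothetical throwaway file

# Build a 4 MiB file, then cut it down to 1 MiB in place.
head -c 4M /dev/zero > "$file"
size_before=$(wc -c < "$file")

truncate -s 1M "$file"
size_after=$(wc -c < "$file")

echo "before: $size_before bytes, after: $size_after bytes"
```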

The result of truncation will likely not be a valid XML file, but I gather that you understand that.

Documentation is available for both the GNU version and the BSD version of truncate.

Solution 3:

Where possible, I'd use the truncate command as in John1024's answer. It's not a standard unix command, though, so you might some day find yourself unable to use it. In that case, dd can do an in-place truncation too.

dd's default behavior is to truncate the output file at the point where the copying ends, so you just give it a 0-length input file and tell it to start writing at the desired truncation point:

dd if=/dev/null of=filename bs=1048576 seek=1024

(This is not the same as the copy-and-truncate dd in multithr3at3d's answer.)

Note that I used 1048576 and 1024 because 1048576*1024 is the desired size. I avoided bs=1m because this is a "portability" answer, and classic dd only knows suffixes k, b, and w.
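Here is a scaled-down sketch of the same trick (bs=1024 seek=1024, so the file is cut at 1 MiB), which also checks that the part before the truncation point is left untouched. The scratch directory and file names are invented for the example; only dd, cmp and wc are used, with no size suffixes.

```shell
tmpdir=$(mktemp -d)
file="$tmpdir/sample.bin"   # hypothetical test file

# 4 MiB of random data, plus a separate copy of its first 1 MiB for comparison.
dd if=/dev/urandom of="$file" bs=1024 count=4096 2>/dev/null
dd if="$file" of="$tmpdir/expected" bs=1024 count=1024 2>/dev/null

# Seek 1024 blocks of 1024 bytes into the output, copy nothing, truncate there.
dd if=/dev/null of="$file" bs=1024 seek=1024 2>/dev/null

size_after=$(wc -c < "$file")
if cmp -s "$file" "$tmpdir/expected"; then
    prefix=intact
else
    prefix=changed
fi
echo "size: $size_after bytes, prefix: $prefix"
```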

Solution 4:

I'm not entirely sure what you are asking. Do you just want to get rid of the other 149 GB, or are you trying to compress 150 GB into 1 GB? Either way, the following may be useful.

The split command can split any file into multiple pieces. See man split. You can specify the size of the file chunks you want to split it into with the -b option. For instance:

$ split -b 1GB myfile.xml

Without any other options, this should create several files in the current directory whose names start with the letter x (xaa, xab, and so on). If you want to change the names of the split files, refer to the man page.

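To re-assemble the file, concatenate the pieces in order with cat x* > re-assembled.xml. Don't use a bare cat * here: it would also pick up the original file and anything else in the directory.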

Example:

[kent_x86.py@c7 split-test]$ ls -l opendocman*
-rw-rw-r--.  1 kent_x86.py kent_x86.py 2082602 Mar 31  2017 opendocman-1.3.5.tar.gz

[kent_x86.py@c7 split-test]$ split -b 100K opendocman-1.3.5.tar.gz 
[kent_x86.py@c7 split-test]$ ls
opendocman-1.3.5.tar.gz  xaa  xab  xac  xad  xae  xaf  xag  xah  xai  xaj  xak  xal  xam  xan  xao  xap  xaq  xar  xas  xat  xau
[kent_x86.py@c7 split-test]$ ll
total 4072
-rw-rw-r--. 1 kent_x86.py kent_x86.py 2082602 Jan  5 11:06 opendocman-1.3.5.tar.gz
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xaa
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xab
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xac
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xad
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xae
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xaf
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xag
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xah
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xai
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xaj
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xak
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xal
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xam
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xan
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xao
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xap
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xaq
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xar
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xas
-rw-rw-r--. 1 kent_x86.py kent_x86.py  102400 Jan  5 11:06 xat
-rw-rw-r--. 1 kent_x86.py kent_x86.py   34602 Jan  5 11:06 xau
[kent_x86.py@c7 split-test]$ cat xa* > opendoc-reassembled.tar.gz
[kent_x86.py@c7 split-test]$ ls -l opendoc-reassembled*
-rw-rw-r--. 1 kent_x86.py kent_x86.py 2082602 Jan  5 11:07 opendoc-reassembled.tar.gz
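
Matching sizes are a good sign, but a byte-for-byte comparison is a stronger check. Below is a small scripted version of the same round trip that verifies the result with cmp. It is only a sketch: the file names and the part_ prefix are made up, and the input is a generated 250 KiB file rather than a real archive.

```shell
tmpdir=$(mktemp -d)

# Hypothetical 250 KiB input, so 100 KiB pieces give three parts.
dd if=/dev/urandom of="$tmpdir/original.bin" bs=1024 count=250 2>/dev/null

# The second operand is the name prefix for the pieces (instead of the default x).
split -b 102400 "$tmpdir/original.bin" "$tmpdir/part_"

pieces=$(ls "$tmpdir"/part_* | wc -l)

# The shell expands the glob in sorted order, so the pieces are joined in sequence.
cat "$tmpdir"/part_* > "$tmpdir/reassembled.bin"

if cmp -s "$tmpdir/original.bin" "$tmpdir/reassembled.bin"; then
    result=identical
else
    result=different
fi
echo "$pieces pieces, files are $result"
```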