Remove first N lines of a file in place in unix command line
I'm trying to remove the first 37 lines from a very, very large file. I started trying sed and awk, but they seem to require copying the data to a new file. I'm looking for a "remove lines in place" method that, unlike sed -i, does not make copies of any kind but just removes lines from the existing file.
Here's what I've done...
awk 'NR > 37' file.xml > 'f2.xml'
sed -i '1,37d' file.xml
Both of these seem to do a full copy. Is there any other simple CLI that can do this quickly without a full document traversal?
There's no simple way to do in-place editing using UNIX utilities, but here's one in-place file modification solution that you might be able to adapt (courtesy of Robert Bonomi at https://groups.google.com/forum/#!topic/comp.unix.shell/5PRRZIP0v64):
bytes=$(head -37 "$file" |wc -c)
dd if="$file" bs="$bytes" skip=1 conv=notrunc of="$file"
The final file should be $bytes bytes smaller than the original (since the goal was to remove $bytes bytes from the beginning), so to finish we must remove the final $bytes bytes. We're using conv=notrunc above because without it dd would truncate the output file, which is also the input file, to zero length before copying anything, leaving it empty (see the example further down). On a GNU system such as Linux, the truncation afterwards can be done with:
truncate -s "-$bytes" "$file"
For example, to delete the first 5 lines from this 12-line file:
$ wc -l file
12 file
$ cat file
When chapman billies leave the street,
And drouthy neibors, neibors, meet;
As market days are wearing late,
And folk begin to tak the gate,
While we sit bousing at the nappy,
An' getting fou and unco happy,
We think na on the lang Scots miles,
The mosses, waters, slaps and stiles,
That lie between us and our hame,
Where sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
First use dd to overwrite the target 5 lines (really "$bytes" bytes) at the start of the file by copying the rest of the file toward the front, leaving the trailing "$bytes" bytes as-is:
$ bytes=$(head -5 file |wc -c)
$ dd if=file bs="$bytes" skip=1 conv=notrunc of=file
1+1 records in
1+1 records out
253 bytes copied, 0.0038458 s, 65.8 kB/s
$ wc -l file
12 file
$ cat file
An' getting fou and unco happy,
We think na on the lang Scots miles,
The mosses, waters, slaps and stiles,
That lie between us and our hame,
Where sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
s, waters, slaps and stiles,
That lie between us and our hame,
Where sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
and then use truncate to remove those leftover bytes from the end:
$ truncate -s "-$bytes" "file"
$ wc -l file
7 file
$ cat file
An' getting fou and unco happy,
We think na on the lang Scots miles,
The mosses, waters, slaps and stiles,
That lie between us and our hame,
Where sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
If we had tried the dd step without conv=notrunc:
$ wc -l file
12 file
$ bytes=$(head -5 file |wc -c)
$ dd if=file bs="$bytes" skip=1 of=file
dd: file: cannot skip to specified offset
0+0 records in
0+0 records out
0 bytes copied, 0.0042254 s, 0.0 kB/s
$ wc -l file
0 file
See the Google Groups thread I referenced for other suggestions and info.
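The dd and truncate steps above can be wrapped into one function. This is only a minimal sketch: the name strip_head_inplace is made up for illustration, and it assumes GNU dd and GNU truncate. It is also not crash-safe, since an interruption between the two steps leaves the file in the half-shifted state shown in the example.

```shell
# Hypothetical helper (not from the thread): drop the first N lines of FILE
# without writing a second copy. Assumes GNU dd and GNU truncate.
strip_head_inplace() {
    n=$1 file=$2
    # Size in bytes of the first N lines; $(( )) strips any padding from wc.
    bytes=$(( $(head -n "$n" "$file" | wc -c) ))
    [ "$bytes" -gt 0 ] || return 0
    # Shift everything after the first $bytes bytes to the front of the file.
    # conv=notrunc stops dd from emptying the file before it reads it.
    dd if="$file" of="$file" bs="$bytes" skip=1 conv=notrunc 2>/dev/null &&
    # Cut off the stale $bytes bytes now left at the end.
    truncate -s "-$bytes" "$file"
}
```

It would be called as strip_head_inplace 37 file.xml. Because the block size equals the size of the removed lines, a short header on a huge file means many small reads and writes, so this avoids the extra disk space but not the full traversal.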
Unix file semantics do not allow truncating the front part of a file.
All solutions will be based on either:
- Reading the file into memory and then writing it back (ed, ex, other editors). This should be fine if your file is <1GB or if you have plenty of RAM.
- Writing a second copy and optionally replacing the original (sed -i, awk/tail > foo). This is fine as long as you have enough free disk space for a copy, and don't mind the wait.
If the file is too large for any of these to work for you, you may be able to work around it depending on what's reading your file.
Perhaps your reader skips comments or blank lines? If so, you can then craft a message the reader ignores, make sure it has the same number of bytes as the 37 first lines in your file, and overwrite the start of the file with dd if=yourdata of=file conv=notrunc.
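As a rough sketch of that overwrite trick, the function below replaces the first N lines with spaces followed by a single newline, so the file keeps its size and nothing after the header moves. The name blank_head_inplace is hypothetical, and the whole approach rests on the assumption that whatever reads the file tolerates a leading whitespace-only line.

```shell
# Hypothetical helper: blank out the first N lines of FILE in place.
# Assumes the program reading FILE ignores a leading whitespace-only line.
blank_head_inplace() {
    n=$1 file=$2
    # Size in bytes of the first N lines; $(( )) strips any padding from wc.
    bytes=$(( $(head -n "$n" "$file" | wc -c) ))
    [ "$bytes" -gt 0 ] || return 0
    # Emit exactly $bytes bytes: ($bytes - 1) spaces and one newline,
    # then write them over the start of the file without truncating it.
    { printf '%*s' "$((bytes - 1))" ''; printf '\n'; } |
        dd of="$file" conv=notrunc 2>/dev/null
}
```

Only the header bytes are written, so this takes the same time regardless of how large the rest of the file is.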
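ed is the standard editor. It still loads the file into its own buffer before writing it back, and the here-string syntax below needs bash: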
ed -s file <<< $'1,37d\nwq'
The copy will have to be created at some point - why not at the time of reading the "modified" file; streaming the altered copy instead of storing it?
What I'm thinking - create a named pipe "file2" that is the output of that same awk 'NR > 37' file.xml or whatever; then whoever reads file2 will not see the first 37 lines.
The drawback is that it will run awk each time the file is processed, so it's feasible only if it's read rarely.
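A minimal sketch of the named-pipe idea, assuming a POSIX shell with mkfifo and awk available. The function name serve_without_head and the pipe path are made up for illustration.

```shell
# Hypothetical helper: expose FILE minus its first N lines through a named pipe.
# The background awk blocks until a reader opens the pipe, streams the
# remaining lines once, and then exits.
serve_without_head() {
    n=$1 file=$2 fifo=$3
    # Create the pipe unless it already exists.
    [ -p "$fifo" ] || mkfifo "$fifo" || return 1
    awk -v n="$n" 'NR > n' "$file" > "$fifo" &
}
```

After serve_without_head 37 file.xml file2, a reader that opens file2 sees the file from line 38 onward. Each call serves a single read, so it has to be run again (for example in a loop) for every consumer.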