How to split a file into two based on how each line starts?
I have this file, 1.txt:
-e a
b
-e c
d
-e e
f
I want to split it into the following two files.
2.txt
-e a
-e c
-e e
3.txt
b
d
f
where 2.txt contains all the lines starting with -e, and 3.txt contains all the other lines. Extra newlines (such as the extra newline in the middle of the original) can be ignored or kept, and order doesn't matter.
I've tried using split, but it doesn't look like that allows me to use a pattern for splitting (only a fixed number of lines per output file).
Using grep:
grep -E '^-e' 1.txt >2.txt
grep -E '^[^-]' 1.txt >3.txt
@braemar: Using grep -v with the same regexp would erroneously pick up blank lines, text lines, etc., which is not what was wanted.
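To make the difference between the two approaches concrete, here is a small sketch. It assumes the second pattern is anchored as ^[^-] (first character is not a dash), and it uses a scratch directory plus a hypothetical extra output file 3v.txt that is not part of the question.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input: the question's data plus one blank line in the middle.
printf '%s\n' '-e a' 'b' '-e c' '' 'd' '-e e' 'f' > 1.txt

# Lines that start with "-e".
grep -E '^-e' 1.txt > 2.txt

# Lines whose first character is not "-". A blank line has no first
# character at all, so it is dropped.
grep -E '^[^-]' 1.txt > 3.txt

# The inverted match keeps every line the first grep rejected,
# including the blank one (3v.txt is only here for comparison).
grep -v -E '^-e' 1.txt > 3v.txt
```

Since the question says extra newlines can be ignored or kept, either form should be acceptable; the choice only matters if blank lines must be filtered out.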
Here is an awk solution:
awk '{ if ( /^-/ ) print > "2.txt"; else if ( NF ) print > "3.txt" }' 1.txt
A performance test:
$ cat 1.txt | wc -l | sed -r -e 's/([0-9]{6}$)/ \1/' -e 's/([0-9]{3}$)/ \1 lines/'
1 144 270 lines
$ TIMEFORMAT=%R
$ time awk '{ if ( /^-/ ) print > "2.txt"; else if ( NF ) print > "3.txt" }' 1.txt
0.372
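For anyone wondering what the NF test does there, this is a rough sketch of the same idea written in awk's pattern-action form. The sample data (with an empty line and a whitespace-only line added) and the scratch directory are my own assumptions for illustration.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input with one empty line and one whitespace-only line.
printf '%s\n' '-e a' 'b' '' '-e c' 'd' '   ' '-e e' 'f' > 1.txt

# Rule 1: lines starting with "-" go to 2.txt, and `next` skips rule 2.
# Rule 2: remaining lines go to 3.txt only when NF (the number of fields)
# is non-zero, which excludes empty and whitespace-only lines.
awk '/^-/ { print > "2.txt"; next } NF { print > "3.txt" }' 1.txt
```

Dropping the NF condition from the second rule would keep the blank lines in 3.txt instead.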
Preserving empty lines:
$ sed -n -e '/^-e/{w 2.txt' -e 'd}' -e 'w 3.txt' 1.txt
giving
$ head {1,2,3}.txt
==> 1.txt <==
-e a
b
-e c
d
-e e
f
==> 2.txt <==
-e a
-e c
-e e
==> 3.txt <==
b
d
f
If you prefer to omit empty lines, then add an "any character" regex to the last write:
sed -n -e '/^-e/{w 2.txt' -e 'd}' -e '/./w 3.txt' 1.txt
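A quick way to see the two variants side by side is to run them on input that actually contains a blank line. This is only a sketch: the scratch directory and the file name 3-with-blanks.txt are made up for the comparison, and the closing brace is passed as its own -e expression, which should behave the same but also be accepted by sed implementations that are strict about `d}`.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input: the question's data plus one blank line.
printf '%s\n' '-e a' 'b' '' '-e c' 'd' '-e e' 'f' > 1.txt

# Variant 1: every line that does not start with "-e" is written to 3.txt,
# blank lines included.
sed -n -e '/^-e/{' -e 'w 2.txt' -e 'd' -e '}' -e 'w 3.txt' 1.txt
cp 3.txt 3-with-blanks.txt

# Variant 2: the /./ address requires at least one character,
# so blank lines are never written to 3.txt.
sed -n -e '/^-e/{' -e 'w 2.txt' -e 'd' -e '}' -e '/./w 3.txt' 1.txt
```

After this, 3-with-blanks.txt holds four lines (one of them empty) and 3.txt holds three.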
Here is a sed solution using the d (delete) command:
sed -e '/^-/!d' -e '/^[[:space:]]*$/d' 1.txt > 2.txt
The above command has two regexes. The first, '/^-/!d', matches all lines that don't start with -, and they are deleted from the output. The second, '/^[[:space:]]*$/d', matches all lines that contain only whitespace, and they are deleted from the output as well.
sed -e '/^-/d' -e '/^[[:space:]]*$/d' 1.txt > 3.txt
The above command also has two regexes. The first, '/^-/d', matches all lines that start with -, and they are deleted from the output. The second is the same as in the previous case.
Another way is to suppress (-n) the normal output of sed and then print (p) only the matched lines:
sed -n '/^-/p' 1.txt > 2.txt
sed -n -r '/^(-|[[:space:]]*$)/!p' 1.txt > 3.txt
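As a sanity check, the delete-based and print-based commands can be run on the same input and compared. The sketch below assumes a small sample with an empty and a whitespace-only line; the output names 2d.txt, 3d.txt, 2p.txt and 3p.txt are hypothetical, and -E is used in place of -r since, as far as I know, both select extended regular expressions in GNU sed while -E is also accepted elsewhere.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input with one empty line and one whitespace-only line.
printf '%s\n' '-e a' 'b' '' '-e c' 'd' '   ' '-e e' 'f' > 1.txt

# Delete-based pair: remove what is not wanted, let sed print the rest.
sed -e '/^-/!d' -e '/^[[:space:]]*$/d' 1.txt > 2d.txt
sed -e '/^-/d' -e '/^[[:space:]]*$/d' 1.txt > 3d.txt

# Print-based pair: suppress default output, print only what is wanted.
sed -n '/^-/p' 1.txt > 2p.txt
sed -n -E '/^(-|[[:space:]]*$)/!p' 1.txt > 3p.txt

# cmp exits with status 0 when two files are byte-for-byte identical.
cmp 2d.txt 2p.txt && cmp 3d.txt 3p.txt && echo "both approaches agree"
```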
Here is a performance test:
$ cat 1.txt | wc -l | sed -r -e 's/([0-9]{6}$)/ \1/' -e 's/([0-9]{3}$)/ \1 lines/'
1 144 270 lines
$ TIMEFORMAT=%R
$ time sed -e '/^-/!d' -e '/^[[:space:]]*$/d' 1.txt > 2.txt
0.357
$ time sed -e '/^-/d' -e '/^[[:space:]]*$/d' 1.txt > 3.txt
0.323
$ time sed -n '/^-/p' 1.txt > 2.txt
0.221
$ time sed -n -r '/^(-|[[:space:]]*$)/!p' 1.txt > 3.txt
0.402