How to split a file into two based on line start?

I have this file 1.txt:

-e a
b
-e c

d
-e e
f

I want to split it into the following two files.

2.txt

-e a
-e c
-e e

3.txt

b
d
f

where 2.txt contains all the lines starting with -e, and 3.txt contains all the other lines. Extra newlines (such as the extra newline in the middle of the original) can be ignored or kept, and order doesn't matter.

I've tried using split, but it doesn't look like it lets me split on a pattern (it uses a fixed number of lines per output file instead).

Using grep:

grep -E '^-e' 1.txt >2.txt
grep -E '^[^-]' 1.txt >3.txt

@braemar: Using grep -v with the same regexp would erroneously detect blank lines, text lines, etc. Not what was wanted.
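To make the difference between the two approaches concrete, here is a minimal sketch run against the sample from the question. The file name 3_with_blanks.txt is only an illustrative name, not something from the original post. Since the question says blank lines may be kept or dropped, either variant should be acceptable.

```shell
# Recreate the sample input from the question (the empty string is the blank line).
printf '%s\n' '-e a' 'b' '-e c' '' 'd' '-e e' 'f' > 1.txt

# Lines that start with -e.
grep -E '^-e' 1.txt > 2.txt

# Everything else, blank line included: -v inverts the match.
# (3_with_blanks.txt is a hypothetical name used only for this comparison.)
grep -v -E '^-e' 1.txt > 3_with_blanks.txt

# Everything else, blank line dropped: the line must begin with
# some character that is not a dash.
grep -E '^[^-]' 1.txt > 3.txt
```

Note that '^[^-]' also rejects lines starting with a dash that is not followed by e, so it is only equivalent to the -v form when every dash line is a -e line, as in the sample.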

Here is an awk solution:

awk '{ if ( /^-/ ) print > "2.txt"; else if ( NF ) print > "3.txt" }' 1.txt
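The same idea can be written in awk's pattern-action form, which may be easier to read. This is only a sketch under one assumption of mine: it matches ^-e (as in the question) rather than the looser ^- used above. NF is zero for empty or whitespace-only lines, so those are skipped.

```shell
# Recreate the sample input from the question.
printf '%s\n' '-e a' 'b' '-e c' '' 'd' '-e e' 'f' > 1.txt

awk '
  /^-e/ { print > "2.txt"; next }   # lines starting with -e go to 2.txt
  NF    { print > "3.txt" }         # remaining non-blank lines go to 3.txt
' 1.txt
```

Replacing NF with 1 (an always-true pattern) should keep the blank lines in 3.txt instead.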

A performance test:

$ cat 1.txt | wc -l | sed -r -e 's/([0-9]{6}$)/ \1/' -e 's/([0-9]{3}$)/ \1 lines/'
1 144 270 lines
$ TIMEFORMAT=%R

$ time awk '{ if ( /^-/ ) print > "2.txt"; else if ( NF ) print > "3.txt" }' 1.txt
0.372

Preserving empty lines:

$ sed -n -e '/^-e/{w 2.txt' -e 'd}' -e 'w 3.txt' 1.txt

giving

$ head {1,2,3}.txt
==> 1.txt <==
-e a
b
-e c

d
-e e
f

==> 2.txt <==
-e a
-e c
-e e

==> 3.txt <==
b

d
f

If you prefer to omit empty lines, then add an "any character" regex to the last write:

sed -n -e '/^-e/{w 2.txt' -e 'd}' -e '/./w 3.txt' 1.txt
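The one-liner is split into several -e chunks because the file name after w runs to the end of the line, so nothing else can follow it in the same chunk. Written as a multi-line script it might look like the sketch below (same commands, just laid out one per line):

```shell
# Recreate the sample input from the question.
printf '%s\n' '-e a' 'b' '-e c' '' 'd' '-e e' 'f' > 1.txt

# For lines starting with -e: write to 2.txt, then delete and move on.
# Any other line containing at least one character is written to 3.txt.
sed -n '
/^-e/ {
  w 2.txt
  d
}
/./ w 3.txt
' 1.txt
```

Dropping the /./ address on the last command gives the variant that preserves empty lines.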

Here is a sed solution using the delete command:

sed -e '/^-/!d' -e '/^[[:space:]]*$/d' 1.txt > 2.txt

The above command has two expressions. The first, '/^-/!d', matches every line that doesn't start with - and deletes it from the output. The second, '/^[[:space:]]*$/d', matches every line that contains only whitespace and deletes it from the output.

sed -e '/^-/d' -e '/^[[:space:]]*$/d' 1.txt > 3.txt

The above command also has two expressions. The first, '/^-/d', matches every line that starts with - and deletes it from the output. The second is the same as in the previous case.

Another way is to suppress the normal output of sed with -n and then print only the matched lines:

sed -n '/^-/p' 1.txt > 2.txt

sed -n -r '/^(-|[[:space:]]*$)/!p' 1.txt > 3.txt

Here is a performance test:

$ cat 1.txt | wc -l | sed -r -e 's/([0-9]{6}$)/ \1/' -e 's/([0-9]{3}$)/ \1 lines/'
1 144 270 lines
$ TIMEFORMAT=%R

$ time sed -e '/^-/!d' -e '/^[[:space:]]*$/d' 1.txt > 2.txt
0.357
$ time sed -e '/^-/d' -e '/^[[:space:]]*$/d' 1.txt > 3.txt
0.323

$ time sed -n '/^-/p' 1.txt > 2.txt
0.221
$ time sed -n -r '/^(-|[[:space:]]*$)/!p' 1.txt > 3.txt
0.402
