Vim Regex Capture Groups [bau -> byau : ceu -> cyeu]
I have a list of words:
bau
ceu
diu
fou
gau
I want to turn that list into:
byau
cyeu
dyiu
fyou
gyau
I unsuccessfully tried the command:
:%s/(\w)(\w\w)/\1y\2/g
Given that this doesn't work, what do I have to change to make the regex capture groups work in Vim?
One way to fix this is to escape the parentheses of each capture group, because in Vim's default "magic" mode a bare ( or ) matches a literal parenthesis:
:%s/\(\w\)\(\w\w\)/\1y\2/g
Slightly shorter (and more magic-al) is to use \v, meaning that in the pattern after it all ASCII characters except '0'-'9', 'a'-'z', 'A'-'Z' and '_' have a special meaning:
:%s/\v(\w)(\w\w)/\1y\2/g
See:
:help \(
:help \v
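The escaped-group form can also be checked outside Vim, since sed's basic regular expressions use the same \( \) grouping syntax as Vim's default mode. This is only a sketch: the helper name insert_y is made up for illustration, and GNU sed is assumed because \w is a GNU extension.

```shell
# Hypothetical helper (not from the thread); assumes GNU sed for \w.
# Same pattern and replacement as the Vim command, minus the % range.
insert_y() {
  sed 's/\(\w\)\(\w\w\)/\1y\2/'
}

# Feed the word list from the question through the substitution.
printf 'bau\nceu\ndiu\nfou\ngau\n' | insert_y
```

This should print byau, cyeu, dyiu, fyou and gyau, one per line.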
If you don't want to escape the capturing groups with backslashes (this is what you've missed), prepend \v to turn Vim's regular expression engine into very magic mode:
:%s/\v(\w)(\w\w)/\1y\2/g
You can also use this pattern which is shorter:
:%s/^./&y
- %s runs the substitution on every line of the file.
- ^. matches the first character of the line.
- &y inserts the matched text (&) followed by y.
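The & shorthand behaves the same way in sed, so the idea can be tried from a shell as well. A minimal sketch, with prefix_y as a hypothetical helper name:

```shell
# Hypothetical helper (not from the thread).
# ^. matches the first character of each line; & re-inserts that match,
# and the literal y is appended right after it.
prefix_y() {
  sed 's/^./&y/'
}

printf 'bau\nceu\ndiu\n' | prefix_y
```

No capture groups are needed here, because & always stands for the whole match.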
You also have to escape the grouping parentheses:
:%s/\(\w\)\(\w\w\)/\1y\2/g
That does the trick.
Very nice! On a selection, use the following (for example):
:'<,'>s/^\(\w\+ - \w\+\).*/\1/
or
:'<,'>s/\v^(\w+ - \w+).*/\1/
which parses Space - Commercial - Boeing to Space - Commercial.
Explanation:
- ^ : match start of line
- \-escape (, + and ) per the first regex (accepted answer), or prepend with \v (@ingo-karkat's answer)
- \w\+ finds a word (\w alone matches only a single word character): in this example, I search for a word followed by - followed by another word
- .* after the capturing group is needed to match, and so discard, the remaining text
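For comparison, the very magic form maps almost one-to-one onto sed's extended syntax. The sketch below assumes GNU sed (for \w), and first_two_fields is a made-up helper name:

```shell
# Hypothetical helper (not from the thread); assumes GNU sed for \w.
# -E enables extended regular expressions, so ( ) and + need no backslashes,
# much like \v in Vim.
first_two_fields() {
  sed -E 's/^(\w+ - \w+).*/\1/'
}

echo 'Space - Commercial - Boeing' | first_two_fields
```

The .* consumes everything after the second word, so only the captured group survives in the output.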
Addendum. This is a bit off topic, but I would suggest that Vim is not well suited to running more complex regular expressions / captures. [I am doing something similar to the following, which is how I found this thread.]
In those instances, it is likely better to dump the lines to a text file and edit it "in place" (sed -i ...) or in a redirect (sed ... > out.txt).
echo 'Space Sciences - Private Industry - Boeing' | sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/'
Space Sciences - Private Industry
touch ~/in.txt
touch ~/out.txt
echo 'Space Sciences - Private Industry - Boeing' > ~/in.txt
cat ~/in.txt
Space Sciences - Private Industry - Boeing
sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' ~/in.txt > ~/out.txt
cat ~/out.txt
Space Sciences - Private Industry
## Note: without -i, sed never modifies its input file; if you omit the > redirect, the result is just printed to the terminal.
## source unaltered:
cat ~/in.txt
Space Sciences - Private Industry - Boeing
## edit in place:
sed -i -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' ~/in.txt
cat ~/in.txt
Space Sciences - Private Industry
That expression, sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/', uses {x,y} to match between x and y repetitions of a word (see https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html ). Here, since my phrases are separated by -, I can simply tweak those parameters to get what I want.
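One way to make that tweaking explicit is to pass the upper bound in as an argument. This is a rough sketch rather than anything from the thread: first_two_phrases and max_words are hypothetical names, GNU sed is assumed, and the second substitution is an addition that strips the trailing space which the (\w+ ) group otherwise leaves inside the capture.

```shell
# Hypothetical helper (not from the thread); assumes GNU sed for \w and -E.
# $1 is the maximum number of words allowed in each phrase.
first_two_phrases() {
  max_words="$1"
  # First s///: keep the first two " - "-separated phrases.
  # Second s///: drop the trailing space(s) left inside the capture.
  sed -E "s/^((\w+ ){1,${max_words}}- (\w+ ){1,${max_words}}).*/\1/; s/ *\$//"
}

echo 'Space Sciences - Private Industry - Boeing' | first_two_phrases 2
echo 'Office of Space Sciences - Private Industry - Boeing' | first_two_phrases 4
```

If a phrase has more words than the bound allows, the pattern does not match and sed passes the line through unchanged, so the bound needs to be at least as large as the longest phrase.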