Vim Regex Capture Groups [bau -> byau : ceu -> cyeu]

I have a list of words:

bau
ceu
diu
fou
gau

I want to turn that list into:

byau
cyeu
dyiu
fyou
gyau

I unsuccessfully tried the command:

:%s/(\w)(\w\w)/\1y\2/g

Given that this doesn't work, what do I have to change to make the regex capture groups work in Vim?


One way to fix this is to escape the grouping parentheses in the pattern with backslashes:

:%s/\(\w\)\(\w\w\)/\1y\2/g

Slightly shorter (and more magic-al) is to use \v ("very magic"). In the pattern after it, all ASCII characters except '0'-'9', 'a'-'z', 'A'-'Z' and '_' have a special meaning, so the parentheses group without backslashes:

:%s/\v(\w)(\w\w)/\1y\2/g

See:

  • :help \(
  • :help \v
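As a quick check outside Vim, the two forms above have close counterparts in sed: its basic syntax needs escaped grouping parentheses (like Vim's default mode), while -E accepts plain ones (comparable to \v). This is only a sketch, assuming GNU sed (for \w); the variable names are made up for illustration.

```shell
# Assumption: GNU sed, which understands \w. Sample words are from the question.
words='bau
ceu
diu'

# Basic syntax: grouping parentheses must be escaped.
escaped=$(printf '%s\n' "$words" | sed 's/\(\w\)\(\w\w\)/\1y\2/')

# Extended syntax (-E): plain parentheses already group.
extended=$(printf '%s\n' "$words" | sed -E 's/(\w)(\w\w)/\1y\2/')

# Both variables should now hold the same three lines.
printf '%s\n' "$escaped"
printf '%s\n' "$extended"
```

Both commands should print byau, cyeu and dyiu, one per line.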

What you've missed is escaping the capturing groups with backslashes. If you don't want to do that, prepend \v to switch Vim's regular expression engine to very magic mode:

:%s/\v(\w)(\w\w)/\1y\2/g

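You can also use this shorter command, which needs no capture groups at all:

:%s/^./&y

  • % is the range covering the whole file, and s is the substitute command.
  • ^. matches the first character of the line.
  • & in the replacement stands for the matched text, so &y puts a y right after it.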
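If you want to try the & idea without opening Vim, sed treats & in the replacement the same way (the whole match). A minimal sketch; the function name first_char_y is hypothetical, and sed here only stands in for the Vim command above.

```shell
# first_char_y: insert a "y" after the first character of every input line.
# "^." matches the first character; "&" in the replacement is that match.
first_char_y() {
  sed 's/^./&y/'
}

# The word list from the question.
printf 'bau\nceu\ndiu\nfou\ngau\n' | first_char_y
```

This should print byau, cyeu, dyiu, fyou and gyau, one per line.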

You have to escape the grouping parentheses:

:%s/\(\w\)\(\w\w\)/\1y\2/g

That does the trick.


Very nice! On a selection, use the following (for example):

:'<,'>s/^\(\w\+ - \w\+\).*/\1/

or

:'<,'>s/\v^(\w+ - \w+).*/\1/

which reduces Space - Commercial - Boeing to Space - Commercial.

Explanation:

  • ^ : match start of line
  • \( \+ \) : in the first regex, (, + and ) are escaped with backslashes (as in the accepted answer); the second regex prepends \v instead (@ingo-karkat's answer)
  • \w\+ matches a whole word (\w alone matches only a single word character); in this example, I search for a word followed by " - " followed by another word
  • .* after the capturing group matches the remaining text, so that it is dropped from the result

Addendum. This is a bit off topic, but I would suggest that Vim is not well suited to running more complex regular expressions / captures. [I am doing something similar to the following, which is how I found this thread.]

In those instances, it is likely better to dump the lines to a text file and process it with sed, either "in place" (sed -i ...) or by redirecting the output to a new file (sed ... > out.txt).

echo 'Space Sciences - Private Industry - Boeing' | sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/'
Space Sciences - Private Industry 

touch ~/in.txt
touch ~/out.txt

echo 'Space Sciences - Private Industry - Boeing' > ~/in.txt
cat ~/in.txt
Space Sciences - Private Industry - Boeing

sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' ~/in.txt > ~/out.txt
cat ~/out.txt 
Space Sciences - Private Industry
## Note: without the > redirect, sed just prints the result to the terminal.
## The source file is only modified when -i is used.

## source unaltered:
cat ~/in.txt
Space Sciences - Private Industry - Boeing

## edit in place:
sed -i -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' ~/in.txt
cat ~/in.txt
Space Sciences - Private Industry 

That expression, sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/', uses the {x,y} interval to match between x and y repetitions of a word -- see https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html . Here, since my phrases are separated by -, I can simply tweak those numbers to get what I want.
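
To illustrate what tweaking those numbers looks like, here is a small sketch that turns the upper bound into a parameter. It assumes GNU sed (for \w; -E is equivalent to -r there), and the function name keep_first_two is made up for this example.

```shell
# keep_first_two MAX: keep the first two "-"-separated phrases of each line,
# where each phrase may contain between 1 and MAX words.
# Double quotes let the shell substitute $1 into the {1,MAX} intervals;
# \w and \1 are passed through to sed unchanged.
keep_first_two() {
  sed -E "s/^((\w+ ){1,$1}- (\w+ ){1,$1}).*/\1/"
}

# Two-word phrases fit within {1,2}.
echo 'Space Sciences - Private Industry - Boeing' | keep_first_two 2

# A three-word first phrase needs the upper bound raised to 3.
echo 'Earth and Space - Private Industry - Boeing' | keep_first_two 3
```

Note that the captured group ends with the space before the second -, so the output keeps a trailing space. If the first phrase has more words than MAX allows, the pattern does not match and sed leaves that line unchanged.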