Make grep return only the text that matches the regular expression

How can I make the 'grep' command return only the text matched by the regular expression?

The grep command prints a whole line whenever that line contains a string matching the expression, which is not handy for extracting specific content. For instance, I have vocabulary files formatted like this:

    **word**
    1. Definition:
    2. Usage
    3. Others

I'd like to retrieve all the words from these files to make a word list:

    grep '\*\*[^*]*\*\*'

It returns whole lines of content, not just the words.

How can I make grep capture only the 'word'?


Use awk.

This command will "extract" a list of words, assuming the file is in the format you specified above:

    awk '/\*\*/,/\*\*/ {print substr($0, 3, length($0)-4)}' <filename>

Example:

For this example, assume we have a text file called words.txt with the following content:

    **test**
    1. Definition:
    2. Usage
    3. Others

    **foo**
    1. Definition:
    2. Usage
    3. Others

    **bar**
    1. Definition:
    2. Usage
    3. Others

Running the command on that file prints:

    $ awk '/\*\*/,/\*\*/ {print substr($0, 3, length($0)-4)}' words.txt
    test
    foo
    bar

What it's Doing

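  • /\*\*/,/\*\*/ This is the pattern range. Because the start and end patterns both match the same line, each range covers exactly one **word** line. I could have done this with a single pattern that looks for the first set of asterisks (/\*\*/) and been done, but I used a full range for completeness. One method is no more "right" than the other.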

  • {print substr($0, 3, length($0)-4)} This prints the substring (of the string **word**) starting at the 3rd character, with a length of the whole string (length($0)) minus four characters (the four asterisks).

  • <filename> This is the input file for the awk command to process.
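
For comparison, here is a minimal sketch of two variations on the above. The first is the single-pattern form of the awk command mentioned in the first bullet, anchored so it only fires on lines that both start and end with two asterisks (the anchoring is my addition, not part of the original command). The second stays with grep, as the question asked, and assumes your grep supports the -o option (GNU and BSD grep do), which prints only the matched text instead of the whole line. The temporary directory and the variable names are purely illustrative.

```shell
# Recreate a small sample file in a temporary directory (illustrative only).
tmpdir=$(mktemp -d)
cat > "$tmpdir/words.txt" <<'EOF'
**test**
1. Definition:
2. Usage
3. Others

**foo**
1. Definition:
2. Usage
3. Others
EOF

# Single-pattern awk: select lines that start and end with two asterisks,
# then drop the first two and last two characters of each such line.
awk_words=$(awk '/^\*\*.*\*\*$/ {print substr($0, 3, length($0)-4)}' "$tmpdir/words.txt")

# grep -o prints only the matched part of each line (assumes -o is available);
# tr -d then deletes the surrounding asterisks.
grep_words=$(grep -o '\*\*[^*]*\*\*' "$tmpdir/words.txt" | tr -d '*')

printf '%s\n' "$awk_words"
printf '%s\n' "$grep_words"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

On the sample file both commands should produce the same two words, test and foo. The grep variant also picks up a **word** that appears in the middle of a line, whereas the anchored awk variant only handles lines consisting of nothing but **word**.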