Export all regular expression matches in Textpad or Notepad++ as a list
In Textpad or Notepad++ is there an option to export all the matches for a regular expression find, as a single list?
In a big text file, I am searching for tags (words enclosed in % %), using regular expression %\< and \>%
, and want all the matches as a single list, so that I can remove duplicates using Excel and get a list of unique tags.
You can achieve this by using Backreferences and Find and Mark functionality in Notepad++.
Find the matches using regex (say
%(.*?)%
) and replace it by\n%\1%\n
, after this we will have our target word in separate lines (i.e. no line will have more than one matched word)Use the Search-->Find-->Mark functionality to mark each line with regex
%(.*?)%
and remember to tick 'Bookmark Line' before marking the text- Select Search-->Bookmark-->Remove Unmarked Lines
- Save the remaining text. It is the required list.
Is doing this in Notepad++ a mandatory requirement? Are you on Windows or some form of Unix? If you’re on Windows, you can do it (partly) from the Command Prompt:
findstr /r "%[a-z].*[a-z]% %[a-z]%" your_file > new_file
findstr
is vaguely inspired by grep
, so this new_file
will contain all lines matching your search criteria; you can then use Notepad++ to strip out the unwanted text (to the left of the first % and to the right of the second one).
And, of course, if you’re on Unix,
you can do the equivalent task with sed
.
And if you have GNU grep
(i.e., if you’re on Linux),
you can do it with grep -o
.
There is a Notepad++ plugin which can copy matched regex expression to new file in new tab. RegexExtract
Because I didn't find any plugin for Notepad++ that can extract some text from current document or all files from a location with some additional settings (like case conversion), I decided to try to make it myself. (...) Plugin interface is pretty straightforward (...). (...) "Find", "Replace" and "Mask" fields use C++11 regex syntax. Extracting from files works right now only for those in UTF8.
Edit Dialog input tailored to the question
In the image you can see how to fill in the dialog. I assume that a word does not contain spaces, etc., only characters matched by \w. Notably:
- Use a pair of brackets, to allow selecting the word, without the percetange characters.
- Choose option Extract with replace, to select the first match. Otherwise, you will get a columnar output of all $1, $2, etc.
- Check Skip $& ... to leave out the complete matches.
- Check Filter unique to report each match only once.
- Click Extract to select get results. (Search only finds the matches, but does not report).