Problems with "+" in grep
If you want `+` to mean "one or more of the preceding atom", then you have to do one of the following:

- Use `-E` (Extended Regular Expressions) or `-P` (PCRE): `grep -E 'data=[a-z,0-9,\"]+' file`
- Escape `+` so that it is treated specially in the Basic Regular Expressions that `grep` uses by default (`\+` is a GNU extension): `grep 'data=[a-z,0-9,"]\+' file`
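The difference between the two forms can be sketched as follows. This is a minimal illustration that assumes GNU grep; the two sample lines are made up for the demonstration and are not from the question.

```shell
# Hypothetical sample input: one line with a value, one without.
sample=$(printf '%s\n' 'data=abc123' 'data=')

# ERE (-E): a bare + is the "one or more" quantifier.
ere_out=$(printf '%s\n' "$sample" | grep -E 'data=[a-z0-9]+')

# BRE (default): the quantifier has to be written \+ (GNU extension).
bre_out=$(printf '%s\n' "$sample" | grep 'data=[a-z0-9]\+')

# BRE with a bare +: grep looks for a literal plus sign, so nothing matches here.
# "|| true" keeps the non-zero exit status of grep from aborting a script run with set -e.
literal_out=$(printf '%s\n' "$sample" | grep 'data=[a-z0-9]+' || true)

printf 'ERE: %s\nBRE: %s\nbare + in BRE: %s\n' "$ere_out" "$bre_out" "$literal_out"
```

Both the first and second command should print only `data=abc123`, while the third prints nothing because neither sample line contains a literal `+`.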
Points:

- `+` is an ERE (Extended Regular Expression) token meaning one or more of the preceding token. It works as a quantifier when the `-E` option of `grep` is used, or when escaped (`\+`) in BRE (Basic Regular Expressions), i.e. plain `grep`.
- The character class `[a-z,0-9,\"]` matches any character in `a-z` or `0-9`, a comma, a backslash, or `"`. Inside brackets the commas and the backslash are literal members of the class, not separators or escapes. This may not be what you want.
- Normally `grep` outputs the whole line. If you want to output only the matched portion, use the `-o` option of `grep`.
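The last two points can be checked with a small sketch like the one below. The input line is a made-up example, and `-o` is assumed to be available (it is in GNU and BSD grep, but it is not part of POSIX).

```shell
# Hypothetical input line containing a comma inside the value.
line='x data=ab,12 y'

# Without -o, grep prints the whole matching line.
whole=$(printf '%s\n' "$line" | grep -E 'data=[a-z0-9]+')

# With -o, only the matched text is printed; the match stops at the comma.
only=$(printf '%s\n' "$line" | grep -oE 'data=[a-z0-9]+')

# A comma inside the brackets is an ordinary member of the class, so it is matched too.
with_comma=$(printf '%s\n' "$line" | grep -oE 'data=[a-z,0-9]+')

printf '%s\n' "$whole" "$only" "$with_comma"
```

Here the class without a comma yields `data=ab`, while the class that lists a comma yields `data=ab,12`, which shows that commas are not needed to separate ranges.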
Based on your example, you can do:
grep -E '\bdata=[a-z0-9"]+\b' file
- `-E` enables ERE
- `\b` matches a word boundary (zero width)
- `data=` matches `data=` literally
- `[a-z0-9"]` matches any character in `a-z` or `0-9`, or `"`
- `+` matches the previous token one or more times
Even once your current pattern is corrected, without `\b` it would still produce false positives on input such as `foo fdata=2322ab`, `data=12AB`, and so on.
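A rough way to see the false positives is to run the pattern with and without `\b` on those inputs. This sketch assumes GNU grep (`\b` and `-o` are extensions) and sets `LC_ALL=C` so that `[a-z]` means ASCII lowercase only; the third input line is an added, hypothetical "clean" case for comparison.

```shell
# Force the C locale so [a-z] cannot match uppercase letters via locale collation.
export LC_ALL=C

input=$(printf '%s\n' 'foo fdata=2322ab' 'data=12AB' 'data=12ab')

# No boundaries: matches inside "fdata=..." and a partial "data=12" before "AB".
loose=$(printf '%s\n' "$input" | grep -oE 'data=[a-z0-9"]+')

# With \b on both sides, only the standalone, fully lowercase token matches.
strict=$(printf '%s\n' "$input" | grep -oE '\bdata=[a-z0-9"]+\b')

printf 'without \\b:\n%s\nwith \\b:\n%s\n' "$loose" "$strict"
```

Without the boundaries all three lines produce a match (`data=2322ab`, `data=12`, `data=12ab`); with them only `data=12ab` is reported.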
Example:
% grep -oE '\bdata=[a-z0-9"]+\b' <<<'<div class="node_thumbnail" data-type="file" name="GOPR0036.MP4_frame000001.jpg" data="813334c25191468c9f1c57afc99fde60" aid="133948" rel="/Files/ToolTipView?fileId=813334c25191468c9f1c57afc99fde60&pageNo=1&NoCache=101016083044" rev="topMiddle"'
data="813334c25191468c9f1c57afc99fde60
Note that the closing `"` is missing from the output: a `"` followed by a space is not a word boundary, so the trailing `\b` forces the match to end right after the last hex digit.