Need to extract a substring from a file path string including the delimiter
Solution 1:
Besides sed, you also have the option of using grep for this, with the PCRE regex ^.*?\.jar:
grep -oP '^.*?\.jar' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"
This prints only the match (-o), uses PCRE (-P), and matches text that:
- starts at the beginning of the line (^), and
- contains any character (.), any number of times but matched lazily (*?),
- followed by a literal . character (\.) and jar (jar)
Using the lazy quantifier *? instead of the usual greedy quantifier * causes grep to match the fewest characters possible.
- Without it (and with the greedy quantifier instead), grep would match as many characters as possible so long as the match ended in .jar, which would fail to stop after the first .jar in cases where there is more than one.
- The -P flag is required because, of the regex dialects grep supports on Ubuntu, PCRE is the one that supports laziness. (This dialect is very similar to the regex dialect in Perl.)
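To see the difference between the two quantifiers side by side, here is a small sketch that runs both patterns on the same path. It assumes a grep built with PCRE support (as GNU grep on Ubuntu normally is); the variable names path, lazy and greedy are only for illustration, and the input is piped in with printf rather than a here-string so that it also runs under plain sh.

```shell
path="test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"

# Lazy quantifier (*?): the match ends at the first .jar
lazy=$(printf '%s\n' "$path" | grep -oP '^.*?\.jar')

# Greedy quantifier (*): the match runs on to the last .jar
greedy=$(printf '%s\n' "$path" | grep -oP '^.*\.jar')

echo "lazy:   $lazy"
echo "greedy: $greedy"
```

The lazy version should print test1/test2/Test.jar, while the greedy one should print test1/test2/Test.jar/Test2.jar.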
Solution 2:
You could use sed like below:
sed 's/\(\.jar\).*/\1/' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"
Or through the awk command:
awk -F'\\.jar' '{print $1".jar"}' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"
The output is:
test1/test2/Test.jar
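One thing worth checking before relying on either command is how it behaves when the input contains no .jar at all. The sketch below runs both on the path from the question and on a second, made-up path (no_jar is a hypothetical example, not from the question). The sed substitution simply does not apply and passes the line through unchanged, whereas the awk command always appends ".jar" to the first field.

```shell
with_jar="test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"
no_jar="test1/test2/com/test/ui/GI.class"   # hypothetical path without any .jar

# Both commands cut the path after the first .jar
sed_hit=$(printf '%s\n' "$with_jar" | sed 's/\(\.jar\).*/\1/')
awk_hit=$(printf '%s\n' "$with_jar" | awk -F'\\.jar' '{print $1".jar"}')

# Without a .jar in the input the two commands differ
sed_miss=$(printf '%s\n' "$no_jar" | sed 's/\(\.jar\).*/\1/')
awk_miss=$(printf '%s\n' "$no_jar" | awk -F'\\.jar' '{print $1".jar"}')

echo "sed, with .jar: $sed_hit"
echo "awk, with .jar: $awk_hit"
echo "sed, no .jar:   $sed_miss"
echo "awk, no .jar:   $awk_miss"
```

If the input may lack a .jar component, the sed form is the safer of the two, since the awk form would turn the second path into test1/test2/com/test/ui/GI.class.jar.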