Need to extract a substring from a file path string including the delimiter

Solution 1:

Besides sed, you also have the option of using grep for this, with the PCRE regex ^.*?\.jar:

grep -oP '^.*?\.jar' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"

This prints only the match (-o), uses PCRE (-P), and matches text that:

  • starts at the beginning of the line (^),
  • continues with any character (.), repeated any number of times but matched lazily (*?), and
  • ends with a literal . character (\.) followed by jar (jar).

Using the lazy quantifier *? instead of the usual greedy quantifier * causes grep to match the fewest characters possible.

  • Without it (and with the greedy quantifier instead), grep would match as many characters as possible so long as the match ended in .jar, which would fail to stop after the first .jar in cases where there is more than one.
  • The -P flag is required because, of the regex dialects grep supports on Ubuntu, PCRE is the one that supports laziness. (This dialect is very similar to the regex dialect in Perl.)
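
To see the difference between the two quantifiers, here is a small sketch that runs both patterns on the same path. The variable names (path, lazy, greedy) are only illustrative, and it assumes GNU grep with PCRE support (-P), as on a stock Ubuntu install. It pipes the string in with printf rather than <<< so that it also runs in plain sh.

```shell
path="test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"

# Lazy quantifier (*?): the match ends at the first .jar
lazy=$(printf '%s\n' "$path" | grep -oP '^.*?\.jar')

# Greedy quantifier (*): the match runs on to the last .jar
greedy=$(printf '%s\n' "$path" | grep -oP '^.*\.jar')

printf 'lazy:   %s\n' "$lazy"     # lazy:   test1/test2/Test.jar
printf 'greedy: %s\n' "$greedy"   # greedy: test1/test2/Test.jar/Test2.jar
```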

Solution 2:

With sed, you can capture the first .jar and drop everything after it:

sed 's/\(\.jar\).*/\1/' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"

Or with awk, by splitting the line on the literal string .jar and appending .jar back onto the first field:

awk -F'\\.jar' '{print $1".jar"}' <<<"test1/test2/Test.jar/Test2.jar/com/test/ui/GI.class"

Both commands print:

test1/test2/Test.jar
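
The commands in this thread behave differently when a line contains no .jar at all, which may matter if you feed them a list of mixed paths. The sketch below is only an illustration: the input path and the variable names are made up, and GNU grep with -P is assumed.

```shell
input="com/test/ui/GI.class"   # hypothetical path with no .jar in it

# grep -o prints nothing when there is no match (and exits non-zero,
# hence the "|| true" so the snippet is safe under "set -e")
from_grep=$(printf '%s\n' "$input" | grep -oP '^.*?\.jar' || true)

# sed finds nothing to substitute and passes the line through unchanged
from_sed=$(printf '%s\n' "$input" | sed 's/\(\.jar\).*/\1/')

# awk sees a single field and still appends ".jar" to it
from_awk=$(printf '%s\n' "$input" | awk -F'\\.jar' '{print $1".jar"}')

printf 'grep: [%s]\n' "$from_grep"   # grep: []
printf 'sed:  [%s]\n' "$from_sed"    # sed:  [com/test/ui/GI.class]
printf 'awk:  [%s]\n' "$from_awk"    # awk:  [com/test/ui/GI.class.jar]
```

If the awk behaviour is unwanted, one possible guard is to print only when the separator was actually found, for example with the condition NF>1 in front of the action.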