Using Java to find a substring of a bigger string using a regular expression

Solution 1:

You should be able to use non-greedy quantifiers, specifically *?. You'll probably want the following:

Pattern MY_PATTERN = Pattern.compile("\\[(.*?)\\]");

This will give you a pattern that will match your string and put the text within the square brackets in the first group. Have a look at the Pattern API Documentation for more information.

To extract the string, you could use something like the following:

Matcher m = MY_PATTERN.matcher("FOO[BAR]");
while (m.find()) {
    String s = m.group(1);
    // s now contains "BAR"
}

Solution 2:

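The non-regex way:

String input = "FOO[BAR]", extracted;
// indexOf("[") points at the bracket itself, so add 1 to skip it
extracted = input.substring(input.indexOf("[") + 1, input.indexOf("]"));

Alternatively, with char arguments for slightly better performance/memory usage (thanks Hosam). Note that lastIndexOf extends the result up to the last ']' in the string:

String input = "FOO[BAR]", extracted;
extracted = input.substring(input.indexOf('[') + 1, input.lastIndexOf(']'));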
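
The substring approach assumes both brackets are present: indexOf returns -1 when a character is missing, and substring then throws a StringIndexOutOfBoundsException. A minimal sketch of a guarded version follows; the class name SubstringBetween and the helper between are hypothetical, and returning null for "no brackets" is just one possible convention.

```java
class SubstringBetween {
    // Hypothetical helper: returns the text between the first '[' and the
    // first ']' that follows it, or null when no such pair exists.
    static String between(String input) {
        int open = input.indexOf('[');
        if (open < 0) {
            return null; // no opening bracket
        }
        int close = input.indexOf(']', open + 1);
        if (close < 0) {
            return null; // no closing bracket after the opening one
        }
        return input.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(between("FOO[BAR]")); // prints BAR
        System.out.println(between("FOO"));      // prints null
    }
}
```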

Solution 3:

This is a working example:

RegexpExample.java

package org.regexp.replace;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExample
{
    public static void main(String[] args)
    {
        String string = "var1[value1], var2[value2], var3[value3]";
        Pattern pattern = Pattern.compile("(\\[)(.*?)(\\])");
        Matcher matcher = pattern.matcher(string);

        List<String> listMatches = new ArrayList<String>();

        while(matcher.find())
        {
            listMatches.add(matcher.group(2));
        }

        for(String s : listMatches)
        {
            System.out.println(s);
        }
    }
}

It displays:

value1
value2
value3

Solution 4:

import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String get_match(String s, String p) {
    // returns first match of p in s for first group in regular expression 
    Matcher m = Pattern.compile(p).matcher(s);
    return m.find() ? m.group(1) : "";
}

get_match("FOO[BAR]", "\\[(.*?)\\]")  // returns "BAR"

public static List<String> get_matches(String s, String p) {
    // returns all matches of p in s for first group in regular expression 
    List<String> matches = new ArrayList<String>();
    Matcher m = Pattern.compile(p).matcher(s);
    while(m.find()) {
        matches.add(m.group(1));
    }
    return matches;
}

get_matches("FOO[BAR] FOO[CAT]", "\\[(.*?)\\]")  // returns [BAR, CAT]

Solution 5:

If you simply need to get whatever is between [], then you can use \[([^\]]*)\] like this:

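String str = "FOO[BAR]";
String result = null;
Pattern regex = Pattern.compile("\\[([^\\]]*)\\]");
Matcher m = regex.matcher(str);
if (m.find()) {
    // group(1) is the text inside the brackets; group() would include the brackets
    result = m.group(1);
}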

If you need it to be of the form identifier + [ + content + ], you can restrict extraction to cases where the identifier is alphanumeric:

[a-zA-Z][a-zA-Z0-9_]*\s*\[([^\]]*)\]

This will match things like Foo [Bar] or myDevice_123["input"], for instance.
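
As an illustration, the identifier form could be used from Java roughly as follows. This is only a sketch: the class name IdentifierBracket and the helper content are hypothetical, the character class is written as [a-zA-Z0-9_], and find() is used, so the identifier may start anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdentifierBracket {
    // identifier, optional whitespace, then [content]; group 1 is the content
    static final Pattern ID_BRACKET =
        Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*\\s*\\[([^\\]]*)\\]");

    // Hypothetical helper: content of the first identifier[...] found, or null.
    static String content(String s) {
        Matcher m = ID_BRACKET.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(content("Foo [Bar]"));                // prints Bar
        System.out.println(content("myDevice_123[\"input\"]"));  // prints "input"
        System.out.println(content("123[x]"));                   // prints null
    }
}
```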

Main issue

The main problem is when you want to extract the content of something like this:

FOO[BAR[CAT[123]]+DOG[FOO]]

The Regex won't work and will return BAR[CAT[123 and FOO.
If we change the Regex to \[(.*)\] then we're OK, but not if you're trying to extract the content from more complex things like:

FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]

None of the Regexes will work.
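
To see the behaviour described above, a small demo like the following can be run. It is a sketch only; the class name NestedBracketRegexDemo and the helper groups are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedBracketRegexDemo {
    // Collects group 1 of every match of regex in input.
    static List<String> groups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String nested = "FOO[BAR[CAT[123]]+DOG[FOO]]";
        String two = nested + " = myOtherFoo[BAR[5]]";

        // Negated character class stops at the first ']':
        System.out.println(String.join(" | ", groups("\\[([^\\]]*)\\]", nested)));
        // prints: BAR[CAT[123 | FOO

        // Greedy version runs from the first '[' to the last ']':
        System.out.println(String.join(" | ", groups("\\[(.*)\\]", nested)));
        // prints: BAR[CAT[123]]+DOG[FOO]

        // With two separate bracketed expressions, greedy swallows both:
        System.out.println(String.join(" | ", groups("\\[(.*)\\]", two)));
        // prints: BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]
    }
}
```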

The most accurate Regex to extract the proper content in all cases would be a lot more complex, as it would need to balance [] pairs and give you their content.

A simpler solution

If your problem is getting complex and the content of the [] is arbitrary, you could instead balance the pairs of [] and extract the string using plain old code rather than a Regex:

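String input = "FOO[BAR[CAT[123]]+DOG[FOO]]";
StringBuilder result = new StringBuilder();
int brackets = 0;
int start = input.indexOf('[');
// If there is no '[', the loop is skipped and result stays empty
for (int i = start; start >= 0 && i < input.length(); i++) {
    char c = input.charAt(i);
    if (c == '[') {
        brackets++;
        if (brackets == 1)
            continue; // skip the outermost opening bracket
    } else if (c == ']') {
        brackets--;
        if (brackets <= 0)
            break; // matching closing bracket reached
    }
    result.append(c);
}
// result now contains "BAR[CAT[123]]+DOG[FOO]"

This is a minimal sketch rather than polished code, but it should be easy enough to improve upon.
What counts is that this code lets you extract the content of the first outermost [], however complex it is.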
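
The same bracket-counting idea can be extended to return the content of every top-level [] pair, which covers inputs with several bracketed expressions. The following is a sketch under stated assumptions: the class name BalancedBrackets and the method topLevel are hypothetical, stray closing brackets are skipped, and an unclosed opening bracket yields no result.

```java
import java.util.ArrayList;
import java.util.List;

class BalancedBrackets {
    // Returns the content of every top-level [...] pair in input.
    static List<String> topLevel(String input) {
        List<String> out = new ArrayList<>();
        int depth = 0;   // current nesting level
        int start = -1;  // index just after the current outermost '['
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                if (depth == 0) {
                    start = i + 1; // content begins after the outer '['
                }
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
                if (depth == 0) {
                    out.add(input.substring(start, i));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]";
        for (String part : topLevel(s)) {
            System.out.println(part);
        }
        // prints:
        // BAR[CAT[123]]+DOG[FOO]
        // BAR[5]
    }
}
```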