How to replace all occurrences of a string inside tags <> using regex in Java

I have a string inside tags, e.g. "<abc~a>I am src src customer<abc~b>".

I want to replace "src" with "abc". I used the following call: replaceAll("(<abc~a>.*?)src(.*?<abc~b>)", "$1" + "abc" + "$2"). But it replaces only the first occurrence, i.e. the output is "<abc~a>I am abc src customer<abc~b>".

I want the output to be "<abc~a>I am abc abc customer<abc~b>".

I don't want to use Pattern and Matcher. Is there any solution using replaceAll()? Please help.


Solution 1:

We can use a regex Pattern and Matcher here. Match on the pattern <abc~a>(.*?)<abc~b> and, for each match, append the tagged text with src replaced by abc. Here is some sample code:

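// requires java.util.regex.Matcher and java.util.regex.Pattern
String input = "Here is a src <abc~a>I am an src customer<abc~b> also another src here.";
Pattern p = Pattern.compile("<abc~a>(.*?)<abc~b>");
Matcher m = p.matcher(input);
StringBuffer buffer = new StringBuffer();

while (m.find()) {
    // replace src only within the text captured between the tags
    String replace = "<abc~a>" + m.group(1).replaceAll("\\bsrc\\b", "abc") + "<abc~b>";
    // quoteReplacement keeps any $ or \ in the tagged text literal
    m.appendReplacement(buffer, Matcher.quoteReplacement(replace));
}
m.appendTail(buffer);

System.out.println(buffer.toString());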

This prints:

Here is a src <abc~a>I am an abc customer<abc~b> also another src here.

Note that in many other languages we could have used a regex callback function. Core Java only gained this in Java 9 (Matcher.replaceAll with a function argument), so on older versions we have to write the find/appendReplacement loop ourselves.
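
On Java 8 and earlier, the loop above can be wrapped in a small helper that behaves much like a regex callback. The following is only a minimal sketch, not a library API: the class name CallbackReplace and the method replaceAllWith are hypothetical names chosen for illustration, and the input is a variant of the one above with two occurrences inside the tags.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CallbackReplace {

    // Hypothetical helper: calls 'callback' for every match of 'pattern'
    // and inserts the returned text literally.
    static String replaceAllWith(Pattern pattern, String input,
                                 Function<MatchResult, String> callback) {
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the callback result literal
            m.appendReplacement(out, Matcher.quoteReplacement(callback.apply(m)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern tagged = Pattern.compile("<abc~a>(.*?)<abc~b>");
        String input = "Here is a src <abc~a>I am src src customer<abc~b> also another src here.";
        String result = replaceAllWith(tagged, input,
            m -> "<abc~a>" + m.group(1).replaceAll("\\bsrc\\b", "abc") + "<abc~b>");
        // prints: Here is a src <abc~a>I am abc abc customer<abc~b> also another src here.
        System.out.println(result);
    }
}
```

The callback receives a MatchResult, so it can read groups but cannot move the matcher, which keeps the loop logic in one place.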

Solution 2:

If you’re using Java 9 or newer, the simplest approach to your problem would be:

Pattern p = Pattern.compile("(?<=<abc~a>).*?(?=<abc~b>)");
String result = p.matcher(input)
    .replaceAll(m -> m.group().replaceAll("\\bsrc\\b", "abc"));

Basically, it does the same as Tim Biegeleisen’s answer (Solution 1) under the hood. The minor differences are that it uses StringBuilder instead of StringBuffer, an option only available since Java 9, and that it returns the original string instance, rather than a copy, if no match for the outer pattern (p) has been found.

I also changed the pattern to use look-behind and look-ahead, which simplifies the replace function and also reduces the amount of character copying.
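
To see the effect of the look-around pattern, here is a minimal self-contained sketch for Java 9 or newer. The class name TagReplaceDemo and the method replaceInsideTags are assumptions made for the example. Because the tags are only asserted and not consumed, m.group() is just the text between them, so the function does not have to re-add the tags.

```java
import java.util.regex.Pattern;

class TagReplaceDemo {
    // Look-behind and look-ahead: the match itself is only the text between the tags
    static final Pattern BETWEEN_TAGS = Pattern.compile("(?<=<abc~a>).*?(?=<abc~b>)");

    static String replaceInsideTags(String input) {
        // Matcher.replaceAll(Function) requires Java 9 or newer
        return BETWEEN_TAGS.matcher(input)
            .replaceAll(m -> m.group().replaceAll("\\bsrc\\b", "abc"));
    }

    public static void main(String[] args) {
        String input = "Here is a src <abc~a>I am src src customer<abc~b> also another src here.";
        // prints: Here is a src <abc~a>I am abc abc customer<abc~b> also another src here.
        System.out.println(replaceInsideTags(input));
    }
}
```

Keep in mind that . does not match line terminators by default, so tag content spanning several lines would need Pattern.DOTALL.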

Note that both replaceAll operations run a similar appendReplacement loop behind the scenes. The method appendReplacement looks for group references (e.g. $number) and backslash escapes in the replacement string. For the outer operation, that replacement string is not just "abc" but the entire text between the <abc~a> and <abc~b> tags. If you can’t rule out $ or \ in that text, you have to wrap the function’s result in Matcher.quoteReplacement to avoid problems.
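
The following sketch illustrates the problem with a dollar sign inside the tags. It assumes Java 9 or newer; the class and method names are hypothetical. Without quoting, the outer replaceAll treats $5 as a group reference and fails with a runtime exception (the pattern has no group 5); with Matcher.quoteReplacement the text is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    static final Pattern BETWEEN_TAGS = Pattern.compile("(?<=<abc~a>).*?(?=<abc~b>)");

    // Unquoted: the function's result is parsed for $-references and backslashes
    static String replaceUnquoted(String input) {
        return BETWEEN_TAGS.matcher(input)
            .replaceAll(m -> m.group().replaceAll("\\bsrc\\b", "abc"));
    }

    // Quoted: the function's result is inserted literally
    static String replaceQuoted(String input) {
        return BETWEEN_TAGS.matcher(input)
            .replaceAll(m -> Matcher.quoteReplacement(
                m.group().replaceAll("\\bsrc\\b", "abc")));
    }

    public static void main(String[] args) {
        String input = "<abc~a>src costs $5<abc~b>";
        try {
            System.out.println(replaceUnquoted(input));
        } catch (RuntimeException e) {
            // "$5" refers to a group that does not exist in BETWEEN_TAGS
            System.out.println("unquoted failed: " + e);
        }
        // prints: <abc~a>abc costs $5<abc~b>
        System.out.println(replaceQuoted(input));
    }
}
```

Only the outer result needs quoting here, because the inner replacement "abc" is a constant without special characters.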


Besides the unwanted interpretation of replacement patterns, the inner replaceAll will compile the pattern string to a Pattern object on each invocation. Further, the inner operation creates a temporary string which is then used for the outer replacement operation, so this simple solution will copy some of the character contents multiple times.

If performance really matters, it’s worth writing a dedicated operation, even if it’s more verbose.

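// class-level constants, compiled once
static final Pattern OUTER_PATTERN = Pattern.compile("<abc~a>(.*?)<abc~b>");
static final Pattern INNER_PATTERN = Pattern.compile("\\bsrc\\b");

// inside a method
String replacement = "abc";
String result;
Matcher m = OUTER_PATTERN.matcher(input);
if(!m.find()) result = input; // no tag pair at all
else {
    StringBuilder sb = new StringBuilder(input.length());
    int copyStart = 0, nextSearchStart;
    do {
        nextSearchStart = m.end(); // resume the outer search after the closing tag
        // limit the matcher to the text between the tags and switch to the inner pattern
        for(m.region(m.start(1), m.end(1)).usePattern(INNER_PATTERN);
                                           m.find(); copyStart = m.end()) {
            // copy the unchanged text before the match, then the literal replacement
            sb.append(input, copyStart, m.start()).append(replacement);
        }
    } while(m.region(nextSearchStart, input.length()).usePattern(OUTER_PATTERN).find());
    // nothing replaced: return the original instance; otherwise copy the remaining tail
    result = copyStart==0? input: sb.append(input, copyStart, input.length()).toString();
}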

This does not compile the patterns multiple times and uses both patterns in a single replacement operation without intermediate steps, performing the minimum character copying necessary. The replacement string is copied literally using StringBuilder.append, so no quoting is necessary. Like the built-in replaceAll it will return the original string when no match of the outer pattern has been found. But it will also return the original string when the outer pattern had matches but there were no inner pattern matches within the affected region(s).
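
As a rough sketch, the same operation can be wrapped in a method so that the return-value behavior described above can be checked. The class name RegionReplace and the method replaceInTags are hypothetical, and the inner loop is written as a while loop purely for readability; the logic is meant to be the same as in the snippet above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionReplace {
    static final Pattern OUTER_PATTERN = Pattern.compile("<abc~a>(.*?)<abc~b>");
    static final Pattern INNER_PATTERN = Pattern.compile("\\bsrc\\b");

    static String replaceInTags(String input, String replacement) {
        Matcher m = OUTER_PATTERN.matcher(input);
        if (!m.find()) return input; // no tag pair at all
        StringBuilder sb = new StringBuilder(input.length());
        int copyStart = 0, nextSearchStart;
        do {
            nextSearchStart = m.end();
            // restrict the matcher to the tag content and search with the inner pattern
            m.region(m.start(1), m.end(1)).usePattern(INNER_PATTERN);
            while (m.find()) {
                sb.append(input, copyStart, m.start()).append(replacement);
                copyStart = m.end();
            }
        } while (m.region(nextSearchStart, input.length()).usePattern(OUTER_PATTERN).find());
        // copyStart stays 0 when no inner match was found in any tagged region
        return copyStart == 0 ? input
                              : sb.append(input, copyStart, input.length()).toString();
    }

    public static void main(String[] args) {
        String input = "Here is a src <abc~a>I am src src customer<abc~b> also another src here.";
        // prints: Here is a src <abc~a>I am abc abc customer<abc~b> also another src here.
        System.out.println(replaceInTags(input, "abc"));

        String noInnerMatch = "<abc~a>nothing to replace<abc~b>";
        // prints: true (the very same String instance is returned)
        System.out.println(replaceInTags(noInnerMatch, "abc") == noInnerMatch);
    }
}
```

Since the replacement is appended with StringBuilder.append, a value such as "$1" would also be inserted literally.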