How to find and replace nth occurrence of word in a sentence using python regular expression?

Using python regular expression only, how to find and replace nth occurrence of word in a sentence? For example:

str = 'cat goose  mouse horse pig cat cow'
new_str = re.sub(r'cat', r'Bull', str)
new_str = re.sub(r'cat', r'Bull', str, 1)
new_str = re.sub(r'cat', r'Bull', str, 2)

I have a sentence above where the word 'cat' appears two times in the sentence. I want 2nd occurence of the 'cat' to be changed to 'Bull' leaving 1st 'cat' word untouched. My final sentence would look like: "cat goose mouse horse pig Bull cow". In my code above I tried 3 different times could not get what I wanted.

Solution 1:

Use negative lookahead like below.

>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^((?:(?!cat).)*cat(?:(?!cat).)*)cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'

DEMO

^ Asserts that we are at the start.
(?:(?!cat).)* Matches any character but not of cat , zero or more times.
cat matches the first cat substring.
(?:(?!cat).)* Matches any character but not of cat , zero or more times.
Now, enclose all the patterns inside a capturing group like ((?:(?!cat).)*cat(?:(?!cat).)*), so that we could refer those captured chars on later.
cat now the following second cat string is matched.

>>> s = "cat goose  mouse horse pig cat cow"
>>> re.sub(r'^(.*?(cat.*?){1})cat', r'\1Bull', s)
'cat goose  mouse horse pig Bull cow'

Change the number inside the {} to replace the first or second or nth occurrence of the string cat

To replace the third occurrence of the string cat, put 2 inside the curly braces ..

>>> re.sub(r'^(.*?(cat.*?){2})cat', r'\1Bull', "cat goose  mouse horse pig cat foo cat cow")
'cat goose  mouse horse pig cat foo Bull cow'

Play with the above regex on here ...

Solution 2:

I use simple function, which lists all occurrences, picks the nth one's position and uses it to split original string into two substrings. Then it replaces first occurrence in the second substring and joins substrings back into the new string:

import re

def replacenth(string, sub, wanted, n):
    where = [m.start() for m in re.finditer(sub, string)][n-1]
    before = string[:where]
    after = string[where:]
    newString = before + after.replace(sub, wanted, 1)
    print newString

For these variables:

string = 'ababababababababab'
sub = 'ab'
wanted = 'CD'
n = 5

outputs:

ababababCDabababab

Notes:

The where variable actually is a list of matches' positions, where you pick up the nth one. But list item index starts with 0 usually, not with 1. Therefore there is a n-1 index and n variable is the actual nth substring. My example finds 5th string. If you use n index and want to find 5th position, you'll need n to be 4. Which you use usually depends on the function, which generates our n.

This should be the simplest way, but it isn't regex only as you originally wanted.

Sources and some links in addition:

where construction: How to find all occurrences of a substring?

string splitting: https://www.daniweb.com/programming/software-development/threads/452362/replace-nth-occurrence-of-any-sub-string-in-a-string

similar question: Find the nth occurrence of substring in a string

Solution 3:

Here's a way to do it without a regex:

def replaceNth(s, source, target, n):
    inds = [i for i in range(len(s) - len(source)+1) if s[i:i+len(source)]==source]
    if len(inds) < n:
        return  # or maybe raise an error
    s = list(s)  # can't assign to string slices. So, let's listify
    s[inds[n-1]:inds[n-1]+len(source)] = target  # do n-1 because we start from the first occurrence of the string, not the 0-th
    return ''.join(s)

Usage:

In [278]: s
Out[278]: 'cat goose  mouse horse pig cat cow'

In [279]: replaceNth(s, 'cat', 'Bull', 2)
Out[279]: 'cat goose  mouse horse pig Bull cow'

In [280]: print(replaceNth(s, 'cat', 'Bull', 3))
None

Solution 4:

I would define a function that will work for every regex:

import re

def replace_ith_instance(string, pattern, new_str, i = None, pattern_flags = 0):
    # If i is None - replacing last occurrence
    match_obj = re.finditer(r'{0}'.format(pattern), string, flags = pattern_flags)
    matches = [item for item in match_obj]
    if i == None:
        i = len(matches)
    if len(matches) == 0 or len(matches) < i:
        return string
    match = matches[i - 1]
    match_start_index = match.start()
    match_len = len(match.group())

    return '{0}{1}{2}'.format(string[0:match_start_index], new_str, string[match_start_index + match_len:])

A working example:

str = 'cat goose  mouse horse pig cat cow'
ns = replace_ith_instance(str, 'cat', 'Bull', 2)
print(ns)

The output:

cat goose  mouse horse pig Bull cow

Another example:

str2 = 'abc abc def abc abc'
ns = replace_ith_instance(str2, 'abc\s*abc', '666')
print(ns)

The output:

abc abc def 666

Solution 5:

How to replace the nth needle with word:

s.replace(needle,'$$$',n-1).replace(needle,word,1).replace('$$$',needle)