How can I tell if a string repeats itself in Python?

Solution 1:

Here's a concise solution which avoids regular expressions and slow in-Python loops:

def principal_period(s):
    i = (s+s).find(s, 1, -1)
    return None if i == -1 else s[:i]

See the Community Wiki answer started by @davidism for benchmark results. In summary,

David Zhang's solution is the clear winner, outperforming all others by at least 5x for the large example set.

(That answer's words, not mine.)

This is based on the observation that a string is periodic if and only if it is equal to a nontrivial rotation of itself. Kudos to @AleksiTorhamo for realizing that we can then recover the principal period from the index of the first occurrence of s in (s+s)[1:-1], and for informing me of the optional start and end arguments of Python's string.find.

Solution 2:

Here's a solution using regular expressions.

import re

REPEATER = re.compile(r"(.+?)\1+$")

def repeated(s):
    match = REPEATER.match(s)
    return match.group(1) if match else None

Iterating over the examples in the question:

examples = [
    '0045662100456621004566210045662100456621',
    '0072992700729927007299270072992700729927',
    '001443001443001443001443001443001443001443',
    '037037037037037037037037037037037037037037037',
    '047619047619047619047619047619047619047619',
    '002457002457002457002457002457002457002457',
    '001221001221001221001221001221001221001221',
    '001230012300123001230012300123001230012300123',
    '0013947001394700139470013947001394700139470013947',
    '001001001001001001001001001001001001001001001001001',
    '001406469760900140646976090014064697609',
    '004608294930875576036866359447',
    '00469483568075117370892018779342723',
    '004739336492890995260663507109',
    '001508295625942684766214177978883861236802413273',
    '007518796992481203',
    '0071942446043165467625899280575539568345323741',
    '0434782608695652173913',
    '0344827586206896551724137931',
    '002481389578163771712158808933',
    '002932551319648093841642228739',
    '0035587188612099644128113879',
    '003484320557491289198606271777',
    '00115074798619102416570771',
]

for e in examples:
    sub = repeated(e)
    if sub:
        print("%r: %r" % (e, sub))
    else:
        print("%r does not repeat." % e)

... produces this output:

'0045662100456621004566210045662100456621': '00456621'
'0072992700729927007299270072992700729927': '00729927'
'001443001443001443001443001443001443001443': '001443'
'037037037037037037037037037037037037037037037': '037'
'047619047619047619047619047619047619047619': '047619'
'002457002457002457002457002457002457002457': '002457'
'001221001221001221001221001221001221001221': '001221'
'001230012300123001230012300123001230012300123': '00123'
'0013947001394700139470013947001394700139470013947': '0013947'
'001001001001001001001001001001001001001001001001001': '001'
'001406469760900140646976090014064697609': '0014064697609'
'004608294930875576036866359447' does not repeat.
'00469483568075117370892018779342723' does not repeat.
'004739336492890995260663507109' does not repeat.
'001508295625942684766214177978883861236802413273' does not repeat.
'007518796992481203' does not repeat.
'0071942446043165467625899280575539568345323741' does not repeat.
'0434782608695652173913' does not repeat.
'0344827586206896551724137931' does not repeat.
'002481389578163771712158808933' does not repeat.
'002932551319648093841642228739' does not repeat.
'0035587188612099644128113879' does not repeat.
'003484320557491289198606271777' does not repeat.
'00115074798619102416570771' does not repeat.

The regular expression (.+?)\1+$ is divided into three parts:

  1. (.+?) is a matching group containing at least one (but as few as possible) of any character (because +? is non-greedy).

  2. \1+ checks for at least one repetition of the matching group in the first part.

  3. $ checks for the end of the string, to ensure that there's no extra, non-repeating content after the repeated substrings (and using re.match() ensures that there's no non-repeating text before the repeated substrings).

In Python 3.4 and later, you could drop the $ and use re.fullmatch() instead, or (in any Python at least as far back as 2.3) go the other way and use re.search() with the regex ^(.+?)\1+$, all of which are more down to personal taste than anything else.