Regular expression puzzler

Solution 1:

The quantifier {3} in the pattern [iny]{3} means: match one character from that class (i, n, or y), then another one from the same class, and then another. Three, one right after another. Your string unify doesn't have that; the longest such run it can muster is two, ni.
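To see this concretely, here is a small sketch. The helper name longest_run is made up for this illustration; it just reports the longest run of adjacent characters from the class.

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(max);

# Hypothetical helper: length of the longest run of consecutive
# characters taken from the set i, n, y (0 if there is none).
sub longest_run {
    my ($s) = @_;
    return max( 0, map { length $_ } $s =~ /([iny]+)/g );
}

say "unify" =~ /[iny]{3}/ ? "match" : "no match";   # no match
say "unify" =~ /([iny]{2})/ ? "match: $1" : "no match";   # match: ni
say longest_run("unify");                           # 2
```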

That's been explained in other answers already. What I'd like to add is an answer to a clarification from the comments: how to check whether these characters appear 3 times in the string, scattered around at will. Apart from matching that whole substring, as already shown, we can use a lookahead:

(?=[iny].*[iny].*[iny])

This does not "consume" any characters but rather "looks" ahead for the pattern, not advancing the engine from its current position. As such it can be very useful as a subpattern, in combination with other patterns in a larger regex.
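A sketch of both points, under my own assumptions: the sub short_word_with_three and its "lowercase word of at most six letters" rule are invented for the example. Note that in the anchored pattern the lookahead is tried only at the start of the string, so it needs a leading .* there.

```perl
use strict;
use warnings;
use feature 'say';

my $three = qr/(?=[iny].*[iny].*[iny])/;

# The lookahead succeeds at offset 1 (the "n") and consumes nothing,
# so the match starts and ends at the same offset.
if ( "unify" =~ $three ) {
    say "start=", $-[0], " end=", $+[0];   # start=1 end=1
}

# Hypothetical combined check: a lowercase word of at most six letters
# that has three of i/n/y somewhere in it.
sub short_word_with_three {
    my ($s) = @_;
    return $s =~ /^(?=.*[iny].*[iny].*[iny])[a-z]{1,6}$/ ? 1 : 0;
}

for my $word (qw(unify unit infinity)) {
    say "$word: ", short_word_with_three($word) ? "yes" : "no";
}
```

Here unify passes, unit has only two such letters, and infinity has enough of them but is too long for the consuming part.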

A Perl example, to copy-paste on the command line:

perl -wE'say "Match" if "unify" =~ /(?=[iny].*[iny].*[iny])/'

The drawback of this, as of consuming the whole such substring, is that all three subpatterns are spelled out literally. What if the number needs to be decided dynamically? Or if it's twelve? The pattern can be built at runtime, of course. In Perl, one way is

my $pattern = '(?=' . join('.*', ('[iny]')x3) . ')';

and then use that in the regex.
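Wrapped up as a complete script, that could look like the following sketch. The sub name scattered_lookahead and its parameters are my own choice for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: build a lookahead that requires $count characters
# from $class, in sequence, with anything (except newline) in between.
sub scattered_lookahead {
    my ($class, $count) = @_;
    my $pattern = '(?=' . join('.*', ($class) x $count) . ')';
    return qr/$pattern/;   # compile once, reuse in later matches
}

my $re = scattered_lookahead('[iny]', 3);

say "unify"  =~ $re ? "Match" : "No match";   # Match
say "unfold" =~ $re ? "Match" : "No match";   # No match
```

The count can then come from anywhere (a command-line option, a config value) without touching the regex itself.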

For the sake of performance, with long strings and many repetitions, make that .* non-greedy:

(?=[iny].*?[iny].*?[iny])

(When forming the pattern dynamically, join with .*? instead.)
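The two variants differ only in how the engine searches, not in what they accept. A quick sanity check, as a sketch (lookahead_with is a made-up helper that takes the separator as a parameter):

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: same construction for both variants, with the
# separator ('.*' or '.*?') passed in.
sub lookahead_with {
    my ($sep, $class, $count) = @_;
    my $pattern = '(?=' . join($sep, ($class) x $count) . ')';
    return qr/$pattern/;
}

my $greedy = lookahead_with('.*',  '[iny]', 3);
my $lazy   = lookahead_with('.*?', '[iny]', 3);

# Both should give the same yes/no answer on every string.
for my $s (qw(unify unit infinity xyz)) {
    my $g = $s =~ $greedy ? 1 : 0;
    my $l = $s =~ $lazy   ? 1 : 0;
    say "$s: greedy=$g lazy=$l";
}
```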

A simple benchmark for illustration (in Perl)

use warnings;
use strict;
use feature 'say';

use Getopt::Long;
use List::Util qw(shuffle);
use Benchmark qw( cmpthese );

# For how many seconds to run each option (-r N, default 3), 
# how many times to repeat for the test string (-n N, default 2)
my ($runfor, $n) = (3, 2);
GetOptions('r:i' => \$runfor, 'n:i' => \$n);

my $str = 'aa'
    . join('', map { (shuffle 'b'..'t')x$n, 'a' } 1..$n)
    . 'a'x($n+1) 
    . 'zzz'; 
    
my $pat_greedy     = '(?=' . join('.*',  ('a')x$n) . ')';
my $pat_non_greedy = '(?=' . join('.*?', ('a')x$n) . ')';
#my $pat_greedy     = join('.*',  ('a')x$n);  # test straight match,
#my $pat_non_greedy = join('.*?', ('a')x$n);  # not lookahead

sub match_repeated {
    my ($s, $pla) = @_;
    return ( $s =~ /$pla(.*z)/ ) ? "match" : "no match";
}   

cmpthese(-$runfor, {
    greedy     => sub { match_repeated($str, $pat_greedy) },
    non_greedy => sub { match_repeated($str, $pat_non_greedy) },
});

(Shuffling of that string is probably unneeded but I feared optimizations intruding.)

When the string is built with a factor of 20 (program.pl -n 20), the output is

              Rate     greedy non_greedy
greedy      56.3/s         --      -100%
non_greedy 90169/s    159926%         --

So non-greedy is some 1600 times faster. That test string is 7646 characters long and the pattern to match has 20 subpatterns (a) with .* between them (in the greedy case), so there's a lot going on there. With the default of 2, that is for a short string and a simpler pattern, the difference is 10%.
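For reference, the length follows directly from how the string is assembled in the benchmark. A small sketch of that arithmetic (the sub name test_string_length is mine):

```perl
use strict;
use warnings;
use feature 'say';

# Length of the benchmark's test string for a given -n value:
#   'aa' + n chunks of (n shuffled copies of b..t, then 'a')
#        + (n+1) trailing 'a' + 'zzz'
sub test_string_length {
    my ($n) = @_;
    my @letters = ('b' .. 't');          # 19 letters
    my $chunk   = @letters * $n + 1;     # shuffled letters plus one 'a'
    return 2 + $n * $chunk + ($n + 1) + 3;
}

say test_string_length(20);              # 7646
say test_string_length(2);               # 86

# Ratio of the two reported rates, roughly 1600
printf "%.0f\n", 90169 / 56.3;
```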

Btw, to test straight-up matches (not using a lookahead), just move the comment signs around on the pattern variables. The gap is then nearly twice as large:

               Rate     greedy non_greedy
greedy       56.5/s         --      -100%
non_greedy 171949/s    304117%         --

Solution 2:

The letters n, i, and y aren't all adjacent. There's an f in between them.

/[iny]{3}/ matches any string that contains three consecutive letters taken from the set {i, n, y}. The letters can be in any order; they can even be repeated.

Choosing from three characters three times, with replacement, gives 3³ = 27 matching substrings:

iii, iin, iiy, ini, inn, iny, iyi, iyn, iyy
nii, nin, niy, nni, nnn, nny, nyi, nyn, nyy
yii, yin, yiy, yni, ynn, yny, yyi, yyn, yyy
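If you'd rather not trust the list by eye, it can be generated and checked. A sketch in Perl (the sub name triples is just for this example):

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: every three-letter string over the given letters,
# in the same order as the list above.
sub triples {
    my @set = @_;
    my @out;
    for my $p (@set) {
        for my $q (@set) {
            for my $r (@set) {
                push @out, "$p$q$r";
            }
        }
    }
    return @out;
}

my @all = triples(qw(i n y));
say scalar @all;                                  # 27

# Each one should be matched in full by the pattern.
my @failed = grep { !/^[iny]{3}$/ } @all;
say scalar @failed;                               # 0
```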

To match non-adjacent letters you can use one of these:

```
[iny].*[iny].*[iny]
```
```
[iny](.*[iny]){2}
```
```
([iny].*){3}
```

(The last option will work fine on its own since your search is unanchored, but might not be suitable as part of a larger regex. The final .* could match more than you intend.)
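To make that caveat visible, here is a sketch that prints what each pattern actually matches. The test string unify!! and the helper matched_text are assumptions for the demo, not anything from the question.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: the text matched by $re in $s, or undef.
# The extra outer group makes the whole match available as $1.
sub matched_text {
    my ($re, $s) = @_;
    return $s =~ /($re)/ ? $1 : undef;
}

my @patterns = (
    qr/[iny].*[iny].*[iny]/,
    qr/[iny](.*[iny]){2}/,
    qr/([iny].*){3}/,
);

for my $re (@patterns) {
    my $text = matched_text($re, 'unify!!');
    printf "%-30s %s\n", $re, defined $text ? $text : '(no match)';
}
```

The first two stop at the last required letter (nify), while the third also swallows the trailing !! because of its final .*.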
