How to mimic Stack Overflow Auto-Link Behavior

With PHP how can I mimic the auto-link behavior of Stack Overflow (which BTW is awesomely cool)?

For instance, the following URL:

http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Is converted into this:

<a title="how to mimic stackoverflow auto link behavior" rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/…</a>

I don't really care for the title attribute in this case.


And this:

http://pt.php.net/manual/en/function.base-convert.php#52450

Is converted into this:

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/…</a>

How can I make a similar function in PHP?

PS: Check my comments on this question for some more examples and behaviors.


Try this out. The URL-matching regex pattern is from Daring Fireball.

/**
 * Replace links in text with html links
 *
 * @param  string $text
 * @return string
 */
function auto_link_text($text)
{
   // a more readably-formatted version of the pattern is on http://daringfireball.net/2010/07/improved_regex_for_matching_urls
   $pattern  = '(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';

   $callback = create_function('$matches', '
       $url       = array_shift($matches);
       $url_parts = parse_url($url);

       $text = parse_url($url, PHP_URL_HOST) . parse_url($url, PHP_URL_PATH);
       $text = preg_replace("/^www./", "", $text);

       $last = -(strlen(strrchr($text, "/"))) + 1;
       if ($last < 0) {
           $text = substr($text, 0, $last) . "&hellip;";
       }

       return sprintf(\'<a rel="nofollow" href="%s">%s</a>\', $url, $text);
   ');

   return preg_replace_callback($pattern, $callback, $text);
}

Input Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
 Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

 Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

Output Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
 Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/&hellip;</a>

 Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/&hellip;</a>

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">pt.php.net/manual/en/&hellip;</a>

This is based on the same daringfireball.net regular expression, but adds a bit more logic than Eric Coleman's example, as well as configuration for maximum URL depth (SO seems to be 50), maximum path depth when URL is truncated (SO seems to be 2), and ellipsis character (&hellip;).

As far as I know this replicates all of the SO URL rewriting functionality, at least as far as what was discussed so far in the comments and responses here.

function auto_link_text($text) {
    $pattern  = '#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#';
    return preg_replace_callback($pattern, 'auto_link_text_callback', $text);
}

function auto_link_text_callback($matches) {
    $max_url_length = 50;
    $max_depth_if_over_length = 2;
    $ellipsis = '&hellip;';

    $url_full = $matches[0];
    $url_short = '';

    if (strlen($url_full) > $max_url_length) {
        $parts = parse_url($url_full);
        $url_short = $parts['scheme'] . '://' . preg_replace('/^www\./', '', $parts['host']) . '/';

        $path_components = explode('/', trim($parts['path'], '/'));
        foreach ($path_components as $dir) {
            $url_string_components[] = $dir . '/';
        }

        if (!empty($parts['query'])) {
            $url_string_components[] = '?' . $parts['query'];
        }

        if (!empty($parts['fragment'])) {
            $url_string_components[] = '#' . $parts['fragment'];
        }

        for ($k = 0; $k < count($url_string_components); $k++) {
            $curr_component = $url_string_components[$k];
            if ($k >= $max_depth_if_over_length || strlen($url_short) + strlen($curr_component) > $max_url_length) {
                if ($k == 0 && strlen($url_short) < $max_url_length) {
                    // Always show a portion of first directory
                    $url_short .= substr($curr_component, 0, $max_url_length - strlen($url_short));
                }
                $url_short .= $ellipsis;
                break;
            }
            $url_short .= $curr_component;
        }

    } else {
        $url_short = $url_full;
    }

    return "<a rel=\"nofollow\" href=\"$url_full\">$url_short</a>";
}

Sample Input:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

http://a.b/c/d/e/f/test

and http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test

Sample Output:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">http://stackoverflow.com/questions/1925455/&hellip;</a> 

Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">http://pt.php.net/manual/en/&hellip;</a> 

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">http://pt.php.net/manual/en/&hellip;</a> 

<a rel="nofollow" href="http://a.b/c/d/e/f/test">http://a.b/c/d/e/f/test</a> 

and <a rel="nofollow" href="http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test">http://a.b/c/d/&hellip;</a>