Regular expression pattern to match URL with or without http://www

For matching all kinds of URLs, the following code should work:

<?php
    $regex = "((https?|ftp)://)?"; // SCHEME
    $regex .= "([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?"; // User and Pass
    $regex .= "([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))"; // Host or IP
    $regex .= "(:[0-9]{2,5})?"; // Port
    $regex .= "(/([a-z0-9+$_%-]\.?)+)*/?"; // Path
    $regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+/$_.-]*)?"; // GET Query
    $regex .= "(#[a-z_.-][a-z0-9+$%_.-]*)?"; // Anchor
?>

Then, the correct way to check against the regex is as follows:

<?php
   if(preg_match("~^$regex$~i", 'www.example.com/etcetc', $m))
      var_dump($m);

   if(preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m))
      var_dump($m);
?>

Courtesy: Comments made by splattermania in the PHP manual: http://php.net/manual/en/function.preg-match.php

RegEx Demo in regex101


This worked for me in all cases I had tested:

$url_pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/';

Tests:

http://test.test-75.1474.stackoverflow.com/
https://www.stackoverflow.com
https://www.stackoverflow.com/
http://wwww.stackoverflow.com/
http://wwww.stackoverflow.com


http://test.test-75.1474.stackoverflow.com/
http://www.stackoverflow.com
http://www.stackoverflow.com/
stackoverflow.com/
stackoverflow.com

http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:[email protected]/etcetc

example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds

http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www/

Every valid Internet URL has at least one dot, so the above pattern will simply try to find any at least two strings chained by a dot and has valid characters that URL may have.


Try this:

/^http:\/\/|(www\.)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/

It works exactly like the people want.

It takes with or with out http://, https://, and www.


You can use a question mark after a regular expression to make it conditional so you would want to use:

http:\/\/(www\.)?

That will match anything that has either http://www. or http:// (with no www.)

You could just use a replace method to remove the above, thus getting you the domain. It depends on what you need the domain for.