Regular expression to remove one parameter from query string

I'm looking for a regular expression to remove a single parameter from a query string, and I want to do it in a single regular expression if possible.

Say I want to remove the foo parameter. Right now I use this:

/&?foo\=[^&]+/

That works as long as foo is not the first parameter in the query string. If it is, then my new query string starts with an ampersand. (For example, "foo=123&bar=456" gives a result of "&bar=456".) Right now, I'm just checking after the regex if the query string starts with ampersand, and chopping it off if it does.

Example edge cases:

Input                    |  Expected Output
-------------------------+--------------------
foo=123                  |  (empty string)
foo=123&bar=456          |  bar=456
bar=456&foo=123          |  bar=456
abc=789&foo=123&bar=456  |  abc=789&bar=456

Edit

OK as pointed out in comments there are there are way more edge cases than I originally considered. I got the following regex to work with all of them:

/&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/

This is modified from Mark Byers's answer, which is why I'm accepting that one, but Roger Pate's input helped a lot too.

Here is the full suite of test cases I'm using, and a Javascript snippet which tests them:

$(function() {
    var regex = /&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/;
    
    var escapeHtml = function (str) {
        var map = {
          '&': '&',
          '<': '&lt;',
          '>': '&gt;',
          '"': '&quot;',
          "'": '&#039;'
        };
        
        return str.replace(/[&<>"']/g, function(m) { return map[m]; });
    };

    
    //test cases
    var tests = [
        'foo'     , 'foo&bar=456'     , 'bar=456&foo'     , 'abc=789&foo&bar=456'
       ,'foo='    , 'foo=&bar=456'    , 'bar=456&foo='    , 'abc=789&foo=&bar=456'
       ,'foo=123' , 'foo=123&bar=456' , 'bar=456&foo=123' , 'abc=789&foo=123&bar=456'
       ,'xfoo'    , 'xfoo&bar=456'    , 'bar=456&xfoo'    , 'abc=789&xfoo&bar=456'
       ,'xfoo='   , 'xfoo=&bar=456'   , 'bar=456&xfoo='   , 'abc=789&xfoo=&bar=456'
       ,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
       ,'foox'    , 'foox&bar=456'    , 'bar=456&foox'    , 'abc=789&foox&bar=456'
       ,'foox='   , 'foox=&bar=456'   , 'bar=456&foox='   , 'abc=789&foox=&bar=456'
       ,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
    ];
    
    //expected results
    var expected = [
        ''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,''        , 'bar=456'         , 'bar=456'         , 'abc=789&bar=456'
       ,'xfoo'    , 'xfoo&bar=456'    , 'bar=456&xfoo'    , 'abc=789&xfoo&bar=456'
       ,'xfoo='   , 'xfoo=&bar=456'   , 'bar=456&xfoo='   , 'abc=789&xfoo=&bar=456'
       ,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
       ,'foox'    , 'foox&bar=456'    , 'bar=456&foox'    , 'abc=789&foox&bar=456'
       ,'foox='   , 'foox=&bar=456'   , 'bar=456&foox='   , 'abc=789&foox=&bar=456'
       ,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
    ];
    
    for(var i = 0; i < tests.length; i++) {
        var output = tests[i].replace(regex, '');
        var success = (output == expected[i]);
        
        $('#output').append(
            '<tr class="' + (success ? 'passed' : 'failed') + '">'
            + '<td>' + (success ? 'PASS' : 'FAIL') + '</td>'
            + '<td>' + escapeHtml(tests[i]) + '</td>'
            + '<td>' + escapeHtml(output) + '</td>'
            + '<td>' + escapeHtml(expected[i]) + '</td>'
            + '</tr>'
        );
    }
    
});
#output {
    border-collapse: collapse;
    
}
#output tr.passed { background-color: #af8; }
#output tr.failed { background-color: #fc8; }
#output td, #output th {
    border: 1px solid black;
    padding: 2px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table id="output">
    <tr>
        <th>Succ?</th>
        <th>Input</th>
        <th>Output</th>
        <th>Expected</th>
    </tr>
</table>

Solution 1:

If you want to do this in just one regular expression, you could do this:

/&foo(=[^&]*)?|^foo(=[^&]*)?&?/

This is because you need to match either an ampersand before the foo=..., or one after, or neither, but not both.

To be honest, I think it's better the way you did it: removing the trailing ampersand in a separate step.

Solution 2:

/(?<=&|\?)foo(=[^&]*)?(&|$)/

Uses lookbehind and the last group to "anchor" the match, and allows a missing value. Change the \? to ^ if you've already stripped off the question mark from the query string.

Regex is still not a substitute for a real parser of the query string, however.

Update: Test script: (run it at codepad.org)

import re

regex = r"(^|(?<=&))foo(=[^&]*)?(&|$)"

cases = {
  "foo=123": "",
  "foo=123&bar=456": "bar=456",
  "bar=456&foo=123": "bar=456",
  "abc=789&foo=123&bar=456": "abc=789&bar=456",

  "oopsfoo=123": "oopsfoo=123",
  "oopsfoo=123&bar=456": "oopsfoo=123&bar=456",
  "bar=456&oopsfoo=123": "bar=456&oopsfoo=123",
  "abc=789&oopsfoo=123&bar=456": "abc=789&oopsfoo=123&bar=456",

  "foo": "",
  "foo&bar=456": "bar=456",
  "bar=456&foo": "bar=456",
  "abc=789&foo&bar=456": "abc=789&bar=456",

  "foo=": "",
  "foo=&bar=456": "bar=456",
  "bar=456&foo=": "bar=456",
  "abc=789&foo=&bar=456": "abc=789&bar=456",
}

failures = 0
for input, expected in cases.items():
  got = re.sub(regex, "", input)
  if got != expected:
    print "failed: input=%r expected=%r got=%r" % (input, expected, got)
    failures += 1
if not failures:
  print "Success"

It shows where my approach failed, Mark has the right of it—which should show why you shouldn't do this with regex.. :P


The problem is associating the query parameter with exactly one ampersand, and—if you must use regex (if you haven't picked up on it :P, I'd use a separate parser, which might use regex inside it, but still actually understand the format)—one solution would be to make sure there's exactly one ampersand per parameter: replace the leading ? with a &.

This gives /&foo(=[^&]*)?(?=&|$)/, which is very straight forward and the best you're going to get. Remove the leading & in the final result (or change it back into a ?, etc.). Modifying the test case to do this uses the same cases as above, and changes the loop to:

failures = 0
for input, expected in cases.items():
  input = "&" + input
  got = re.sub(regex, "", input)
  if got[:1] == "&":
    got = got[1:]
  if got != expected:
    print "failed: input=%r expected=%r got=%r" % (input, expected, got)
    failures += 1
if not failures:
  print "Success"

Solution 3:

Having a query string that starts with & is harmless--why not leave it that way? In any case, I suggest that you search for the trailing ampersand and use \b to match the beginning of foo w/o taking in a previous character:

 /\bfoo\=[^&]+&?/