Regular expression to remove one parameter from query string
I'm looking for a regular expression to remove a single parameter from a query string, and I want to do it in a single regular expression if possible.
Say I want to remove the foo
parameter. Right now I use this:
/&?foo\=[^&]+/
That works as long as foo
is not the first parameter in the query string. If it is, then my new query string starts with an ampersand. (For example, "foo=123&bar=456
" gives a result of "&bar=456
".) Right now, I'm just checking after the regex if the query string starts with ampersand, and chopping it off if it does.
Example edge cases:
Input | Expected Output
-------------------------+--------------------
foo=123 | (empty string)
foo=123&bar=456 | bar=456
bar=456&foo=123 | bar=456
abc=789&foo=123&bar=456 | abc=789&bar=456
Edit
OK as pointed out in comments there are there are way more edge cases than I originally considered. I got the following regex to work with all of them:
/&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/
This is modified from Mark Byers's answer, which is why I'm accepting that one, but Roger Pate's input helped a lot too.
Here is the full suite of test cases I'm using, and a Javascript snippet which tests them:
$(function() {
var regex = /&foo(\=[^&]*)?(?=&|$)|^foo(\=[^&]*)?(&|$)/;
var escapeHtml = function (str) {
var map = {
'&': '&',
'<': '<',
'>': '>',
'"': '"',
"'": '''
};
return str.replace(/[&<>"']/g, function(m) { return map[m]; });
};
//test cases
var tests = [
'foo' , 'foo&bar=456' , 'bar=456&foo' , 'abc=789&foo&bar=456'
,'foo=' , 'foo=&bar=456' , 'bar=456&foo=' , 'abc=789&foo=&bar=456'
,'foo=123' , 'foo=123&bar=456' , 'bar=456&foo=123' , 'abc=789&foo=123&bar=456'
,'xfoo' , 'xfoo&bar=456' , 'bar=456&xfoo' , 'abc=789&xfoo&bar=456'
,'xfoo=' , 'xfoo=&bar=456' , 'bar=456&xfoo=' , 'abc=789&xfoo=&bar=456'
,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
,'foox' , 'foox&bar=456' , 'bar=456&foox' , 'abc=789&foox&bar=456'
,'foox=' , 'foox=&bar=456' , 'bar=456&foox=' , 'abc=789&foox=&bar=456'
,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
];
//expected results
var expected = [
'' , 'bar=456' , 'bar=456' , 'abc=789&bar=456'
,'' , 'bar=456' , 'bar=456' , 'abc=789&bar=456'
,'' , 'bar=456' , 'bar=456' , 'abc=789&bar=456'
,'xfoo' , 'xfoo&bar=456' , 'bar=456&xfoo' , 'abc=789&xfoo&bar=456'
,'xfoo=' , 'xfoo=&bar=456' , 'bar=456&xfoo=' , 'abc=789&xfoo=&bar=456'
,'xfoo=123', 'xfoo=123&bar=456', 'bar=456&xfoo=123', 'abc=789&xfoo=123&bar=456'
,'foox' , 'foox&bar=456' , 'bar=456&foox' , 'abc=789&foox&bar=456'
,'foox=' , 'foox=&bar=456' , 'bar=456&foox=' , 'abc=789&foox=&bar=456'
,'foox=123', 'foox=123&bar=456', 'bar=456&foox=123', 'abc=789&foox=123&bar=456'
];
for(var i = 0; i < tests.length; i++) {
var output = tests[i].replace(regex, '');
var success = (output == expected[i]);
$('#output').append(
'<tr class="' + (success ? 'passed' : 'failed') + '">'
+ '<td>' + (success ? 'PASS' : 'FAIL') + '</td>'
+ '<td>' + escapeHtml(tests[i]) + '</td>'
+ '<td>' + escapeHtml(output) + '</td>'
+ '<td>' + escapeHtml(expected[i]) + '</td>'
+ '</tr>'
);
}
});
#output {
border-collapse: collapse;
}
#output tr.passed { background-color: #af8; }
#output tr.failed { background-color: #fc8; }
#output td, #output th {
border: 1px solid black;
padding: 2px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table id="output">
<tr>
<th>Succ?</th>
<th>Input</th>
<th>Output</th>
<th>Expected</th>
</tr>
</table>
Solution 1:
If you want to do this in just one regular expression, you could do this:
/&foo(=[^&]*)?|^foo(=[^&]*)?&?/
This is because you need to match either an ampersand before the foo=..., or one after, or neither, but not both.
To be honest, I think it's better the way you did it: removing the trailing ampersand in a separate step.
Solution 2:
/(?<=&|\?)foo(=[^&]*)?(&|$)/
Uses lookbehind and the last group to "anchor" the match, and allows a missing value. Change the \?
to ^
if you've already stripped off the question mark from the query string.
Regex is still not a substitute for a real parser of the query string, however.
Update: Test script: (run it at codepad.org)
import re
regex = r"(^|(?<=&))foo(=[^&]*)?(&|$)"
cases = {
"foo=123": "",
"foo=123&bar=456": "bar=456",
"bar=456&foo=123": "bar=456",
"abc=789&foo=123&bar=456": "abc=789&bar=456",
"oopsfoo=123": "oopsfoo=123",
"oopsfoo=123&bar=456": "oopsfoo=123&bar=456",
"bar=456&oopsfoo=123": "bar=456&oopsfoo=123",
"abc=789&oopsfoo=123&bar=456": "abc=789&oopsfoo=123&bar=456",
"foo": "",
"foo&bar=456": "bar=456",
"bar=456&foo": "bar=456",
"abc=789&foo&bar=456": "abc=789&bar=456",
"foo=": "",
"foo=&bar=456": "bar=456",
"bar=456&foo=": "bar=456",
"abc=789&foo=&bar=456": "abc=789&bar=456",
}
failures = 0
for input, expected in cases.items():
got = re.sub(regex, "", input)
if got != expected:
print "failed: input=%r expected=%r got=%r" % (input, expected, got)
failures += 1
if not failures:
print "Success"
It shows where my approach failed, Mark has the right of it—which should show why you shouldn't do this with regex.. :P
The problem is associating the query parameter with exactly one ampersand, and—if you must use regex (if you haven't picked up on it :P, I'd use a separate parser, which might use regex inside it, but still actually understand the format)—one solution would be to make sure there's exactly one ampersand per parameter: replace the leading ?
with a &
.
This gives /&foo(=[^&]*)?(?=&|$)/
, which is very straight forward and the best you're going to get. Remove the leading &
in the final result (or change it back into a ?
, etc.). Modifying the test case to do this uses the same cases as above, and changes the loop to:
failures = 0
for input, expected in cases.items():
input = "&" + input
got = re.sub(regex, "", input)
if got[:1] == "&":
got = got[1:]
if got != expected:
print "failed: input=%r expected=%r got=%r" % (input, expected, got)
failures += 1
if not failures:
print "Success"
Solution 3:
Having a query string that starts with &
is harmless--why not leave it that way? In any case, I suggest that you search for the trailing ampersand and use \b
to match the beginning of foo w/o taking in a previous character:
/\bfoo\=[^&]+&?/