Split the sentences by ',' and remove surrounding spaces

I have this code:

var r = /(?:^\s*([^\s]*)\s*)(?:,\s*([^\s]*)\s*){0,}$/
var s = "   a   ,  b  , c "
var m = s.match(r)
// m => ["   a   ,  b  , c ", "a", "c"]

Looks like the whole string has been matched, but where has "b" gone? I would rather expect to get:

["   a   ,  b  , c ", "a", "b", "c"]

so that I can do m.shift() and get a result like s.split(','), but with the surrounding whitespace removed.

Do I have a mistake in the regexp or do I misunderstand String.prototype.match?


Solution 1:

Here's a pretty simple & straightforward way to do this without needing a complex regular expression.

var str = "   a   ,  b  , c "
var arr = str.split(",").map(function(item) {
  return item.trim();
});
//arr = ["a", "b", "c"]

The native .map (like .trim) is supported in IE9 and up: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map
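A minimal sketch wrapping the same split/map/trim approach in a reusable function. The name `splitAndTrim` is just an illustrative choice. One thing worth knowing: `split` keeps empty fields, so a blank entry between two commas comes back as an empty string rather than being dropped.

```javascript
// Hypothetical helper name: split on commas, then trim each piece.
function splitAndTrim(str) {
  return str.split(",").map(function (item) {
    return item.trim();
  });
}

console.log(splitAndTrim("   a   ,  b  , c ")); // → ["a", "b", "c"]

// Empty fields are kept, because split() does not drop them.
console.log(splitAndTrim("a, ,c")); // → ["a", "", "c"]
```

If empty entries should be discarded, a `.filter(Boolean)` after the `.map` would remove them.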


Or in ES6+ it gets even shorter:

var arr = str.split(",").map(item => item.trim());

And for completeness, here it is in TypeScript with type annotations:

var arr: string[] = str.split(",").map((item: string) => item.trim());

Solution 2:

ES6 shorthand:

str.split(',').map(item=>item.trim())

Solution 3:

You can try this without complex regular expressions.

var arr = "   a   ,  b  , c ".trim().split(/\s*,\s*/);
console.log(arr);
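The `.trim()` before the split matters: the pattern `\s*,\s*` only eats whitespace next to a comma, so without trimming first, the outer spaces of the first and last items survive. A small comparison on the question's string:

```javascript
const s = "   a   ,  b  , c ";

// Without trim(): whitespace is only removed where it touches a comma,
// so the first and last items keep their outer spaces.
const untrimmed = s.split(/\s*,\s*/);
console.log(untrimmed); // → ["   a", "b", "c "]

// Trimming first strips those outer spaces before splitting.
const trimmed = s.trim().split(/\s*,\s*/);
console.log(trimmed); // → ["a", "b", "c"]
```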

Solution 4:

Short answer: use m = s.match(/[^ ,]+/g);


Your RE doesn't work as expected because a repeated capturing group only keeps its most recent match (here, c). If you omit {0,}$, the returned match will be "   a   ,  b  ", "a", "b". In short, without the global flag /g, match returns the whole match plus exactly one entry per capturing group, no matter how often a group repeats. With /g, the returned array instead holds all matched substrings (capture groups are then not reported).
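To see both behaviours side by side, here is a small sketch running the question's regex (without /g) next to a global match. The variable names are only for illustration.

```javascript
const re = /(?:^\s*([^\s]*)\s*)(?:,\s*([^\s]*)\s*){0,}$/;
const input = "   a   ,  b  , c ";

// Without /g: the whole match plus one slot per capturing group (two here).
// Group 2 sits inside the repeated part, so it only keeps its last value.
const single = input.match(re);
console.log(single.length); // → 3
console.log(single[1]);     // → "a"
console.log(single[2]);     // → "c"  ("b" was overwritten)

// With /g: match() returns every full match and ignores capture groups.
const all = input.match(/[^ ,]+/g);
console.log(all); // → ["a", "b", "c"]
```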

To achieve your effect, use:

m = s.replace(/\s*(,|^|$)\s*/g, "$1");

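This replaces every comma (,), the beginning of the string (^) and the end of the string ($), together with any surrounding whitespace, by the captured character (a comma, or nothing for ^ and $). The result is the string "a,b,c", not an array.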
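A quick check of that replace on the question's input, followed by a plain split once the whitespace is gone:

```javascript
const s = "   a   ,  b  , c ";

// Each comma / start / end plus its surrounding whitespace is replaced by
// the captured group $1: "," for commas, "" for ^ and $.
const normalized = s.replace(/\s*(,|^|$)\s*/g, "$1");
console.log(normalized); // → "a,b,c"

// The result is a string; a plain split turns it into an array.
const parts = normalized.split(",");
console.log(parts); // → ["a", "b", "c"]
```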

If you want to get an array, use:

m = s.replace(/^\s+|\s+$/g,"").split(/\s*,\s*/);

This RE trims the string (removes all whitespace at the beginning and end), then splits it by <any whitespace>,<any whitespace>. Note that whitespace characters also include newlines and tabs. If you want to stick to spaces only, use a literal space ( ) instead of \s.
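The difference between \s and a literal space only shows up with tabs or newlines. A small sketch using a made-up input with tabs around the comma:

```javascript
// Example input (assumed for illustration): tabs on both sides of the comma.
const withTabs = "a\t,\tb";

// \s also matches tabs and newlines, so the tabs are removed.
const withWhitespaceClass = withTabs.replace(/^\s+|\s+$/g, "").split(/\s*,\s*/);
console.log(withWhitespaceClass); // → ["a", "b"]

// With a literal space instead of \s, only spaces are removed; the tabs stay.
const withLiteralSpace = withTabs.replace(/^ +| +$/g, "").split(/ *, */);
console.log(withLiteralSpace); // → ["a\t", "\tb"]
```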

Solution 5:

You can do this for your purpose:

s.replace(/^\s*|\s*$/g,'').split(/\s*,\s*/)

EDIT: Removed the second replace, as suggested in the comments.

The replace trims the string, and then split splits it around '\s*,\s*'. This gives the output ["a", "b", "c"] for the input "   a   ,  b  , c ".

As for why your regex is not capturing 'b': you are repeating a capturing group, so only the last occurrence gets captured. More on that here: http://www.regular-expressions.info/captureall.html
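The usual workaround for that limitation is to repeat the match instead of the group: run a global regex in a loop so each item is its own match. A minimal sketch, where the helper name `collectItems` and the item pattern are illustrative assumptions. The pattern treats an item as a comma-free run that starts and ends with a non-whitespace character, so spaces inside an item are kept.

```javascript
// Hypothetical helper: each item is a separate match of a global regex,
// so nothing gets overwritten the way a repeated capturing group does.
function collectItems(str) {
  // Starts with a non-comma, non-space char; optionally continues with
  // non-comma chars and ends on a non-comma, non-space char.
  const itemRe = /[^,\s](?:[^,]*[^,\s])?/g;
  const items = [];
  let m;
  // Matches are never empty, so lastIndex always advances and the loop ends.
  while ((m = itemRe.exec(str)) !== null) {
    items.push(m[0]);
  }
  return items;
}

console.log(collectItems("   a   ,  b  , c ")); // → ["a", "b", "c"]
console.log(collectItems("foo bar , baz"));     // → ["foo bar", "baz"]
```

Unlike split(','), this skips empty items, e.g. the gap in "a,,b".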