Split a string by spaces -- preserving quoted substrings -- in Python

You want split, from the built-in shlex module.

>>> import shlex
>>> shlex.split('this is "a test"')
['this', 'is', 'a test']

This should do exactly what you want.

Have a look at the shlex module, particularly shlex.split.

>>> import shlex
>>> shlex.split('This is "a test"')
['This', 'is', 'a test']

I see regex approaches here that look complex and/or wrong. This surprises me, because regex syntax can easily describe "whitespace or thing-surrounded-by-quotes", and most regex engines (including Python's) can split on a regex. So if you're going to use regexes, why not just say exactly what you mean?:

test = 'this is "a test"'  # or "this is 'a test'"
# pieces = [p for p in re.split("( |[\\\"'].*[\\\"'])", test) if p.strip()]
# From comments, use this:
pieces = [p for p in re.split("( |\\\".*?\\\"|'.*?')", test) if p.strip()]

Explanation:

[\\\"'] = double-quote or single-quote
.* = anything
( |X) = space or X
.strip() = remove space and empty-string separators

shlex probably provides more features, though.

Depending on your use case, you may also want to check out the csv module:

import csv
lines = ['this is "a string"', 'and more "stuff"']
for row in csv.reader(lines, delimiter=" "):
    print(row)

Output:

['this', 'is', 'a string']
['and', 'more', 'stuff']

I use shlex.split to process 70,000,000 lines of squid log, it's so slow. So I switched to re.

Please try this, if you have performance problem with shlex.

import re

def line_split(line):
    return re.findall(r'[^"\s]\S*|".+?"', line)

Split a string by spaces -- preserving quoted substrings -- in Python

Related

Recent Posts