Split a string by spaces -- preserving quoted substrings -- in Python
You want split, from the built-in shlex module.
>>> import shlex
>>> shlex.split('this is "a test"')
['this', 'is', 'a test']
This should do exactly what you want.
Have a look at the shlex module, particularly shlex.split.
>>> import shlex
>>> shlex.split('This is "a test"')
['This', 'is', 'a test']
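A few shlex.split behaviours are worth knowing before relying on it. This is just a small sketch; the sample strings and variable names are made up for illustration:

```python
import shlex

# Default (POSIX) mode: single and double quotes both group words,
# and the quote characters are removed from the tokens.
default_tokens = shlex.split("this is 'a test' and \"another one\"")

# posix=False keeps the quote characters in the resulting tokens.
raw_tokens = shlex.split('this is "a test"', posix=False)

# An unterminated quote raises ValueError instead of returning a partial result.
try:
    shlex.split('this is "a test')
    unbalanced_ok = True
except ValueError:
    unbalanced_ok = False

print(default_tokens)  # ['this', 'is', 'a test', 'and', 'another one']
print(raw_tokens)      # ['this', 'is', '"a test"']
print(unbalanced_ok)   # False
```

If your input can contain stray quotes, you probably want to catch that ValueError.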
I see regex approaches here that look complex and/or wrong. This surprises me, because regex syntax can easily describe "whitespace or thing-surrounded-by-quotes", and most regex engines (including Python's) can split on a regex. So if you're going to use regexes, why not just say exactly what you mean?
import re

test = 'this is "a test"'  # or "this is 'a test'"
# First version (greedy, so it over-matches when a line has several quoted parts):
# pieces = [p for p in re.split("( |[\\\"'].*[\\\"'])", test) if p.strip()]
# From comments, use this:
pieces = [p for p in re.split("( |\\\".*?\\\"|'.*?')", test) if p.strip()]
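Explanation:
[\\\"'] = double-quote or single-quote (used in the first, commented-out version)
.*? = anything, as few characters as possible (the .* in the first version is greedy)
( |X) = space or X; because the group is capturing, re.split also returns the matched separators
if p.strip() = drop the pieces that are only spaces or empty strings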
shlex probably provides more features, though.
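One thing to be aware of: unlike shlex.split, this regex split leaves the quote characters on the quoted pieces. Here is a rough sketch of what comes out and one way to strip them afterwards; the unquoted name is just for illustration:

```python
import re

test = 'this is "a test"'
pieces = [p for p in re.split("( |\\\".*?\\\"|'.*?')", test) if p.strip()]

# Strip one matching pair of surrounding quotes, if present.
unquoted = [
    p[1:-1] if len(p) > 1 and p[0] in "\"'" and p[-1] == p[0] else p
    for p in pieces
]

print(pieces)    # ['this', 'is', '"a test"']
print(unquoted)  # ['this', 'is', 'a test']
```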
Depending on your use case, you may also want to check out the csv module:
import csv

lines = ['this is "a string"', 'and more "stuff"']
for row in csv.reader(lines, delimiter=" "):
    print(row)
Output:
['this', 'is', 'a string']
['and', 'more', 'stuff']
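Note that csv treats every single space as a field delimiter, so runs of spaces behave differently than with shlex. A small comparison, assuming an input line with a doubled space (the sample line is made up):

```python
import csv
import shlex

line = 'this  is "a string"'  # note the two spaces after "this"

# csv: each space is a delimiter, so the doubled space yields an empty field.
csv_row = next(csv.reader([line], delimiter=" "))

# shlex: runs of whitespace are treated as a single separator.
shlex_row = shlex.split(line)

print(csv_row)    # ['this', '', 'is', 'a string']
print(shlex_row)  # ['this', 'is', 'a string']
```

So if your data may contain repeated spaces, you would need to filter out the empty strings yourself when using csv.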
I used shlex.split to process 70,000,000 lines of squid log, and it was very slow, so I switched to re.
If you have performance problems with shlex, try this:
import re

def line_split(line):
    return re.findall(r'[^"\s]\S*|".+?"', line)