How do I reliably split a string in Python when it may not contain the pattern, or all n elements?

In Perl I can do:

my ($x, $y) = split /:/, $str;

And it will work whether or not the string contains the pattern.

In Python, however, this won't work:

a, b = "foo".split(":")  # ValueError: not enough values to unpack

What's the canonical way to prevent errors in such cases?


Solution 1:

If you're splitting into just two parts (as in your example), you can use str.partition(), which always returns exactly three items, so unpacking into three names is guaranteed to succeed:

>>> a, sep, b = 'foo'.partition(':')
>>> a, sep, b
('foo', '', '')

str.partition() always returns a 3-tuple, whether the separator is found or not.
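A short sketch of how str.partition() behaves on a few sample inputs (the strings are just illustrations). It splits at the first occurrence only, and the returned separator doubles as a found/not-found flag. str.rpartition() is the mirror image: it splits at the last occurrence, and when the separator is missing the whole string lands in the last slot.

```python
for s in ["foo", "foo:bar", "foo:bar:baz"]:
    head, sep, tail = s.partition(":")
    # sep is ":" when the separator was found and "" otherwise,
    # so it can be used as a "was it there?" flag.
    print(repr(head), repr(sep), repr(tail), "found" if sep else "missing")
# 'foo' '' '' missing
# 'foo' ':' 'bar' found
# 'foo' ':' 'bar:baz' found

# rpartition() splits at the LAST separator instead.
print("foo:bar:baz".rpartition(":"))  # ('foo:bar', ':', 'baz')
print("foo".rpartition(":"))          # ('', '', 'foo')
```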

Another alternative for Python 3.x is to use extended iterable unpacking:

>>> a, *b = 'foo'.split(':')
>>> a, b
('foo', [])

This assigns the first split item to a and the list of remaining items (if any) to b.
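If you only ever want two fields, this combines well with the maxsplit argument of str.split(). A minimal sketch, assuming only the first colon matters and anything after it should stay together:

```python
for s in ["foo", "foo:bar", "foo:bar:baz"]:
    # maxsplit=1 caps the list at two items, so b is either []
    # (no separator) or a one-item list with everything after the first ":".
    a, *b = s.split(":", 1)
    print(a, b)
# foo []
# foo ['bar']
# foo ['bar:baz']
```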

Solution 2:

Since you are on Python 3, it is easy. PEP 3132 (Extended Iterable Unpacking) introduced a welcome simplification of the syntax when assigning to tuples. Before it, when assigning to variables in a tuple, the number of items on the left of the assignment had to be exactly equal to the number on the right.

In Python 3 we can designate any one variable on the left as a list by prefixing it with an asterisk (*). That variable will grab as many values as it can while still populating the variables to its right (so it need not be the rightmost item). This avoids many nasty slices when we don't know the length of a tuple.

a, *b = "foo".split(":")  
print("a:", a, "b:", b)

Gives:

a: foo b: []
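To make the point about slices concrete, here is a small sketch comparing manual indexing and slicing with the starred form; the variable names are only illustrative.

```python
parts = "foo:bar:baz".split(":")

# Without extended unpacking: index and slice by hand.
# (str.split with an explicit separator always returns at least one item,
# so parts[0] is safe here.)
a, b = parts[0], parts[1:]

# With extended unpacking (PEP 3132): the same result, no slicing.
a2, *b2 = parts

print(a, b)    # foo ['bar', 'baz']
print(a2, b2)  # foo ['bar', 'baz']
```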

EDIT following comments and discussion:

This is considerably different from the Perl version, but it is the Python 3 way. re.split() would be closer to Perl, since, like Perl's split, it takes a regular expression; however, invoking the regex engine to split around a single character is unnecessary overhead.
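For comparison, a sketch using re.split(). The mixed ":"/";" separator is an assumed example, chosen so that the regular expression actually earns its cost. Note that the result is still a plain list, so the unpacking needs the same care as with str.split().

```python
import re

s = "hello:world;sailor"
# Split on either ":" or ";" (a character class), much like Perl's split /[:;]/.
a, *b = re.split(r"[:;]", s)
print("a:", a, "b:", b)  # a: hello b: ['world', 'sailor']
```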

With multiple elements in Python:

s = 'hello:world:sailor'
a, *b = s.split(":")
print("a:", a, "b:", b)

gives:

a: hello b: ['world', 'sailor']

However in Perl:

my $s = 'hello:world:sailor';
my ($a, $b) = split /:/, $s;
print "a: $a b: $b\n";

gives:

a: hello b: world

It can be seen that additional elements are ignored, or lost, in Perl. That is fairly easy to replicate in Python if required:

s = 'hello:world:sailor'
a, *b = s.split(":")
b = b[0] if b else None  # keep only the second field; None stands in for Perl's undef
print("a:", a, "b:", b)
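If you need Perl-like behaviour for any number of targets (extra fields dropped, missing ones undefined), one option is a small helper. split_n below is a hypothetical name, not a standard-library function, and None is assumed as the stand-in for Perl's undef.

```python
from itertools import chain, islice, repeat


def split_n(s, sep, n, fill=None):
    """Hypothetical helper: return exactly n fields, like Perl list assignment.

    Extra fields are dropped; missing ones are filled with `fill`.
    """
    parts = s.split(sep)
    # Append an endless run of `fill` values, then take exactly n items.
    return tuple(islice(chain(parts, repeat(fill)), n))


a, b = split_n("hello:world:sailor", ":", 2)
print("a:", a, "b:", b)  # a: hello b: world
a, b = split_n("foo", ":", 2)
print("a:", a, "b:", b)  # a: foo b: None
```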

So the Perl equivalent of a, *b = s.split(":") would be:

my ($a, @b) = split /:/, $s;

NB: we shouldn't use $a and $b in general Perl code, since they have a special meaning when used with sort. I have used them here for consistency with the Python example.

Python does have an extra trick up its sleeve: the starred variable can be at any position in the tuple on the left:

s = "one:two:three:four"
a, *b, c = s.split(':')
print("a:", a, "b:", b, "c:", c)

Gives:

a: one b: ['two', 'three'] c: four
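A sketch of the same pattern wrapped in a function (first_and_last is a hypothetical name). The starred middle may be empty, but the two unstarred names still need a field each, so a string with no separator still raises ValueError.

```python
def first_and_last(s, sep=":"):
    # `middle` can be an empty list, but `first` and `last` each need a field.
    first, *middle, last = s.split(sep)
    return first, middle, last


print(first_and_last("one:two:three:four"))  # ('one', ['two', 'three'], 'four')
print(first_and_last("one:two"))             # ('one', [], 'two')

try:
    first_and_last("one")
except ValueError as exc:
    print("ValueError:", exc)
```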

In the Perl equivalent, however, the array (@b) is greedy, so it takes all the remaining fields and the scalar $c is left undef:

use strict;
use warnings;

my $s = 'one:two:three:four';
my ($a, @b, $c) = split /:/, $s;
print "a: $a b: @b c: $c\n";

Gives:

Use of uninitialized value $c in concatenation (.) or string at gash.pl line 8.
a: one b: two three four c: 
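Python avoids this ambiguity by allowing at most one starred target per assignment; a second one is rejected when the code is compiled. A small sketch that confirms this through the built-in compile(), so the script itself still runs:

```python
source = 'a, *b, *c = s.split(":")'
try:
    # compile() only parses and compiles the source; nothing is executed,
    # so `s` does not need to exist.
    compile(source, "<example>", "exec")
    rejected = False
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc.msg)
```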

Solution 3:

You are always free to catch the exception.

For example:

some_string = "foo"

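try:
    a, b = some_string.split(":")
except ValueError:
    # Raised when there is no ":" (too few values), but also when there is
    # more than one (too many values), e.g. "foo:bar:baz".
    a = some_string
    b = ""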

If assigning the whole original string to a and an empty string to b is the desired behaviour, I would probably use str.partition() as eugene y suggests. However, this solution gives you more control over exactly what happens when there is no separator in the string, which might be useful in some cases.
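As a sketch of that extra control, here is a hypothetical helper (parse_pair is not a library function). It splits with maxsplit=1, so the only failure case left is a missing separator, and it lets the caller choose the fallback value.

```python
def parse_pair(text, sep=":", default=None):
    """Hypothetical helper: split `text` into exactly two fields.

    maxsplit=1 keeps any further separators inside the second field, so
    unpacking can only fail when `sep` is missing entirely.
    """
    try:
        key, value = text.split(sep, 1)
    except ValueError:
        # No separator: keep the text as the key and use the caller's default.
        # A stricter parser could re-raise with a clearer message instead.
        key, value = text, default
    return key, value


print(parse_pair("foo"))              # ('foo', None)
print(parse_pair("foo:bar:baz"))      # ('foo', 'bar:baz')
print(parse_pair("foo", default=""))  # ('foo', '')
```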