Select multiple columns by labels in pandas
Name- or Label-Based (using regular expression syntax)
df.filter(regex='[A-CEG-I]') # does NOT depend on the column order
Note that any regular expression is allowed here, so this approach can be very general. E.g. if you wanted all columns starting with a capital or lowercase "A" you could use: df.filter(regex='^[Aa]')
Location-Based (depends on column order)
df[ list(df.loc[:,'A':'C']) + ['E'] + list(df.loc[:,'G':'I']) ]
Note that unlike the label-based method, this only works if your columns are alphabetically sorted. This is not necessarily a problem, however. For example, if your columns go ['A','C','B']
, then you could replace 'A':'C'
above with 'A':'B'
.
The Long Way
And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it could be much more verbose as the number of columns increases:
df[['A','B','C','E','G','H','I']] # does NOT depend on the column order
Results for any of the above methods
A B C E G H I
0 -0.814688 -1.060864 -0.008088 2.697203 -0.763874 1.793213 -0.019520
1 0.549824 0.269340 0.405570 -0.406695 -0.536304 -1.231051 0.058018
2 0.879230 -0.666814 1.305835 0.167621 -1.100355 0.391133 0.317467
Just pick the columns you want directly....
df[['A','E','I','C']]
How do I select multiple columns by labels in pandas?
Multiple label-based range slicing is not easily supported with pandas, but position-based slicing is, so let's try that instead:
loc = df.columns.get_loc
df.iloc[:, np.r_[loc('A'):loc('C')+1, loc('E'), loc('G'):loc('I')+1]]
A B C E G H I
0 -1.666330 0.321260 -1.768185 -0.034774 0.023294 0.533451 -0.241990
1 0.911498 3.408758 0.419618 -0.462590 0.739092 1.103940 0.116119
2 1.243001 -0.867370 1.058194 0.314196 0.887469 0.471137 -1.361059
3 -0.525165 0.676371 0.325831 -1.152202 0.606079 1.002880 2.032663
4 0.706609 -0.424726 0.308808 1.994626 0.626522 -0.033057 1.725315
5 0.879802 -1.961398 0.131694 -0.931951 -0.242822 -1.056038 0.550346
6 0.199072 0.969283 0.347008 -2.611489 0.282920 -0.334618 0.243583
7 1.234059 1.000687 0.863572 0.412544 0.569687 -0.684413 -0.357968
8 -0.299185 0.566009 -0.859453 -0.564557 -0.562524 0.233489 -0.039145
9 0.937637 -2.171174 -1.940916 -1.553634 0.619965 -0.664284 -0.151388
Note that the +1
is added because when using iloc
the rightmost index is exclusive.
Comments on Other Solutions
-
filter
is a nice and simple method for OP's headers, but this might not generalise well to arbitrary column names. -
The "location-based" solution with
loc
is a little closer to the ideal, but you cannot avoid creating intermediate DataFrames (that are eventually thrown out and garbage collected) to compute the final result range -- something that we would ideally like to avoid. -
Lastly, "pick your columns directly" is good advice as long as you have a manageably small number of columns to pick. It will, however not be applicable in some cases where ranges span dozens (or possibly hundreds) of columns.