Arrays of strings into numpy.amax

Solution 1:

Instead of storing your strings in NumPy's native fixed-width string dtype, you could try storing them as Python objects by passing dtype=object. NumPy then holds references to the original Python string objects, and you can treat them like you might expect:

import numpy as np

t = np.array([['one','two','three'],['four','five','six']], dtype=object)
np.min(t)
# gives 'five'
np.max(t)
# gives 'two'

Keep in mind that np.min and np.max order the strings lexicographically here, so "two" does indeed come after "five". To compare by string length instead, you could build a second array of the same shape that holds each string's length rather than the string itself. Calling numpy.argmin on that array returns the index of the shortest entry, which you can then use to look up the string in the original array.


Example code:

# Vectorize takes a Python function and converts it into a Numpy
# vector function that operates on arrays
np_len = np.vectorize(len)

np_len(t)
# gives array([[3, 3, 5], [4, 4, 3]])

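# With no axis argument, argmin returns an index into the flattened array;
# unravel_index converts it back into a (row, column) pair
idx = np.unravel_index(np_len(t).argmin(), t.shape)
# gives row 0, column 0

result = t[idx]
print(result)
# gives "one", the first of the shortest strings
# ("one", "two" and "six" all tie at length 3)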
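
The same idea can be wrapped up so it covers both the shortest and the longest string, and so ties are visible. This is only a sketch: the helper name extreme_by_length and its longest flag are made up for illustration and are not part of NumPy.

```python
import numpy as np

def extreme_by_length(arr, longest=False):
    """Return (index, string) for the shortest (or longest) string in arr.

    Hypothetical helper for illustration; ties resolve to the first
    match in row-major order, as argmin/argmax do.
    """
    # otypes=[int] fixes the output dtype up front, so np.vectorize
    # does not need to call len on the first element to guess it
    lengths = np.vectorize(len, otypes=[int])(arr)
    flat = lengths.argmax() if longest else lengths.argmin()
    idx = np.unravel_index(flat, arr.shape)
    return idx, arr[idx]

t = np.array([['one', 'two', 'three'], ['four', 'five', 'six']], dtype=object)

short_idx, shortest = extreme_by_length(t)
long_idx, longest = extreme_by_length(t, longest=True)
print(shortest, longest)

# To see every string that ties for the minimum length, use a boolean mask
lengths = np.vectorize(len, otypes=[int])(t)
all_shortest = t[lengths == lengths.min()]
print(list(all_shortest))
```

If you only need the value and not its position, Python's built-in min(t.flat, key=len) is a simpler alternative that skips the intermediate length array.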