Why is the softmax function called that?

I understand that the function "squashes" a real-valued vector into values between 0 and 1.

However, I don't see what this has to do with the "max" function, or why that makes it a "softer" version of the max function.


The largest element in the input vector remains the largest element after the softmax function is applied to the vector, hence the "max" part. The "soft" signifies that the function keeps information about the other, non-maximal elements in a reversible way (as opposed to a "hardmax", which is just the standard maximum function).
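For concreteness, with the usual definition $\operatorname{softmax}(x)_i = e^{x_i} / \sum_j e^{x_j}$ applied to a small example vector:

$$\operatorname{softmax}\big((1, 2, 3)\big) \approx (0.09,\ 0.24,\ 0.67),$$

whereas the plain maximum of the same vector is just $3$. The third entry is still the largest after softmax, but the relative sizes of the other entries are kept rather than thrown away.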

The function produces a probability distribution from any real-valued vector, which is why it is used in machine learning when inputs need to be classified: the raw outputs of a neural network are normalised by this function so that they can be interpreted as probabilities over the classes.
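A minimal NumPy sketch of that normalisation step (the logits here are made up for illustration; subtracting the maximum before exponentiating is just the usual numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(logits):
    # Shifting by the max leaves the output unchanged (softmax is invariant
    # to adding a constant to every input) but avoids overflow in exp.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores ("logits") from a network's last layer, 4 classes.
logits = np.array([2.0, 1.0, 0.1, -1.5])
probs = softmax(logits)

print(probs)                                  # roughly [0.65, 0.24, 0.10, 0.02]
print(probs.sum())                            # 1.0 -- a valid probability distribution
print(np.argmax(probs) == np.argmax(logits))  # True: the largest element stays largest
```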


I always thought it was called softmax because it is differentiable ("soft") at every point and with respect to every element of the input vector. This explanation would be analogous to what makes the softplus function, $f(x) = \ln(1 + e^x)$, the "soft" version of $f(x) = \max(0, x)$.
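One way to make that analogy concrete is to compare derivatives: softplus is smooth everywhere, with the logistic sigmoid as its derivative, while $\max(0, x)$ has a kink at the origin:

$$\frac{d}{dx}\ln(1 + e^x) = \frac{e^x}{1 + e^x} = \sigma(x), \qquad \frac{d}{dx}\max(0, x) = \begin{cases} 0, & x < 0,\\ 1, & x > 0,\\ \text{undefined}, & x = 0. \end{cases}$$

In the same way, softmax is differentiable everywhere, whereas the hard maximum is not differentiable wherever two of its inputs tie for the largest value.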