What is the meaning of the word logits in TensorFlow? [duplicate]

Solution 1:

Logits is an overloaded term which can mean many different things:


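In math, the logit is a function that maps probabilities in (0, 1) to the real numbers R ((-inf, inf)), with the endpoints 0 and 1 going to -inf and +inf: logit(p) = log(p / (1 - p)).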

A probability of 0.5 corresponds to a logit of 0. Negative logits correspond to probabilities less than 0.5, positive logits to probabilities greater than 0.5.
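
As a minimal sketch of the mathematical definition (the helper names `logit` and `sigmoid` are my own, not TensorFlow functions), the mapping and its inverse can be written with the standard library alone:

```python
import math

def logit(p):
    """Log-odds of a probability p; only defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(logit(0.5))        # 0.0
print(logit(0.2) < 0)    # True: probabilities below 0.5 give negative logits
print(logit(0.8) > 0)    # True: probabilities above 0.5 give positive logits
print(sigmoid(logit(0.8)))  # approximately 0.8, up to floating-point rounding
```

Passing exactly 0 or 1 to this `logit` raises an error, since the log-odds are infinite at the endpoints.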

In ML, it can be

the vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.

Logits also sometimes refer to the element-wise inverse of the sigmoid function.

Solution 2:

Just adding this clarification so that anyone who scrolls down this far can at least get it right, since so many wrong answers have been upvoted.

Diansheng's answer and JakeJ's answer get it right.
A new answer posted by Shital Shah is an even better and more complete answer.


Yes, logit is a mathematical function in statistics, but the logit used in the context of neural networks is different. The statistical logit doesn't even make sense here.


I couldn't find a formal definition anywhere, but logit basically means:

The raw predictions which come out of the last layer of the neural network.
1. This is the very tensor on which you apply the argmax function to get the predicted class.
2. This is the very tensor which you feed into the softmax function to get the probabilities for the predicted classes.


Also, from a tutorial on the official TensorFlow website:

Logits Layer

The final layer in our neural network is the logits layer, which will return the raw values for our predictions. We create a dense layer with 10 neurons (one for each target class 0–9), with linear activation (the default):

logits = tf.layers.dense(inputs=dropout, units=10)

If you are still confused, the situation is like this:

raw_predictions = neural_net(input_layer)
predicted_class_index_by_raw = argmax(raw_predictions)
probabilities = softmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)

where predicted_class_index_by_raw and predicted_class_index_by_prob will be equal, because softmax is monotonic and so preserves the ordering of its inputs.

Another name for raw_predictions in the above code is logits.
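
The pseudocode above can be made concrete with a small sketch. The `softmax` and `argmax` helpers and the example values are my own stand-ins (a real model would produce `raw_predictions` from its last layer), so treat this as an illustration rather than TensorFlow's implementation:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical output of the last layer for a 3-class problem.
raw_predictions = [2.0, -1.0, 0.5]

probabilities = softmax(raw_predictions)
predicted_class_index_by_raw = argmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)

# Same class either way, since softmax keeps the ordering of the scores.
print(predicted_class_index_by_raw == predicted_class_index_by_prob)  # True
```

Note that the raw values can be negative or greater than 1, while the softmax outputs all lie in (0, 1) and sum to 1.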


As for why they are called logits... I have no idea. Sorry.
[Edit: See this answer for the historical motivations behind the term.]


Trivia

That said, if you want to, you can apply the statistical logit to the probabilities that come out of the softmax function.

If the probability of a certain class is p, then the log-odds of that class is L = logit(p).

Also, the probability of that class can be recovered as p = sigmoid(L), using the sigmoid function.
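
Here is a rough sketch of that round trip, using hand-written helpers and made-up scores (none of these names come from TensorFlow). It also shows that the log-odds of a softmax probability are generally not the same number as the network's raw score for that class:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

z = [2.0, -1.0, 0.5]   # hypothetical raw scores from the last layer
p = softmax(z)[0]      # probability of class 0
L = logit(p)           # statistical log-odds of class 0
recovered = sigmoid(L) # back to the probability

print(abs(recovered - p) < 1e-12)  # True: sigmoid undoes logit
print(L == z[0])                   # False: log-odds differ from the raw score

# With softmax, the log-odds equal the raw score minus the log of the
# summed exponentials of the *other* classes' scores.
others = math.log(math.exp(z[1]) + math.exp(z[2]))
print(abs(L - (z[0] - others)) < 1e-9)  # True
```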

Not very useful to calculate log-odds though.