What is the meaning of the word logits in TensorFlow? [duplicate]
Solution 1:
Logits is an overloaded term which can mean many different things:
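In math, the logit is a function that maps probabilities in (0, 1) to the real line R, that is, (-inf, inf); the endpoints 0 and 1 correspond to the limits -inf and +inf.
A probability of 0.5 corresponds to a logit of 0. Negative logits correspond to probabilities less than 0.5, positive logits to probabilities greater than 0.5.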
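As a rough illustration of the mathematical definition, here is a minimal sketch using only Python's standard library. The function name `logit` and the sample probabilities are my own choices for the example, not something taken from TensorFlow.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p; only defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Probabilities below 0.5 give a negative logit, above 0.5 a positive one.
for p in (0.1, 0.5, 0.9):
    print(p, logit(p))
```

Note that the logits of 0.1 and 0.9 have the same magnitude with opposite signs, since the function is symmetric around p = 0.5.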
In ML, it can be the vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.
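To make the normalization step concrete, here is a small pure-Python sketch of softmax. It is not TensorFlow's implementation; the helper name `softmax` and the three example scores are assumptions made up for the illustration.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing
    # and does not change the result.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [2.0, 1.0, -1.0]  # made-up logits for a 3-class problem
probabilities = softmax(raw_scores)
print(probabilities)       # one value per class, each between 0 and 1
print(sum(probabilities))  # approximately 1.0
```

The raw scores can be any real numbers, including negative ones; only after softmax do they behave like probabilities.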
Logits also sometimes refer to the element-wise inverse of the sigmoid function.
Solution 2:
Just adding this clarification so that anyone who scrolls down this far can at least get it right, since so many wrong answers have been upvoted.
Diansheng's answer and JakeJ's answer get it right.
A new answer posted by Shital Shah is an even better and more complete answer.
Yes, logit is a mathematical function in statistics, but the logit used in the context of neural networks is different. The statistical logit doesn't even make any sense here.
I couldn't find a formal definition anywhere, but logit basically means:
The raw predictions which come out of the last layer of the neural network.
1. This is the very tensor on which you apply the argmax function to get the predicted class.
2. This is the very tensor which you feed into the softmax function to get the probabilities for the predicted classes.
Also, from a tutorial on the official TensorFlow website:
Logits Layer
The final layer in our neural network is the logits layer, which will return the raw values for our predictions. We create a dense layer with 10 neurons (one for each target class 0–9), with linear activation (the default):
logits = tf.layers.dense(inputs=dropout, units=10)
If you are still confused, the situation is like this:
raw_predictions = neural_net(input_layer)
predicted_class_index_by_raw = argmax(raw_predictions)
probabilities = softmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)
where predicted_class_index_by_raw and predicted_class_index_by_prob will be equal, because softmax preserves the ordering of its inputs.
Another name for raw_predictions in the above code is logit.
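The situation sketched above can be made runnable with plain Python. This is only an illustration: the numbers in `raw_predictions` are invented stand-ins for the output of a network's last layer, and `softmax` and `argmax` are small helper functions written for this example rather than TensorFlow calls.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Normalize raw scores into probabilities that sum to 1."""
    shift = max(scores)  # for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values: List[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical logits for a 4-class problem (stand-in for neural_net(input_layer)).
raw_predictions = [0.3, 2.5, -1.2, 0.9]

predicted_class_index_by_raw = argmax(raw_predictions)
probabilities = softmax(raw_predictions)
predicted_class_index_by_prob = argmax(probabilities)

# Both indices are 1, because softmax does not change which entry is largest.
print(predicted_class_index_by_raw, predicted_class_index_by_prob)
```

So if you only need the predicted class, you can skip softmax and take the argmax of the logits directly; softmax is needed when you want the probabilities themselves.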
As for why it is called logit... I have no idea. Sorry.
[Edit: See this answer for the historical motivations behind the term.]
Trivia
Although, if you want to, you can apply the statistical logit to the probabilities that come out of the softmax function.
If the probability of a certain class is p, then the log-odds of that class is L = logit(p).
Also, the probability of that class can be recovered as p = sigmoid(L), using the sigmoid function.
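The round trip between p and L can be checked with a few lines of standard-library Python. The value 0.8 is just an assumed example probability for one class, and `logit` and `sigmoid` are helper names chosen for this sketch.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(x: float) -> float:
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

p = 0.8                 # assumed probability of some class
L = logit(p)            # log-odds, log(0.8 / 0.2) = log(4)
recovered = sigmoid(L)  # back to a probability

print(L)
print(recovered)  # equal to p up to floating-point rounding
```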
Not very useful to calculate log-odds though.