Cross Entropy in PyTorch
I'm a bit confused by the cross entropy loss in PyTorch.
Considering this example:
import torch
import torch.nn as nn
from torch.autograd import Variable
output = Variable(torch.FloatTensor([0,0,0,1])).view(1, -1)
target = Variable(torch.LongTensor([3]))
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
I would expect the loss to be 0. But I get:
Variable containing:
0.7437
[torch.FloatTensor of size 1]
As far as I know, cross entropy can be calculated like this: H(p, q) = -\sum_x p(x) * log(q(x))
But shouldn't the result then be -1 * log(1) = 0?
I also tried different inputs, like one-hot encodings, but that doesn't work at all, so it seems the input shape of the loss function is fine.
I would be really grateful if someone could help me out and tell me where my mistake is.
Thanks in advance!
Solution 1:
In your example you are treating the output [0, 0, 0, 1] as probabilities, as required by the mathematical definition of cross entropy. But PyTorch treats them as raw scores that don't need to sum to 1, and that first have to be converted into probabilities, for which it uses the softmax function.
So H(p, q) becomes H(p, softmax(output)).
Translating the output [0, 0, 0, 1] into probabilities:
softmax([0, 0, 0, 1]) = [0.1749, 0.1749, 0.1749, 0.4754]
whence:
-log(0.4754) = 0.7437
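If you want to verify this numerically, here is a minimal sketch (using a recent PyTorch where Variable is no longer needed; the tensor values are just the ones from the question):

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.0, 0.0, 0.0, 1.0]])  # raw scores, not probabilities
target = torch.tensor([3])

probs = F.softmax(logits, dim=1)           # [[0.1749, 0.1749, 0.1749, 0.4754]]
manual = -torch.log(probs[0, 3])           # -log(0.4754)
builtin = F.cross_entropy(logits, target)  # what nn.CrossEntropyLoss computes
print(manual.item(), builtin.item())       # both print roughly 0.7437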
Solution 2:
Your understanding is correct, but PyTorch doesn't compute cross entropy in that way. PyTorch uses the following formula:
loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))
= -x[class] + log(\sum_j exp(x[j]))
Since in your scenario x = [0, 0, 0, 1] and class = 3, if you evaluate the above expression you get:
loss(x, class) = -1 + log(exp(0) + exp(0) + exp(0) + exp(1))
= 0.7437
Note that PyTorch uses the natural logarithm here.
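To check the formula by hand, here is a small sketch in plain Python (the names x and cls are just illustrative):

import math

x = [0.0, 0.0, 0.0, 1.0]
cls = 3

# loss(x, class) = -x[class] + log(\sum_j exp(x[j])), with the natural log
loss = -x[cls] + math.log(sum(math.exp(v) for v in x))
print(loss)  # roughly 0.7437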
Solution 3:
I would like to add an important note, as this often leads to confusion.
Softmax is not a loss function, nor is it really an activation function. It has a very specific task: it is used in multi-class classification to normalize the scores over the given classes. By doing so we get probabilities for each class that sum to 1.
Softmax is combined with Cross-Entropy-Loss to calculate the loss of a model.
Unfortunately, because this combination is so common, it is often abbreviated. Some use the term Softmax-Loss, whereas PyTorch calls it simply Cross-Entropy-Loss.
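To make that relationship concrete, here is a minimal sketch showing that nn.CrossEntropyLoss gives the same value as LogSoftmax followed by NLLLoss (values taken from the question; newer PyTorch without Variable):

import torch
import torch.nn as nn

logits = torch.tensor([[0.0, 0.0, 0.0, 1.0]])
target = torch.tensor([3])

ce = nn.CrossEntropyLoss()(logits, target)                    # softmax is built in
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)      # softmax applied explicitly
print(ce.item(), nll.item())                                  # both roughly 0.7437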