Loss function for a class-imbalanced binary classifier in TensorFlow
You can add class weights to the loss function by multiplying the logits. Regular cross-entropy loss is:
loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))
= -x[class] + log(\sum_j exp(x[j]))
In the weighted case:
loss(x, class) = weights[class] * -x[class] + log(\sum_j exp(weights[class] * x[j]))
So by multiplying the logits, you are re-scaling each class's predictions by its class weight.
For example:
import tensorflow as tf

ratio = 31.0 / (500.0 + 31.0)
class_weight = tf.constant([ratio, 1.0 - ratio])
labels = ...  # shape [batch_size, 2], one-hot
logits = ...  # shape [batch_size, 2]
# Scale each class's logit by its class weight before the softmax.
weighted_logits = tf.multiply(logits, class_weight)  # shape [batch_size, 2]
xent = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=weighted_logits, name="xent_raw")
There is now a standard loss function that supports a weight per example in the batch:
tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits, weights=weights)
where weights is a tensor of shape [batch_size] holding one weight per example, derived from the class weights. See the documentation for tf.losses.sparse_softmax_cross_entropy.
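For instance, with integer class labels you could build the per-example weights with tf.gather. This is a minimal sketch reusing the 31/500 class counts from the example above; the labels and logits placeholders are assumptions:
import tensorflow as tf

ratio = 31.0 / (500.0 + 31.0)
class_weights = tf.constant([ratio, 1.0 - ratio])  # one weight per class

labels = ...  # shape [batch_size], integer class ids in {0, 1}
logits = ...  # shape [batch_size, 2]

# Look up each example's weight from its class label.
weights = tf.gather(class_weights, labels)  # shape [batch_size]

loss = tf.losses.sparse_softmax_cross_entropy(
    labels=labels, logits=logits, weights=weights)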
The code you proposed seems wrong to me. I agree that the loss should be multiplied by the weight.
But if you multiply the logits by the class weight, you end up with:
weights[class] * -x[class] + log( \sum_j exp(x[j] * weights[class]) )
The second term is not equal to:
weights[class] * log(\sum_j exp(x[j]))
To see this, rewrite the latter as:
log( (\sum_j exp(x[j]))^weights[class] )
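A quick numeric check makes the difference concrete (a minimal sketch with made-up logits and weights; the numbers are illustrative only):
import numpy as np

x = np.array([2.0, -1.0])  # made-up logits for one example
w = np.array([0.2, 0.8])   # made-up class weights
c = 1                      # true class

def xent(logits, cls):
    # -x[class] + log(sum_j exp(x[j]))
    return -logits[cls] + np.log(np.sum(np.exp(logits)))

print(w[c] * xent(x, c))  # loss scaled by the class weight: ~2.44
print(xent(w * x, c))     # loss computed on weighted logits: ~1.46
# The two values differ, so scaling the logits is not the same as scaling the loss.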
So here is the code I'm proposing:
ratio = 31.0 / (500.0 + 31.0)
class_weight = tf.constant([[ratio, 1.0 - ratio]])
labels = ...  # shape [batch_size, 2], one-hot, float
logits = ...  # shape [batch_size, 2]
# Weight for each datapoint, depending on its label.
weight_per_label = tf.transpose(
    tf.matmul(labels, tf.transpose(class_weight)))  # shape [1, batch_size]
# Scale each example's cross-entropy loss by its weight.
xent = tf.multiply(
    weight_per_label,
    tf.nn.softmax_cross_entropy_with_logits(
        labels=labels, logits=logits, name="xent_raw"))  # shape [1, batch_size]
loss = tf.reduce_mean(xent)  # scalar
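With one-hot labels, the matmul/transpose pair can also be written as an elementwise reduction; a minimal equivalent sketch, assuming the same class_weight, labels, and logits as above:
# Per-example weight, shape [batch_size].
weight_per_label = tf.reduce_sum(class_weight * labels, axis=1)
xent = weight_per_label * tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=logits, name="xent_raw")
loss = tf.reduce_mean(xent)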
Use tf.nn.weighted_cross_entropy_with_logits() and set pos_weight to 1 / (expected ratio of positives).
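For example, with the 31 positive / 500 negative counts used above, a minimal sketch might look like this (the labels and logits placeholders are assumptions; in older TensorFlow versions the first argument is named targets rather than labels):
import tensorflow as tf

# Positives make up roughly 31 / 531 of the data, so up-weight them accordingly.
pos_fraction = 31.0 / (500.0 + 31.0)
pos_weight = 1.0 / pos_fraction

labels = ...  # shape [batch_size, 1], float 0.0 / 1.0
logits = ...  # shape [batch_size, 1], raw (pre-sigmoid) scores

per_example_loss = tf.nn.weighted_cross_entropy_with_logits(
    labels=labels, logits=logits, pos_weight=pos_weight)
loss = tf.reduce_mean(per_example_loss)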