F1 scores depend on which class is given the positive label?
For multiclass classification you should use a cross-entropy measure. Cross-entropy is invariant to relabeling, because relabeling only reorders the terms in a summation.
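To make the relabeling claim concrete, here is a minimal sketch in plain Python. The data, the permutation `perm`, and the helper `cross_entropy` are all made up for illustration; they are not taken from the question.

```python
import math

def cross_entropy(y_true, probs):
    """Mean categorical cross-entropy.

    y_true holds class indices, probs holds one probability list per sample.
    """
    return -sum(math.log(p[k]) for k, p in zip(y_true, probs)) / len(y_true)

# Hypothetical 3-class data (illustration only).
y_true = [0, 2, 1, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
]

# Relabel the classes: old class k becomes new class perm[k].
perm = [2, 0, 1]
y_relabeled = [perm[k] for k in y_true]
probs_relabeled = []
for p in probs:
    q = [0.0] * len(p)
    for k, pk in enumerate(p):
        q[perm[k]] = pk  # move each probability to its new class index
    probs_relabeled.append(q)

loss = cross_entropy(y_true, probs)
loss_relabeled = cross_entropy(y_relabeled, probs_relabeled)
print(loss, loss_relabeled)  # the two values should match
```

The relabeled loss picks out exactly the same probabilities as the original one, which is why the two numbers agree.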
If you want to use the F1 score, be aware that it is invariant to label swapping if, and only if, the number of true positives equals the number of true negatives.
In your example I see 3 true negatives, 2 true positives. If I remove one true negative, we have the same F1 score after swapping labels.
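import sklearn.metrics as m  # assuming m is sklearn.metrics
m.f1_score([1,1,0,0,1],[1,1,0,0,0]) # 0.8 (tp=2, tn=2, fn=1, fp=0)
m.f1_score([0,0,1,1,0],[0,0,1,1,1]) # 0.8 (labels swapped)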
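For contrast, here is a small sketch that computes F1 from raw counts without scikit-learn and compares an unbalanced case with the balanced one above. The helper `f1_from_labels` and the six-sample data are hypothetical (I simply added one true negative back to the lists above), so they are not necessarily the data from your question.

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """F1 from raw counts, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical unbalanced case: tp=2, tn=3, fn=1, fp=0.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0]
f1_as_is = f1_from_labels(y_true, y_pred, positive=1)    # 4/5
f1_swapped = f1_from_labels(y_true, y_pred, positive=0)  # 6/7

# Balanced case from the answer: tp=2, tn=2, fn=1, fp=0.
yb_true = [1, 1, 0, 0, 1]
yb_pred = [1, 1, 0, 0, 0]
f1_bal = f1_from_labels(yb_true, yb_pred, positive=1)          # 4/5
f1_bal_swapped = f1_from_labels(yb_true, yb_pred, positive=0)  # 4/5

print(f1_as_is, f1_swapped, f1_bal, f1_bal_swapped)
```

Choosing `positive=0` is equivalent to swapping the labels, so no second pair of lists is needed.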
Mathematically
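Let's start with one formula from the [Wikipedia F-score page], in order to skip some steps:
$$F_1 = \frac{tp}{tp + (fn + fp)/2}$$
where $tp$ is the number of true positives, $fn$ is the number of false negatives, and $fp$ is the number of false positives.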
I will use a prime ($'$) to denote the measures for swapped labels.
By swapping labels we have $tn' = tp$, $fn' = fp$, $fp' = fn$, $tp' = tn$.
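If you want $F_1' = F_1$, we need
$$\frac{tp}{tp + (fn + fp)/2} = \frac{tp'}{tp' + (fn' + fp')/2} = \frac{tn}{tn + (fn + fp)/2}.$$
Writing $c = (fn + fp)/2$, the function $x \mapsto x/(x + c)$ is strictly increasing whenever $c > 0$, so the equality is satisfied if, and only if, $tp = tn$. (If there are no errors at all, $c = 0$ and both scores are trivially 1.)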
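If you would rather check this numerically than trust the algebra, the following sketch enumerates small confusion matrices with exact fractions. The helper `f1` and the range of counts are arbitrary choices of mine, and cases with no errors (`fn + fp == 0`) are skipped.

```python
from fractions import Fraction
from itertools import product

def f1(tp, fn, fp):
    """F1 from counts: 2*tp / (2*tp + fn + fp), as an exact fraction."""
    return Fraction(2 * tp, 2 * tp + fn + fp)

violations = []
checked = 0
for tp, tn, fn, fp in product(range(6), repeat=4):
    if fn + fp == 0:
        continue  # no errors: skipped, the denominator may vanish
    checked += 1
    original = f1(tp, fn, fp)
    swapped = f1(tn, fp, fn)  # tp' = tn, fn' = fp, fp' = fn
    if (original == swapped) != (tp == tn):
        violations.append((tp, tn, fn, fp))

print(checked, violations)  # violations is expected to be empty
```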