Classification

Classification as Regression?

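If the classes are encoded as numbers (class 1 = 1, class 2 = 2, class 3 = 3) and treated as a regression target, the model implicitly assumes that class 1 is more similar to class 2 than to class 3. When the classes have no such ordering, this assumption is wrong.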

Class as One-hot Vector
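A minimal sketch contrasting the two encodings. The helper names `one_hot` and `euclidean` are illustrative, not from any library. Scalar labels put class 1 closer to class 2 than to class 3, while one-hot vectors place every pair of classes at the same distance, sqrt(2).

```python
import math
from typing import List

def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot vector for a 0-based class label."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Scalar encoding: class 1 -> 1, class 2 -> 2, class 3 -> 3
scalar_d12 = abs(1 - 2)  # distance 1
scalar_d13 = abs(1 - 3)  # distance 2, so class 3 looks "farther" from class 1

# One-hot encoding with 0-based labels: class 1 -> index 0, class 2 -> 1, class 3 -> 2
c1, c2, c3 = (one_hot(i, 3) for i in range(3))
onehot_d12 = euclidean(c1, c2)  # sqrt(2)
onehot_d13 = euclidean(c1, c3)  # sqrt(2), same as above
```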

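For binary classification (only two classes), sigmoid is more commonly used than softmax, but the two are equivalent: softmax over two classes gives the same probabilities as a sigmoid applied to the difference of the two scores.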
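The equivalence follows from dividing numerator and denominator by e^{z1}: e^{z1} / (e^{z1} + e^{z2}) = 1 / (1 + e^{-(z1 - z2)}) = sigmoid(z1 - z2). A small numeric check, with arbitrary example logits chosen for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

z1, z2 = 2.0, -1.0  # arbitrary example logits for the two classes
p_softmax = softmax([z1, z2])[0]  # P(class 1) from a 2-way softmax
p_sigmoid = sigmoid(z1 - z2)      # P(class 1) from sigmoid of the score difference
# Both are approximately 0.9526
```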

Minimizing cross-entropy is equivalent to maximizing likelihood.
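With a one-hot target, the cross-entropy of one example reduces to -log of the probability assigned to the correct class, so the total cross-entropy over a dataset is exactly the negative log-likelihood of the labels. A sketch with hypothetical, already-normalized predictions:

```python
import math

def cross_entropy(y_hat, y):
    """Cross-entropy between a one-hot target y and a predicted distribution y_hat."""
    # Terms with target 0 contribute nothing, so they are skipped (avoids log(0)).
    return -sum(t * math.log(p) for t, p in zip(y, y_hat) if t > 0)

# Hypothetical softmax outputs and one-hot targets for three examples
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

total_ce = sum(cross_entropy(p, t) for p, t in zip(predictions, targets))

# Likelihood of the labels: product of the probabilities given to each true class
likelihood = math.prod(p[t.index(1)] for p, t in zip(predictions, targets))

# Minimizing total_ce is the same as maximizing likelihood, since total_ce == -log(likelihood)
```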

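In PyTorch, the cross-entropy loss (`nn.CrossEntropyLoss`) already includes softmax, so the network should output raw scores (logits) rather than applying softmax itself.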
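PyTorch documents `nn.CrossEntropyLoss` as combining `LogSoftmax` and `NLLLoss`. The plain-Python sketch below mirrors that combined computation for a single example; it is not PyTorch's implementation, and the function names are illustrative. It also shows why adding your own softmax first is a mistake: softmax then gets applied twice and the loss changes.

```python
import math

def log_softmax(logits):
    # log-sum-exp trick: subtract the max before exponentiating to avoid overflow
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy_from_logits(logits, target):
    """Softmax + negative log-likelihood in one step; target is a 0-based class index."""
    return -log_softmax(logits)[target]

logits = [2.0, 1.0, 0.1]  # raw scores straight from the model, no softmax applied
loss = cross_entropy_from_logits(logits, 0)

# Mistake: applying softmax in the model and then again inside the loss
probs = [math.exp(v) for v in log_softmax(logits)]
loss_double = cross_entropy_from_logits(probs, 0)  # differs from the correct loss
```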

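Changing the loss function can change the difficulty of optimization. For example, with a softmax output, mean squared error can have a nearly flat error surface in regions where the loss is large, so gradient descent may get stuck, while cross-entropy still gives large gradients there.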
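A sketch comparing gradients with respect to the logits when the model is confidently wrong, assuming a softmax output and a one-hot target. For cross-entropy the gradient simplifies to y - t; for squared error it goes through the softmax Jacobian dy_k/dz_j = y_k (δ_kj - y_j), whose entries become tiny once the softmax saturates. The logit values are arbitrary illustrations.

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def grad_ce(z, t):
    # d/dz of cross-entropy(softmax(z), t) simplifies to y - t
    y = softmax(z)
    return [yk - tk for yk, tk in zip(y, t)]

def grad_mse(z, t):
    # d/dz_j of sum_k (y_k - t_k)^2, using dy_k/dz_j = y_k * (delta_kj - y_j)
    y = softmax(z)
    n = len(z)
    return [
        sum(2 * (y[k] - t[k]) * y[k] * ((1.0 if k == j else 0.0) - y[j]) for k in range(n))
        for j in range(n)
    ]

target = [1.0, 0.0, 0.0]
bad_logits = [-10.0, 10.0, 0.0]  # model puts almost all probability on the wrong class

g_ce = grad_ce(bad_logits, target)    # entries of size about 1: a strong learning signal
g_mse = grad_mse(bad_logits, target)  # entries below 1e-3: nearly flat despite the large loss
```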