Softmax is a mathematical function widely used in machine learning, particularly in multi-class classification problems. It transforms a vector of real-valued scores (logits) into a probability distribution. This is useful because the outputs of models, especially neural networks, are often unbounded scores that need to be converted into probabilities that sum to 1.
Definition
Given a vector of scores $\mathbf{z} = (z_1, z_2, \ldots, z_K)$, where $K$ is the number of classes, the softmax function is defined as:

$$\operatorname{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

for $i = 1, \ldots, K$.
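As a quick illustration of the definition, here is a minimal NumPy sketch (the function name `softmax` and the example scores are just for illustration). Subtracting the maximum score before exponentiating leaves the result unchanged, since the shift cancels in the ratio, but prevents overflow for large logits:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Map a vector of real-valued scores (logits) to a probability distribution."""
    shifted = z - np.max(z)       # shift by the max: cancels in the ratio, avoids overflow
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0 (up to floating-point rounding)
```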
Properties
- Probability Distribution: The output of the softmax function is a probability distribution over the $K$ classes. Each output lies in the range $(0, 1)$, and the sum of all outputs is equal to 1: $\sum_{i=1}^{K} \operatorname{softmax}(z_i) = 1$.
- Exponentiation: The exponential function ensures that every output is positive. It also amplifies differences between scores: if one score is significantly larger than the others, its softmax probability dominates, approaching 1.
- Sensitivity to Input: The softmax function is sensitive to its input values. A small change in the input can lead to a large change in the output probabilities, particularly when the input scores are close together (see the sketch after this list).
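A short demonstration of the last two properties (the specific score vectors are illustrative; `softmax` is the same helper as in the earlier sketch, repeated here so the snippet runs on its own):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - np.max(z))  # same numerically stable softmax as above
    return e / e.sum()

# Amplification: a score 3 units above the rest captures almost all the mass.
print(softmax(np.array([5.0, 2.0, 2.0])))  # approximately [0.909 0.045 0.045]

# Sensitivity: the same +0.5 bump to one logit moves far more probability
# mass when the scores are close together than when one already dominates.
print(softmax(np.array([1.0, 1.0])))       # [0.5 0.5]
print(softmax(np.array([1.5, 1.0])))       # approximately [0.622 0.378]
print(softmax(np.array([5.0, 1.0])))       # approximately [0.982 0.018]
print(softmax(np.array([5.5, 1.0])))       # approximately [0.989 0.011]
```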