Softmax is a mathematical function often used in machine learning, particularly in the context of classification problems where multiple classes are involved. It transforms a vector of real-valued scores (logits) into a probability distribution. This is useful because the outputs of models, especially neural networks, are often unbounded scores that need to be converted into probabilities that sum to 1.

Definition

Given a vector of scores $\mathbf{z} = (z_1, z_2, \ldots, z_K)$, where $K$ is the number of classes, the softmax function is defined as:

$$\mathrm{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

for $i = 1, \ldots, K$.
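
A minimal NumPy sketch of this definition (the max-subtraction step is a standard numerical-stability trick, not part of the definition itself, and the function name `softmax` is our own):

```python
import numpy as np

def softmax(z):
    """Map a vector of real-valued scores (logits) to a probability distribution."""
    # Subtracting the maximum score leaves the result unchanged, because
    # softmax is invariant to adding a constant to every score, but it
    # prevents overflow when exponentiating large logits.
    shifted = z - np.max(z)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))  # ≈ [0.659 0.242 0.099]
```

The shift invariance holds because the common factor $e^{-\max(\mathbf{z})}$ cancels between the numerator and the denominator.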

Properties

  1. Probability Distribution: The output of the softmax function is a probability distribution over the classes. Each output lies in the range $(0, 1)$, and the sum of all outputs is equal to 1:

     $$\sum_{i=1}^{K} \mathrm{softmax}(\mathbf{z})_i = 1$$

  2. Exponentiation: The use of the exponential function ensures that every output is strictly positive. Exponentiation also amplifies differences between scores: if one score is significantly larger than the others, its corresponding softmax probability dominates, approaching 1.

  3. Sensitivity to Input: The softmax function is sensitive to the input values, and it is most sensitive when the input scores are very close together. In that regime the output is far from saturated, so a small change in one score shifts the probabilities noticeably and can change which class receives the largest probability; once one score dominates, the same change has almost no effect, as the sketch below illustrates.
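
A short sketch, reusing the `softmax` function defined above with arbitrary example scores, illustrates all three properties:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax, as defined above.
    exp_scores = np.exp(z - np.max(z))
    return exp_scores / np.sum(exp_scores)

# 1. Probability distribution: every output is in (0, 1) and the outputs sum to 1.
p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # ≈ [0.659 0.242 0.099], 1.0 (up to floating-point rounding)

# 2. Amplification: a score well above the rest takes almost all the probability mass.
print(softmax(np.array([10.0, 1.0, 0.1])))  # ≈ [9.998e-01 1.2e-04 5.0e-05]

# 3. Sensitivity near ties: the same +0.5 perturbation of the first score moves
#    the probabilities far more when the scores are close than when one dominates.
print(softmax(np.array([0.0, 0.1])))   # ≈ [0.475 0.525]
print(softmax(np.array([0.5, 0.1])))   # ≈ [0.599 0.401]  <- large shift
print(softmax(np.array([0.0, 10.0])))  # ≈ [4.5e-05 1.0]
print(softmax(np.array([0.5, 10.0])))  # ≈ [7.5e-05 1.0]   <- barely moves
```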