The Kullback-Leibler (KL) divergence (also called relative entropy), denoted $D_{\text{KL}}(P \parallel Q)$, is a type of statistical distance, that is, a measure of how one probability distribution $P$ is different from a second, reference probability distribution $Q$. Mathematically, it is defined as

$$D_{\text{KL}}(P \parallel Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.$$
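As a quick illustration of this definition, here is a minimal Python sketch of the discrete KL divergence applied to two coin distributions; the function name `kl_divergence` and the example probabilities are illustrative choices, not taken from the text above.

```python
from math import log

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)).

    Assumes p and q are sequences of probabilities over the same outcomes,
    each summing to 1, with q(x) > 0 wherever p(x) > 0.
    """
    return sum(px * log(px / qx) for px, qx in zip(p, q) if px > 0)

# Two biased coins: (heads, tails) probabilities (illustrative values).
P = (0.7, 0.3)
Q = (0.5, 0.5)
print(kl_divergence(P, Q))  # positive; note kl_divergence(Q, P) generally differs
print(kl_divergence(P, P))  # 0.0: identical distributions have zero divergence
```

Note that the divergence is not symmetric: swapping the roles of $P$ and $Q$ generally gives a different value, which is why $Q$ is called the reference distribution.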
To understand the concept intuitively, imagine that we have two biased coins: coin 1 lands heads with probability $p_1$ and tails with probability $p_2 = 1 - p_1$, while coin 2 lands heads with probability $q_1$ and tails with probability $q_2 = 1 - q_1$. Now we want to know how similar these two coins are.
A straightforward way to do this is to determine how easily one may confuse these two coins. In other words, we ask how likely a given sequence of observations is under each coin, that is, we calculate the likelihood ratio of the coins:
Suppose that in a sequence of $N$ observations we observed heads $n_H$ times and tails $n_T$ times. Then the likelihood ratio would be

$$\frac{\mathcal{L}(\text{coin 1})}{\mathcal{L}(\text{coin 2})} = \frac{p_1^{n_H} \, p_2^{n_T}}{q_1^{n_H} \, q_2^{n_T}}.$$
Normalize for the sample size by raising the expression to the power of $1/N$ and then taking its logarithm:

$$\frac{1}{N} \log \frac{p_1^{n_H} \, p_2^{n_T}}{q_1^{n_H} \, q_2^{n_T}} = \frac{n_H}{N} \log \frac{p_1}{q_1} + \frac{n_T}{N} \log \frac{p_2}{q_2}.$$
Assume the observations are generated by coin 1. Then $\frac{n_H}{N} \to p_1$ and $\frac{n_T}{N} \to p_2$ as $N \to \infty$, so we get

$$p_1 \log \frac{p_1}{q_1} + p_2 \log \frac{p_2}{q_2} = D_{\text{KL}}(P \parallel Q),$$

where $P = (p_1, p_2)$ and $Q = (q_1, q_2)$ are the two coins' distributions.
This is exactly the KL divergence of the two coins' distributions. The closer to $0$ the divergence is, the more similar these two distributions are.
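To connect the derivation back to the definition, the following simulation sketch (the helper names and the probabilities $0.7$ and $0.5$ are assumptions made for illustration) tosses coin 1 many times, computes the normalized log-likelihood ratio from the observed head and tail counts, and shows that it approaches $D_{\text{KL}}(P \parallel Q)$ as the number of tosses grows.

```python
import random
from math import log

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q)."""
    return sum(px * log(px / qx) for px, qx in zip(p, q) if px > 0)

def normalized_log_likelihood_ratio(p, q, n, rng=random):
    """Toss coin 1 (heads probability p[0]) n times and return
    (1/n) * log of the likelihood ratio L(coin 1) / L(coin 2)."""
    heads = sum(rng.random() < p[0] for _ in range(n))
    tails = n - heads
    return (heads * log(p[0] / q[0]) + tails * log(p[1] / q[1])) / n

P, Q = (0.7, 0.3), (0.5, 0.5)
print("D_KL(P || Q) =", kl_divergence(P, Q))
for n in (100, 10_000, 1_000_000):
    print(n, normalized_log_likelihood_ratio(P, Q, n))
# As n grows, the simulated values settle near D_KL(P || Q),
# matching the limit n_H/N -> p_1, n_T/N -> p_2 used in the derivation.
```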