Derive the ELBO
In variational inference we want to approximate the true posterior distribution from a known distribution family .
We often choose KL divergence as the loss function to measure the difference between two probability distributions. Therefore, we need to minimize
where we define
Then we have
Since the left hand side is constant (not depending on ), too minimize KL divergence is equivalent to maximize
Why Evidence Lower Bound
The name of ELBO comes from