Derive the ELBO

In variational inference we want to approximate the true posterior distribution $p_{\theta}(z \mid x)$ with a member of a known distribution family $q_{\phi}(z \mid x)$.

The KL divergence is a common choice for measuring the discrepancy between two probability distributions, so we use it as the loss function. We therefore need to minimize

$$
\begin{aligned}
D_{KL}(q_{\phi}(z \mid x) \,\|\, p_{\theta}(z \mid x))
&= E_{q_{\phi}(z \mid x)}\left[\log \frac{q_{\phi}(z \mid x)}{p_{\theta}(z \mid x)}\right] \\
&= E_{q_{\phi}(z \mid x)} \log q_{\phi}(z \mid x) - E_{q_{\phi}(z \mid x)} \log p_{\theta}(z \mid x) \\
&= E_{q_{\phi}(z \mid x)} \log q_{\phi}(z \mid x) - E_{q_{\phi}(z \mid x)}\left[\log \frac{p_{\theta}(x, z)}{p_{\theta}(x)}\right] \\
&= E_{q_{\phi}(z \mid x)} \log q_{\phi}(z \mid x) - E_{q_{\phi}(z \mid x)}\left[\log p_{\theta}(x, z)\right] + E_{q_{\phi}(z \mid x)}\left[\log p_{\theta}(x)\right] \\
&= \underbrace{E_{q_{\phi}(z \mid x)} \log q_{\phi}(z \mid x) - E_{q_{\phi}(z \mid x)}\left[\log p_{\theta}(x, z)\right]}_{-\mathrm{ELBO}(x)} + \log p_{\theta}(x)
\end{aligned}
$$

where we define

$$
\mathrm{ELBO}(x) = E_{q_{\phi}(z \mid x)}\left[\log p_{\theta}(x, z)\right] - E_{q_{\phi}(z \mid x)} \log q_{\phi}(z \mid x)
$$

Then we have

$$
\log p_{\theta}(x) = \mathrm{ELBO}(x) + D_{KL}(q_{\phi}(z \mid x) \,\|\, p_{\theta}(z \mid x))
$$
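The decomposition of the log evidence into ELBO plus KL divergence can be checked numerically. The sketch below assumes a hypothetical toy model with a discrete latent $z \in \{0, 1, 2\}$ and a single fixed observation $x$; the probability tables and the helper names `elbo` and `kl_divergence` are made up for illustration.

```python
import numpy as np

def elbo(q, joint):
    """ELBO(x) = E_q[log p(x, z)] - E_q[log q(z | x)] for a discrete latent z."""
    return float(np.sum(q * (np.log(joint) - np.log(q))))

def kl_divergence(q, p):
    """D_KL(q || p) for two discrete distributions with full support."""
    return float(np.sum(q * (np.log(q) - np.log(p))))

# Hypothetical toy model (illustrative numbers only).
prior = np.array([0.5, 0.3, 0.2])        # p(z)
likelihood = np.array([0.1, 0.6, 0.3])   # p(x | z) at the observed x
joint = prior * likelihood               # p(x, z)
evidence = joint.sum()                   # p(x)
posterior = joint / evidence             # p(z | x)

q = np.array([0.2, 0.5, 0.3])            # an arbitrary approximation q(z | x)

elbo_value = elbo(q, joint)
kl_value = kl_divergence(q, posterior)
log_evidence = float(np.log(evidence))

# The two sides of the identity should agree up to floating-point error.
print(log_evidence, elbo_value + kl_value)
```

Changing `q` to any other distribution with full support should leave the sum `elbo_value + kl_value` unchanged, since only the split between the two terms depends on `q`.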

Since the left-hand side is constant with respect to $\phi$, minimizing the KL divergence is equivalent to maximizing $\mathrm{ELBO}(x)$ over $\phi$.
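To illustrate this equivalence, the sketch below assumes a hypothetical two-state latent and a one-parameter family $q_{\phi} = (\phi, 1 - \phi)$, then scans a grid of $\phi$ values. The joint probabilities and function names are invented for the example, and a grid search stands in for whatever optimizer one would use in practice.

```python
import numpy as np

# Hypothetical joint p(x, z) for z in {0, 1} at one fixed x (illustrative numbers).
joint = np.array([0.12, 0.28])
evidence = joint.sum()            # p(x)
posterior = joint / evidence      # p(z | x), roughly (0.3, 0.7)

def elbo(phi):
    """ELBO(x) for the family q_phi = (phi, 1 - phi)."""
    q = np.array([phi, 1.0 - phi])
    return float(np.sum(q * (np.log(joint) - np.log(q))))

def kl_to_posterior(phi):
    """D_KL(q_phi || p(z | x)) for the same family."""
    q = np.array([phi, 1.0 - phi])
    return float(np.sum(q * (np.log(q) - np.log(posterior))))

# Scan phi on a grid that stays strictly inside (0, 1).
grid = np.linspace(0.01, 0.99, 99)
elbo_values = np.array([elbo(phi) for phi in grid])
kl_values = np.array([kl_to_posterior(phi) for phi in grid])

best_by_elbo = int(np.argmax(elbo_values))
best_by_kl = int(np.argmin(kl_values))
best_phi = float(grid[best_by_elbo])

# Both criteria select the same grid point, close to the true posterior p(z = 0 | x).
print(best_phi, float(posterior[0]))
```

In this toy setting the family contains the true posterior, so the maximal ELBO comes very close to $\log p_{\theta}(x)$; with a more restricted family a gap equal to the remaining KL divergence would be left.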

Why Evidence Lower Bound

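The name ELBO comes from the fact that $D_{KL}(\cdot) \geq 0 \implies \log p_{\theta}(x) \geq \mathrm{ELBO}(x)$. In other words, the ELBO is a lower bound on the log evidence $\log p_{\theta}(x)$.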
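As a cross-check, the same bound can be reached without the KL divergence by applying Jensen's inequality to the concave logarithm. This is a standard alternative route rather than the derivation used in these notes, and it assumes a continuous latent $z$ with $q_{\phi}(z \mid x) > 0$ wherever $p_{\theta}(x, z) > 0$:

$$
\begin{aligned}
\log p_{\theta}(x)
&= \log \int p_{\theta}(x, z)\, dz \\
&= \log E_{q_{\phi}(z \mid x)}\left[\frac{p_{\theta}(x, z)}{q_{\phi}(z \mid x)}\right] \\
&\geq E_{q_{\phi}(z \mid x)}\left[\log \frac{p_{\theta}(x, z)}{q_{\phi}(z \mid x)}\right] \\
&= E_{q_{\phi}(z \mid x)}\left[\log p_{\theta}(x, z)\right] - E_{q_{\phi}(z \mid x)} \log q_{\phi}(z \mid x) = \mathrm{ELBO}(x)
\end{aligned}
$$

The gap in the inequality is exactly $D_{KL}(q_{\phi}(z \mid x) \,\|\, p_{\theta}(z \mid x))$, so the bound is tight when $q_{\phi}(z \mid x)$ equals the true posterior.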

Lin's Notes Garden