Intuition
Suppose we have an observable random variable $X$, and we want to find its true distribution $p^*$. This would allow us to generate new data by sampling from it (as in a variational autoencoder, VAE) and to estimate probabilities of future events. In general, it is impossible to find $p^*$ exactly, forcing us to search for a good approximation.
To do this, we define a sufficiently large parametric family of distributions $\{p_\theta\}_{\theta\in\Theta}$ (often built from "nice" distributions such as Gaussians), then solve $\min_\theta L(p_\theta, p^*)$ for some loss function $L$. One possible way to solve this is to consider a small variation from $p_\theta$ to $p_{\theta+\delta\theta}$ and solve $L(p_\theta, p^*) - L(p_{\theta+\delta\theta}, p^*) = 0$. This is a problem in the calculus of variations, hence the name variational method.
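For concreteness, one standard choice of loss, assumed here only for illustration, is the KL divergence, which turns the problem into

$$\min_{\theta\in\Theta} L(p_\theta, p^*) \;=\; \min_{\theta\in\Theta} D_{\mathrm{KL}}(p^* \parallel p_\theta) \;=\; \min_{\theta\in\Theta} \mathbb{E}_{x\sim p^*}\!\left[\log\frac{p^*(x)}{p_\theta(x)}\right],$$

and a stationary point is characterized by the change $L(p_{\theta+\delta\theta}, p^*) - L(p_\theta, p^*)$ vanishing to first order in $\delta\theta$.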
Variational Bayesian Inference
Since explicitly parametrized families (such as the classical Gaussian family) are usually too simple to model the true distribution, we consider implicitly parametrized probability distributions, defined by the following ingredients:
- A simple distribution $p(z)$ over a latent random variable $Z$; usually a normal distribution or a uniform distribution suffices.
- A family of complicated functions $f_\theta$ (such as deep neural networks) parametrized by $\theta$.
- A way to convert any $f_\theta$ into a simple distribution over the observable random variable $X$. For example, if $f_\theta(z) = (f_1(z), f_2(z))$ has two outputs, we can define the corresponding distribution over $X$ to be the normal distribution $\mathcal{N}(f_1(z), e^{f_2(z)})$.

Together these define a family of joint distributions $p_\theta$ over $(X, Z)$. Sampling $(x, z) \sim p_\theta$ is straightforward: first sample $z \sim p$, compute $f_\theta(z)$, and then sample $x \sim p_\theta(\cdot \mid z)$ using $f_\theta(z)$.
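As a minimal sketch of this ancestral sampling procedure (the two-layer network, hidden width, and dimensions below are illustrative assumptions, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions for this sketch only).
LATENT_DIM, HIDDEN_DIM, DATA_DIM = 2, 16, 4

# A toy "complicated function" f_theta: a small random MLP whose two output
# heads play the roles of f_1(z) (mean) and f_2(z) (log-variance).
theta = {
    "W1": rng.normal(size=(LATENT_DIM, HIDDEN_DIM)),
    "b1": np.zeros(HIDDEN_DIM),
    "W_mu": rng.normal(size=(HIDDEN_DIM, DATA_DIM)),
    "W_logvar": rng.normal(size=(HIDDEN_DIM, DATA_DIM)),
}

def f_theta(z):
    """Map a latent z to the parameters (f_1(z), f_2(z)) of a Gaussian over X."""
    h = np.tanh(z @ theta["W1"] + theta["b1"])
    return h @ theta["W_mu"], h @ theta["W_logvar"]  # mean, log-variance

def sample_joint():
    """Ancestral sampling (x, z) ~ p_theta: z ~ p(z), then x ~ N(f_1(z), e^{f_2(z)})."""
    z = rng.normal(size=LATENT_DIM)                 # simple prior p(z) = N(0, I)
    mean, logvar = f_theta(z)
    x = rng.normal(mean, np.exp(0.5 * logvar))      # std = exp(f_2(z) / 2)
    return x, z

x, z = sample_joint()
```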
In other words, we have a generative model for both the observable and the latent. We consider a distribution $p_\theta$ good if it is a close approximation of $p^*$, namely $p_\theta(X) \approx p^*(X)$.
Since the right-hand side is a distribution over $X$ only, $p_\theta(X)$ must be obtained by marginalizing the latent variable away. However, in general it is impossible to perform the integral $p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$, forcing us to perform another approximation.
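To see the difficulty concretely, here is a naive Monte Carlo estimate of $p_\theta(x)$ that continues the toy sketch above (it reuses `rng`, `f_theta`, and `LATENT_DIM` from that block, an assumption of this illustration). It is mathematically valid but becomes useless when $Z$ is high-dimensional, which is one practical way of seeing why another approximation is needed:

```python
def gaussian_logpdf(x, mean, logvar):
    """Log-density of a diagonal Gaussian N(mean, diag(exp(logvar))) at x."""
    return -0.5 * np.sum(logvar + np.log(2 * np.pi) + (x - mean) ** 2 / np.exp(logvar))

def naive_marginal_likelihood(x, num_samples=10_000):
    """Monte Carlo estimate of p_theta(x) = ∫ p_theta(x|z) p(z) dz using prior samples.

    When Z is high-dimensional, almost no prior samples land where p_theta(x|z)
    is large, so this estimator's variance blows up -- the integral is intractable
    in practice.
    """
    total = 0.0
    for _ in range(num_samples):
        z = rng.normal(size=LATENT_DIM)                      # z ~ p(z)
        mean, logvar = f_theta(z)
        total += np.exp(gaussian_logpdf(x, mean, logvar))    # p_theta(x|z)
    return total / num_samples
```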
From Bayes' theorem we know that $p_\theta(x) = \frac{p_\theta(x \mid z)\, p(z)}{p_\theta(z \mid x)}$. Here we already know $p_\theta(x \mid z)$ and $p(z)$, so if we can find a good approximation of the posterior $p_\theta(z \mid x)$, then we can recover $p_\theta(x)$. Therefore, we define another distribution family $q_\phi(z \mid x)$ and use it to approximate $p_\theta(z \mid x)$.
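One common concrete choice, assumed here to mirror the decoder sketch above (it reuses `rng`, `DATA_DIM`, `HIDDEN_DIM`, and `LATENT_DIM` from that block), is an amortized Gaussian approximation: a second network maps $x$ to the mean and log-variance of $q_\phi(z \mid x)$:

```python
# An illustrative amortized approximate posterior q_phi(z|x): a second small
# network (parameters phi, with assumed shapes) outputs the mean and
# log-variance of a Gaussian over the latent Z, conditioned on an observation x.
phi = {
    "W1": rng.normal(size=(DATA_DIM, HIDDEN_DIM)),
    "b1": np.zeros(HIDDEN_DIM),
    "W_mu": rng.normal(size=(HIDDEN_DIM, LATENT_DIM)),
    "W_logvar": rng.normal(size=(HIDDEN_DIM, LATENT_DIM)),
}

def q_phi(x):
    """Parameters (mean, log-variance) of the Gaussian q_phi(z|x)."""
    h = np.tanh(x @ phi["W1"] + phi["b1"])
    return h @ phi["W_mu"], h @ phi["W_logvar"]

def sample_posterior_approx(x):
    """Sample z ~ q_phi(z|x), the tractable stand-in for the true posterior p_theta(z|x)."""
    mean, logvar = q_phi(x)
    return mean + np.exp(0.5 * logvar) * rng.normal(size=LATENT_DIM)
```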
In Bayesian language, $X$ is the observed evidence and $Z$ is the latent/unobserved variable. The distribution $p$ over $Z$ is the prior distribution over $Z$, $p_\theta(x \mid z)$ is the likelihood function, and $p_\theta(z \mid x)$ is the posterior distribution over $Z$.