Idea

The traditional way for a generative model to synthesize content is to try to figure out the underlying mechanism of the generating process. But why not just start with nonsense (e.g. random noise) and train our model to improve it until the final output is so similar to a real sample that no one can tell them apart?

For example, imagine a video game that renders 4K graphics at 60 fps. If we want to train a model that generates similar game frames, then instead of diving into the rendering code and learning its way of generation, we can suppose that, whatever the real mechanism is, there are just some $n$ factors that determine the final output image. This way, we can start with an $n$-dimensional Gaussian noise vector and train our model to generate graphics from these variables. We also need to train another model to judge whether the generated outputs are similar to real ones; its feedback helps the generative model update its parameters for better generations next time. Ideally, we end up with a model that can generate near-real content from any given input variables.

Mathematical Model

To generalize, anything we see in this world can be viewed as a sample drawn from a distribution over some particular kind of data. Therefore, the generator's task is to produce a distribution $p_g$ over data $x$. To learn this distribution, we define a prior $p_z(z)$ on input noise variables, where $z$ is the given input, then represent a mapping to data space as $G(z; \theta_g)$. Here $G$ is a differentiable function represented by a multilayer perceptron with parameters $\theta_g$.
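As a concrete sketch, $G(z; \theta_g)$ can be a tiny one-hidden-layer MLP in NumPy. This is purely illustrative: the dimensions `NOISE_DIM`, `HIDDEN_DIM`, and `DATA_DIM` are assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a 10-dim noise vector z mapped to a
# 784-dim data point x (e.g. a flattened 28x28 image).
NOISE_DIM, HIDDEN_DIM, DATA_DIM = 10, 32, 784

# Parameters theta_g of the generator MLP G(z; theta_g).
theta_g = {
    "W1": rng.normal(0.0, 0.1, (NOISE_DIM, HIDDEN_DIM)),
    "b1": np.zeros(HIDDEN_DIM),
    "W2": rng.normal(0.0, 0.1, (HIDDEN_DIM, DATA_DIM)),
    "b2": np.zeros(DATA_DIM),
}

def G(z, theta):
    """Differentiable mapping from noise space to data space."""
    h = np.tanh(z @ theta["W1"] + theta["b1"])
    return np.tanh(h @ theta["W2"] + theta["b2"])  # values in (-1, 1)

# Draw z from the prior p_z(z), here a standard Gaussian.
z = rng.normal(size=(5, NOISE_DIM))
x_fake = G(z, theta_g)
print(x_fake.shape)  # (5, 784)
```

Before any training, these samples are just smooth noise; the point is only the shape of the computation, noise in, data-shaped output out.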

We also define a second multilayer perceptron $D(x; \theta_d)$ that outputs a single scalar. $D(x)$ represents the probability that $x$ came from the real data rather than from $p_g$. (For example, $D(x) \approx 1$ means $D$ judges $x$ to be real data, while $D(x) \approx 0$ means $D$ suspects $x$ is totally synthesized.)
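A matching sketch of $D(x; \theta_d)$, again with assumed dimensions: a small MLP whose single sigmoid output is read as the probability that the input is real.

```python
import numpy as np

rng = np.random.default_rng(1)
DATA_DIM, HIDDEN_DIM = 784, 32  # illustrative, matching the generator sketch

# Parameters theta_d of the discriminator MLP D(x; theta_d).
theta_d = {
    "W1": rng.normal(0.0, 0.1, (DATA_DIM, HIDDEN_DIM)),
    "b1": np.zeros(HIDDEN_DIM),
    "w2": rng.normal(0.0, 0.1, (HIDDEN_DIM,)),
    "b2": 0.0,
}

def D(x, theta):
    """Scalar in (0, 1): estimated probability that x came from real data."""
    h = np.tanh(x @ theta["W1"] + theta["b1"])
    logit = h @ theta["w2"] + theta["b2"]
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid squashes logit to (0, 1)

x = rng.normal(size=(5, DATA_DIM))
p_real = D(x, theta_d)
print(p_real.shape)  # (5,)
```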

We train $D$ to maximize the probability of assigning the correct label to both training examples and samples from $G$; that is, to maximize $\log D(x)$ when $x$ comes from the real data. We simultaneously train $G$ to minimize the probability that $D$ recognizes the samples from $G$ as synthesized; that is, to minimize $\log(1 - D(G(z)))$.
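These two objectives can be written as numerical losses. In this sketch, `d_real` and `d_fake` are hypothetical helper names for the discriminator's outputs $D(x)$ on real samples and $D(G(z))$ on generated samples, both arrays of values in $(0, 1)$:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # D maximizes log D(x) + log(1 - D(G(z))); we minimize the negative.
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def generator_loss(d_fake):
    # G minimizes log(1 - D(G(z))): it wants D(G(z)) pushed toward 1.
    return np.log(1.0 - d_fake).mean()

# A confident, correct D gets a small discriminator loss,
# and the generator loss falls as D(G(z)) rises.
print(discriminator_loss(np.array([0.95]), np.array([0.05])))
print(generator_loss(np.array([0.05])), generator_loss(np.array([0.9])))
```

In practice each network would update its own parameters by gradient descent on its loss while the other's parameters are held fixed; the losses above are the quantities being descended.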

In consequence, $G$ and $D$ play the following two-player minimax game with value function $V(G, D)$:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

In this formula, $x$ is sampled from the real data distribution $p_{\text{data}}$, and $z$ is sampled from the prior $p_z$, so $G(z)$ follows the distribution $p_g$ that the generator has learned. For the discriminator $D$, the ideal case is $D(x) = 1$ and $D(G(z)) = 0$, so that the right-hand side approaches its maximum, $0$. As for the generator $G$, the ideal case would be $D(G(z)) = 1$, so the right-hand side needs to be minimized. Since the final aim is to obtain a well-performing $G$, we put $\min_G$ before $\max_D$. The game keeps running until we reach a Nash equilibrium.
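The value function can be estimated by Monte Carlo from the discriminator's outputs on a batch of real and generated samples. This sketch checks the two extremes: a near-perfect $D$ pushes $V$ toward its maximum of $0$, while at the equilibrium $p_g = p_{\text{data}}$ the optimal $D$ outputs $1/2$ everywhere, giving $V = \log\frac{1}{2} + \log\frac{1}{2} = -2\log 2$.

```python
import numpy as np

def value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    return np.log(d_real).mean() + np.log(1.0 - d_fake).mean()

# Near-ideal D: D(x) -> 1 on real data, D(G(z)) -> 0 on fakes, so V -> 0.
v_confident = value(np.full(4, 0.999), np.full(4, 0.001))

# At equilibrium the best D can do no better than 1/2 everywhere,
# so V = -2 log 2, about -1.386.
v_equilibrium = value(np.full(4, 0.5), np.full(4, 0.5))

print(v_confident, v_equilibrium)
```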