Suppose we model a set of observations as a random sample from an unknown joint probability distribution that is expressed in terms of a set of parameters. The goal of maximum likelihood estimation is to determine the parameters for which the observed data have the highest joint probability.

We write the parameters governing the joint distribution as a vector $θ = [θ_{1}, θ_{2}, \dots, θ_{k}]^{T}$ so that this distribution falls within a parametric family $\{f(\cdot; θ) \mid θ \in Θ\}$, where $Θ$ is called the parameter space, a finite-dimensional subset of Euclidean space. Evaluating the joint density at the observed data sample $y = (y_{1}, y_{2}, \dots, y_{n})$ gives a real-valued function

$$L_{n}(θ) = L_{n}(θ; y) = f_{n}(y; θ)$$

which is called the likelihood function. For independent random variables, $f_{n}(y; θ)$ is the product of the univariate (that is, single-variable) probability density functions (PDFs); in the i.i.d. case every $f_{k}$ is the same density:

$$f_{n}(y; θ) = \prod_{k=1}^{n} f_{k}(y_{k}; θ)$$
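As a rough illustration of the product form, the sketch below evaluates the likelihood of a small sample under a normal model with $θ = (\mu, \sigma)$. The normal model, the function names `normal_pdf` and `likelihood`, and the data values are all assumptions chosen for this example; they do not come from the note itself.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density f(x; mu, sigma) (assumed model for this sketch)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(theta, y):
    """L_n(theta; y): product of the univariate densities over the sample."""
    mu, sigma = theta
    return math.prod(normal_pdf(y_k, mu, sigma) for y_k in y)

# Made-up observations, used only to exercise the function.
y = [4.8, 5.1, 5.3, 4.9, 5.4]

# A parameter value near the data gives a much larger likelihood
# than one far away from it.
print(likelihood((5.0, 0.3), y))
print(likelihood((3.0, 0.3), y))
```

The data stay fixed while `theta` varies, which is what makes this a function of the parameters rather than a density over the observations.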

The goal of maximum likelihood estimation is to find the values of the model parameters that maximize the likelihood function over the parameter space, that is

$$\hat{θ} = \underset{θ \in Θ}{\arg\max}\; L_{n}(θ; y)$$
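To make the maximization concrete, here is a minimal sketch that finds the maximizer by brute-force grid search. It assumes a Bernoulli model with a single parameter $p \in (0, 1)$; the model, the helper names `bernoulli_likelihood` and `grid_argmax`, and the data are hypothetical. Grid search is used only because it mirrors the definition directly; practical code would normally use a closed form or a numerical optimizer.

```python
def bernoulli_likelihood(p, y):
    """L_n(p; y) for independent 0/1 observations with success probability p."""
    out = 1.0
    for y_k in y:
        out *= p if y_k == 1 else 1.0 - p
    return out

def grid_argmax(y, steps=1000):
    """Return the grid point in (0, 1) that maximizes the likelihood."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: bernoulli_likelihood(p, y))

# Made-up sample: 7 successes in 10 trials.
y = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

p_hat = grid_argmax(y)
sample_mean = sum(y) / len(y)  # closed-form Bernoulli MLE, for comparison
print(p_hat, sample_mean)
```

For this model the maximizer is known in closed form (the sample mean), so the grid result can be checked against it.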

Tip

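Sometimes we use the log-likelihood function instead, that is $\ell(θ) = \ln L_{n}(θ; y) = \sum_{k=1}^{n} \ln f_{k}(y_{k}; θ)$. Because the logarithm is strictly increasing, the log-likelihood has the same maximizer as the likelihood, and a sum is usually easier to differentiate than a product.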
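One practical reason to prefer the log-likelihood is numerical: a product of many densities can underflow to zero in floating point, while the sum of their logarithms stays finite. The sketch below assumes a standard normal model and a simulated sample of 2000 points; the helper name `normal_logpdf` and the sample are illustrative assumptions.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    """Log of the univariate normal density (assumed model for this sketch)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(2000)]  # simulated sample

# Product form: each factor is below 0.4, so 2000 of them underflow to 0.0.
likelihood = math.prod(math.exp(normal_logpdf(y_k, 0.0, 1.0)) for y_k in y)

# Sum form: the same quantity on the log scale remains representable.
log_likelihood = sum(normal_logpdf(y_k, 0.0, 1.0) for y_k in y)

print(likelihood)
print(log_likelihood)
```

Once the product has collapsed to zero it can no longer distinguish between parameter values, whereas the log-likelihood still can.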

Invariant Property

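If $\hat{θ}$ is the maximum likelihood estimate of $θ$, then for any function $g$ the maximum likelihood estimate of $g(θ)$ is $g(\hat{θ})$:

$$\hat{θ} = \underset{θ \in Θ}{\arg\max}\; L_{n}(θ; y) \implies g(\hat{θ}) = \underset{g(θ) \in g(Θ)}{\arg\max}\; L_{n}(g(θ); y)$$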
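The property can be checked numerically on a simple case. The sketch below assumes an exponential model written two ways: in terms of the rate $\lambda$, and in terms of the mean $\mu = g(\lambda) = 1/\lambda$. Each log-likelihood is maximized separately with a basic ternary search (valid here because both functions have a single peak on the search interval), and the two maximizers are then compared. The model, the helper names, the search interval, and the data are assumptions made for illustration.

```python
import math

def ternary_argmax(f, lo, hi, iters=200):
    """Maximize a single-peaked function f on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Made-up positive observations (sample mean is 1.2).
y = [0.5, 1.5, 0.25, 3.0, 0.75]
n, total = len(y), sum(y)

def loglik_rate(lam):
    """Exponential log-likelihood in the rate parametrization."""
    return n * math.log(lam) - lam * total

def loglik_mean(mu):
    """The same model reparametrized by the mean mu = 1 / lam."""
    return -n * math.log(mu) - total / mu

lam_hat = ternary_argmax(loglik_rate, 0.01, 20.0)
mu_hat = ternary_argmax(loglik_mean, 0.01, 20.0)

# Invariance: transforming the rate estimate gives the mean estimate.
print(1.0 / lam_hat, mu_hat)
```

Up to the tolerance of the search, `1.0 / lam_hat` and `mu_hat` agree, so there is no need to re-run the optimization after reparametrizing.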
