Tweedie's Formula states that the posterior mean of the true parameter of an exponential family distribution, given an observation drawn from it, can be written as the maximum likelihood estimate (aka the empirical mean) plus a correction term involving the score, i.e. the derivative of the log marginal density, evaluated at that estimate.

Selection Bias

Suppose we have observed a series of $z_{i}$, where each $z_{i} \sim N (μ_{i}, σ^{2})$, for $i = 1, 2, \dots, N$.

Then we select the $m$ biggest ones. What can we say about their corresponding $μ$ values? They will usually be smaller than the selected $z$'s, because a very large $z_{i}$ is often a moderate $μ_{i}$ plus a lucky noise draw. This is called selection bias.
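The effect is easy to see in a small simulation. The sketch below is only illustrative: the note does not say how the $μ_{i}$ arise, so drawing them from a normal distribution, along with the function name `selection_bias_demo` and its default parameters, is an assumption made here.

```python
import numpy as np

def selection_bias_demo(n=5000, m=50, sigma=1.0, prior_sd=1.0, seed=0):
    """Compare the m largest z_i with the true means mu_i behind them."""
    rng = np.random.default_rng(seed)
    # Assumption: true means drawn from N(0, prior_sd^2), purely for illustration.
    mu = rng.normal(0.0, prior_sd, size=n)
    # Observations z_i ~ N(mu_i, sigma^2).
    z = mu + rng.normal(0.0, sigma, size=n)
    # Indices of the m largest observations.
    top = np.argsort(z)[-m:]
    return z[top].mean(), mu[top].mean()

z_top, mu_top = selection_bias_demo()
print(f"mean of selected z:  {z_top:.2f}")
print(f"mean of their mu:    {mu_top:.2f}")
```

Under these assumptions the mean of the selected $z$'s sits well above the mean of their true $μ$'s, which is the selection bias described above.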

Tweedie's Formula

Imagine we sample $μ$ from an unknown distribution $μ \sim g (\cdot)$, and then sample the corresponding $z ∣ μ \sim N (μ, σ^{2})$.

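Now we calculate the marginal PDF: $f (z) = \int_{- \infty}^{\infty} φ_{σ} (z - μ) g (μ) d μ$, where $φ_{σ} (z) = \frac{1}{\sqrt{2 π σ^{2}}} \exp ( - \frac{z^{2}}{2 σ^{2}} )$ is the $N (0, σ^{2})$ density, which reduces to the standard normal density when $σ = 1$ (see standard normal distribution).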
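As a worked example of the marginal (an illustrative assumption, not something this note specifies), take a Gaussian prior $g = N (0, A)$ with a hypothetical prior variance $A$. Convolving it with $N (0, σ^{2})$ noise gives another Gaussian, so both the marginal and its log-derivative have closed forms:

$$
f (z) = \frac{1}{\sqrt{2 π (A + σ^{2})}} \exp \left( - \frac{z^{2}}{2 (A + σ^{2})} \right), \qquad \frac{d}{d z} \log f (z) = - \frac{z}{A + σ^{2}}
$$

In this case the log-derivative is negative for positive $z$ and positive for negative $z$, so it points back toward the center of the marginal.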

Then we use Tweedie's Formula to estimate the true mean behind an observed $z$:

$$E [μ ∣ z] = z + σ^{2} \frac{d}{d z} \log f (z)$$

where

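$z$ is the MLE of $μ$, aka the empirical mean (with a single observation, this is $z$ itself)
$σ^{2} \frac{d}{d z} \log f (z)$ is the correction term, built from the score $\frac{d}{d z} \log f (z)$ of the marginal density. If the observed sample lies in one tail of the marginal distribution, the score typically points back toward the bulk of the distribution and corrects the estimate in that direction.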
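A minimal sketch of applying the formula to data follows. Since $g$ is unknown, $f (z)$ has to be estimated from the observed $z$'s. Here it is simply assumed to be Gaussian, with mean and variance fitted from the sample. That is a strong simplifying assumption made for this illustration, not the method of this note, and a nonparametric density estimate could be swapped in. The function name `tweedie_gaussian_marginal` and the simulated data are hypothetical.

```python
import numpy as np

def tweedie_gaussian_marginal(z, sigma):
    """Tweedie estimate of E[mu | z], assuming the marginal f(z) is Gaussian."""
    z = np.asarray(z, dtype=float)
    m_hat = z.mean()   # fitted mean of the marginal
    v_hat = z.var()    # fitted variance of the marginal
    # Score of N(m_hat, v_hat): d/dz log f(z) = -(z - m_hat) / v_hat
    score = -(z - m_hat) / v_hat
    return z + sigma**2 * score

# Simulated data (assumption: mu ~ N(0, 1), noise sd sigma = 1).
rng = np.random.default_rng(1)
sigma = 1.0
mu = rng.normal(0.0, 1.0, size=5000)
z = mu + rng.normal(0.0, sigma, size=5000)

mu_hat = tweedie_gaussian_marginal(z, sigma)

# Look at the 50 largest observations, where selection bias is strongest.
top = np.argsort(z)[-50:]
raw_gap = abs(z[top].mean() - mu[top].mean())
corrected_gap = abs(mu_hat[top].mean() - mu[top].mean())
print(f"gap using raw z:        {raw_gap:.2f}")
print(f"gap after correction:   {corrected_gap:.2f}")
```

In this simulated setting the corrected estimates for the selected observations land much closer to their true means than the raw $z$ values do. With a Gaussian marginal the correction is a linear shrinkage toward the sample mean. Other choices of $f$ give nonlinear corrections.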
