Tweedie's Formula states that the true mean of an exponential family distribution, given samples drawn from it, can be estimated by the maximum likelihood estimate of the samples (aka the empirical mean) plus a correction term involving the score, i.e. the derivative of the log marginal density, evaluated at the observation.

Selection Bias

Suppose we have observed a series of values $z_1, \dots, z_N$, where each $z_i \sim \mathcal{N}(\mu_i, \sigma^2)$, for $i = 1, \dots, N$.

Then we select the biggest $z_i$'s. What can we say about their corresponding $\mu_i$ values? They will usually be smaller than the selected $z_i$'s, because a large observation is more likely to have received a large positive noise term. This is called Selection Bias.
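The effect is easy to see in a small simulation. The sketch below is only an illustration: the Gaussian distribution used for the true means, the function name `selection_bias_demo`, and all parameter values are assumptions made here, not part of the setup above.

```python
import random

def selection_bias_demo(n=5000, top_k=20, sigma=1.0, prior_sd=1.0, seed=0):
    """Return (mean of the top_k largest z, mean of their true mu)."""
    rng = random.Random(seed)
    # Hypothetical choice: draw the true means from N(0, prior_sd^2).
    mus = [rng.gauss(0.0, prior_sd) for _ in range(n)]
    # Each observation is its true mean plus Gaussian noise.
    zs = [rng.gauss(mu, sigma) for mu in mus]
    # Indices of the top_k largest observations.
    top = sorted(range(n), key=lambda i: zs[i])[-top_k:]
    mean_z = sum(zs[i] for i in top) / top_k
    mean_mu = sum(mus[i] for i in top) / top_k
    return mean_z, mean_mu

mean_z, mean_mu = selection_bias_demo()
print(f"mean of selected z:  {mean_z:.3f}")
print(f"mean of their mu:    {mean_mu:.3f}")
```

Under these assumptions the selected observations should average noticeably higher than the true means behind them, which is the bias a correction needs to remove.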

Tweedie's Formula

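Imagine we sample $\mu$ from an unknown distribution $g(\mu)$, and then sample the corresponding $z \mid \mu \sim \mathcal{N}(\mu, \sigma^2)$.

Now we calculate the marginal PDF of $z$: $f(z) = \int_{-\infty}^{\infty} \varphi_\sigma(z - \mu)\, g(\mu)\, d\mu$, where $\varphi_\sigma(z) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{z^2}{2\sigma^2}\right)$ is the $\mathcal{N}(0, \sigma^2)$ density (see standard normal distribution).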
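As a worked example, assume (purely for illustration, since the prior is left unspecified here) that the means follow a Gaussian prior $\mu \sim \mathcal{N}(0, A)$ and that $z \mid \mu \sim \mathcal{N}(\mu, \sigma^2)$, with $\varphi_\sigma$ denoting the $\mathcal{N}(0, \sigma^2)$ density. The convolution of two Gaussians is again Gaussian, so the marginal has a closed form:

$$f(z) = \int_{-\infty}^{\infty} \varphi_\sigma(z - \mu)\, \frac{1}{\sqrt{2\pi A}} \exp\left(-\frac{\mu^2}{2A}\right) d\mu = \frac{1}{\sqrt{2\pi (A + \sigma^2)}} \exp\left(-\frac{z^2}{2(A + \sigma^2)}\right)$$

That is, $z \sim \mathcal{N}(0, A + \sigma^2)$: the observations are more spread out than the true means, because the noise variance adds to the prior variance.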

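Then we use Tweedie's Formula to estimate the true mean $\mu$ behind the observed $z$:

$$E[\mu \mid z] = z + \sigma^2 \frac{d}{dz} \log f(z)$$

where

  • $z$ is the MLE of $\mu$, aka the empirical mean
  • $\sigma^2 \frac{d}{dz} \log f(z)$ is the correction term. If the observed sample lies at one end of the underlying distribution, this term uses the score of the marginal density $f(z)$ to correct the estimate, typically pulling it back toward the bulk of the distribution.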
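A minimal sketch of the estimate in code, for the Gaussian-noise form $E[\mu \mid z] = z + \sigma^2 \frac{d}{dz} \log f(z)$. The function names, the finite-difference step `h`, and the Gaussian prior with variance `prior_var` are all assumptions chosen here so that the result can be checked against a known closed form; in practice $f(z)$ would have to be estimated from the data.

```python
import math

def tweedie_estimate(z, log_f, sigma, h=1e-5):
    """Tweedie's formula for Gaussian noise: z + sigma^2 * d/dz log f(z).

    log_f is a callable returning the log marginal density at a point.
    The score is approximated with a central finite difference.
    """
    score = (log_f(z + h) - log_f(z - h)) / (2.0 * h)
    return z + sigma**2 * score

def log_marginal_gaussian_prior(z, prior_var, sigma):
    """Log marginal density of z when mu ~ N(0, prior_var) (assumed prior)."""
    var = prior_var + sigma**2  # variances add under convolution
    return -0.5 * math.log(2.0 * math.pi * var) - z**2 / (2.0 * var)

sigma, prior_var = 1.0, 1.0
z = 3.0
est = tweedie_estimate(
    z, lambda t: log_marginal_gaussian_prior(t, prior_var, sigma), sigma
)
# Closed-form posterior mean for a Gaussian prior, used as a check.
exact = z * prior_var / (prior_var + sigma**2)
print(f"Tweedie estimate: {est:.6f}")
print(f"closed form:      {exact:.6f}")
```

With this assumed prior the score is $-z / (A + \sigma^2)$, so the correction shrinks the observation to $z \cdot A / (A + \sigma^2)$, matching the usual Gaussian posterior mean. Note that the estimate only needs the marginal $f(z)$, not the prior itself.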