Tweedie’s Formula states that the true mean of an exponential family distribution, given samples drawn from it, can be estimated by the maximum likelihood estimate of the samples (aka the empirical mean) plus a correction term involving the score of the marginal density of the samples, i.e. the derivative of its logarithm.
Selection Bias
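Suppose we have observed a series of $z_1, \dots, z_N$, where each $z_i \sim \mathcal{N}(\mu_i, \sigma^2)$, for $i = 1, \dots, N$.
Then we select the biggest $z_i$'s. What can we say about their corresponding $\mu_i$ values? They will usually be smaller than the selected $z_i$'s, because a large $z_i$ is more likely when its noise happened to be positive. This is called Selection Bias.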
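The effect is easy to see in a small simulation. The sketch below is only an illustration: the normal prior on $\mu_i$, the sample sizes, and the helper name `selection_bias_demo` are assumptions made for the demo, not part of the setup above.

```python
import numpy as np

def selection_bias_demo(n=5000, top_k=20, sigma=1.0, prior_scale=1.0, seed=0):
    """Return (mean of the selected z_i, mean of their true mu_i)."""
    rng = np.random.default_rng(seed)
    # Assumption for illustration: the true means follow N(0, prior_scale^2).
    mu = rng.normal(0.0, prior_scale, size=n)
    # Each observation is its true mean plus N(0, sigma^2) noise.
    z = rng.normal(mu, sigma)
    # Select the top_k largest observations.
    top = np.argsort(z)[-top_k:]
    return z[top].mean(), mu[top].mean()

mean_z, mean_mu = selection_bias_demo()
print(f"mean of selected z:  {mean_z:.2f}")
print(f"mean of their mu:    {mean_mu:.2f}")
```

Under these assumptions the selected observations should average noticeably higher than their true means, which is the bias that needs correcting.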
Tweedie's Formula
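Imagine we sample $\mu$ from an unknown prior distribution $g(\mu)$, and then sample the corresponding $z \mid \mu \sim \mathcal{N}(\mu, \sigma^2)$.
Now we calculate the marginal PDF of $z$: $f(z) = \int \varphi_\sigma(z - \mu)\, g(\mu)\, d\mu$, where $\varphi_\sigma(z) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{z^2}{2\sigma^2}\right)$ is the $\mathcal{N}(0, \sigma^2)$ density (see the standard normal distribution).
Then we use Tweedie's Formula to estimate the true mean $\mu$ of the observed $z$:
$$E[\mu \mid z] = z + \sigma^2 \frac{d}{dz} \log f(z)$$
where
- $z$ is the MLE of $\mu$, aka the empirical mean
- $\sigma^2 \frac{d}{dz} \log f(z)$ is the correction term. If the observed sample lies at one end of the marginal distribution, the score $\frac{d}{dz} \log f(z)$ points back toward the bulk of the distribution, so this term pulls the estimate away from the extreme and corrects for selection bias.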
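As a sanity check, the formula can be evaluated in a case where the answer is known in closed form. The sketch below assumes a normal prior $g = \mathcal{N}(0, A)$, so that the marginal is $f = \mathcal{N}(0, A + \sigma^2)$ and the posterior mean is $\frac{A}{A + \sigma^2} z$. The function names `log_marginal` and `tweedie_estimate` are hypothetical, and the score is approximated with a central finite difference rather than computed analytically.

```python
import numpy as np

def log_marginal(z, sigma, prior_var):
    """log f(z) when mu ~ N(0, prior_var) and z | mu ~ N(mu, sigma^2)."""
    var = sigma**2 + prior_var
    return -0.5 * np.log(2 * np.pi * var) - z**2 / (2 * var)

def tweedie_estimate(z, sigma, log_f, eps=1e-4):
    """E[mu | z] = z + sigma^2 * d/dz log f(z), score via central difference."""
    score = (log_f(z + eps) - log_f(z - eps)) / (2 * eps)
    return z + sigma**2 * score

sigma, prior_var = 1.0, 1.0
z = np.array([-3.0, 0.0, 3.0])
est = tweedie_estimate(z, sigma, lambda t: log_marginal(t, sigma, prior_var))

# Closed-form posterior mean for the normal-normal model, for comparison.
exact = prior_var / (prior_var + sigma**2) * z
print(est)
print(exact)
```

With $A = \sigma^2 = 1$ both lines should agree at roughly half of each observation: the extreme values are shrunk toward zero while $z = 0$ is left alone. In practice $g$ is unknown, so $f$ would have to be estimated from the observed $z$ values; that step is not shown here.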