Interval estimation
The interval module can be used to estimate a variety of intervals. Currently the focus is on confidence intervals, but this could be expanded to include, for example, prediction intervals or highest density intervals.
[1]:
# imports
import numpy as np
from stats_misc.intervals import (
    univariable_quantiles_exact,
    univariable_poisson_standard_normal,
    wald_confidence_interval,
    beta_confidence_interval,
)
Confidence intervals for point estimates
The following confidence intervals are available:
Wald-based, assuming the point estimate follows a normal distribution;
Count data, assuming the (Poisson) point estimate follows a normal distribution;
Beta distribution, for proportions.
[2]:
# wald confidence interval
res = wald_confidence_interval(0.2, 0.01)
print(f'point estimate: {res.point_estimate} with {res.coverage:.0%} confidence interval: {res.interval_values[0]:.2f}; {res.interval_values[1]:.2f}.')
point estimate: 0.2 with 95% confidence interval: 0.18; 0.22.
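For reference, a Wald interval is simply the estimate plus or minus a standard normal quantile times the standard error. The sketch below uses scipy rather than the library; `wald_ci` is a hypothetical helper, and treating the second argument of `wald_confidence_interval` as the standard error is an assumption (it is consistent with the output above, since 0.2 ± 1.96 × 0.01 rounds to 0.18; 0.22).

```python
from scipy.stats import norm


def wald_ci(estimate, se, alpha=0.05):
    """Hypothetical helper: two-sided Wald interval, estimate +/- z * se.

    Assumes `se` is the standard error of `estimate`.
    """
    z = norm.ppf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return estimate - z * se, estimate + z * se


lower, upper = wald_ci(0.2, 0.01)
print(f"{lower:.2f}; {upper:.2f}")  # 0.18; 0.22
```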
[3]:
# confidence interval for count data
res = univariable_poisson_standard_normal([1, 2, 3, 4, 5, 10, 11, 1.5, 2.5], alpha=0.001)
print(f'point estimate: {res.point_estimate:.2f} with {res.coverage:.1%} confidence interval: {res.interval_values[0]:.2f}; {res.interval_values[1]:.2f}.')
point estimate: 4.44 with 99.9% confidence interval: 2.13; 6.76.
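One common way to build such an interval is a normal approximation to the Poisson mean: because a Poisson variable's variance equals its mean, the standard error of the sample mean is sqrt(mean / n). Whether `univariable_poisson_standard_normal` computes exactly this internally is an assumption; `poisson_normal_ci` is a hypothetical helper. For the example data it gives 4.44; 2.13; 6.76 at 99.9% coverage.

```python
import numpy as np
from scipy.stats import norm


def poisson_normal_ci(counts, alpha=0.05):
    """Hypothetical helper: normal-approximation CI for a Poisson mean.

    For Poisson data the variance equals the mean, so the standard
    error of the sample mean is sqrt(mean / n).
    """
    x = np.asarray(counts, dtype=float)
    rate = x.mean()
    se = np.sqrt(rate / x.size)
    z = norm.ppf(1 - alpha / 2)
    return rate, rate - z * se, rate + z * se


rate, lower, upper = poisson_normal_ci([1, 2, 3, 4, 5, 10, 11, 1.5, 2.5], alpha=0.001)
print(f"{rate:.2f}; {lower:.2f}; {upper:.2f}")  # 4.44; 2.13; 6.76
```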
[4]:
# confidence interval for proportions
res = beta_confidence_interval(0.2, 20, alpha=0.25)
print(f'point estimate: {res.point_estimate:.2f} with {res.coverage:.0%} confidence interval: {res.interval_values[0]:.2f}; {res.interval_values[1]:.2f}.')
point estimate: 0.20 with 75% confidence interval: 0.10; 0.35.
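A widely used beta-based interval for a proportion is the exact Clopper-Pearson interval, whose limits are beta-distribution quantiles. The sketch below is not taken from the library. It assumes that the second argument of `beta_confidence_interval` is the sample size (so 0.2 of 20 trials means 4 successes), and the library may use a different beta-based variant (for example Jeffreys), so its limits need not match this sketch. `clopper_pearson_ci` is a hypothetical helper.

```python
from scipy.stats import beta


def clopper_pearson_ci(successes, n, alpha=0.05):
    """Hypothetical helper: exact (Clopper-Pearson) CI for a binomial proportion."""
    # The limits are fixed at 0 and 1 when no or all trials are successes.
    lower = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, n - successes + 1)
    upper = 1.0 if successes == n else beta.ppf(1 - alpha / 2, successes + 1, n - successes)
    return lower, upper


# Assumed reading of the example: proportion 0.2 of 20 trials, i.e. 4 successes.
lower, upper = clopper_pearson_ci(4, 20, alpha=0.25)
print(f"{lower:.2f}; {upper:.2f}")
```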
Confidence intervals for a quantile/percentage
The following example calculates a confidence interval for any quantile between 0 and 1 (i.e., 0% to 100% of the data). The calculation assumes that the number of observations falling below the requested quantile follows a binomial distribution, with the sample size as the number of trials and the quantile as the success probability. The function returns confidence interval limits taken from the observed data, selecting the limits whose coverage is closest to the requested level (e.g., 95%). The obtained coverage is guaranteed to be equal to or larger than the requested level. The procedure is essentially non-parametric and can be applied without making strong distributional assumptions about the sample or population.
[5]:
data = [1, 2, 3, 4, 5, 10, 11, 1.5, 2.5, 2, 2.2, 3.5, 13.1, 0.5, 0.2, 0.15, 0.01, 7, 5, 8, 9.2, 4.8]
res1 = univariable_quantiles_exact(data, quantile=0.5, alpha=0.10)
res2 = univariable_quantiles_exact(data, quantile=0.15, alpha=0.25)
print(f'The {res1.coverage:.0%} confidence interval for quantile 0.50 is: {res1.interval_values[0]:.2f}; {res1.interval_values[1]:.2f}.')
print(f'The {res2.coverage:.0%} confidence interval for quantile 0.15 is: {res2.interval_values[0]:.2f}; {res2.interval_values[1]:.2f}.')
The 91% confidence interval for quantile 0.50 is: 2.00; 5.00.
The 76% confidence interval for quantile 0.15 is: 0.20; 2.00.
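The binomial argument described above can be sketched directly with order statistics. If B, the number of observations below the population quantile, follows Binomial(n, q), then the interval between the r-th and s-th smallest observations (1-based) covers that quantile with probability P(r ≤ B ≤ s − 1). The sketch searches all (r, s) pairs and keeps the smallest coverage that still meets 1 − alpha. This is the textbook construction, not the library's code: `quantile_ci_order_stats` is a hypothetical helper, and since the library's indexing and tie-breaking conventions are not documented here, its limits need not coincide exactly with this sketch.

```python
import numpy as np
from scipy.stats import binom


def quantile_ci_order_stats(data, quantile=0.5, alpha=0.05):
    """Hypothetical helper: distribution-free CI for a population quantile.

    Returns (lower, upper, coverage) where the limits are 1-based order
    statistics x_(r), x_(s) with coverage P(r <= B <= s - 1),
    B ~ Binomial(n, quantile).
    """
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    best = None  # (coverage, r, s) with the smallest coverage >= 1 - alpha
    for r in range(1, n):
        for s in range(r + 1, n + 1):
            cov = binom.cdf(s - 1, n, quantile) - binom.cdf(r - 1, n, quantile)
            if cov >= 1 - alpha and (best is None or cov < best[0]):
                best = (cov, r, s)
    if best is None:
        raise ValueError("sample too small for the requested coverage")
    cov, r, s = best
    return x[r - 1], x[s - 1], cov  # convert 1-based ranks to 0-based indices


data = [1, 2, 3, 4, 5, 10, 11, 1.5, 2.5, 2, 2.2, 3.5, 13.1, 0.5, 0.2, 0.15, 0.01, 7, 5, 8, 9.2, 4.8]
lo, hi, cov = quantile_ci_order_stats(data, quantile=0.5, alpha=0.10)
print(f"{cov:.1%} CI for the median: {lo:.2f}; {hi:.2f}")  # 90.7% CI for the median: 2.00; 5.00
```

The exhaustive search is O(n²) in the sample size, which is negligible for small samples like this one; for large samples one would instead start from the binomial quantiles and search only nearby ranks.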