Interval estimation Methods
The interval module can be used to estimate a mixture of intervals. Currently the focus is on confidence intervals, but this can be expanded to include prediction intervals or highest density intervals for example.
[1]:
import numpy as np
from stats_misc.intervals import (
univariable_quantiles_exact,
univariable_poisson_standard_normal,
wald_confidence_interval,
beta_confidence_interval,
linear_contrast,
)
Confidence intervals for point estimates
The following confidence intervals are available:
Wald based, assuming the point estimate follows a normal distribution,
Count data, assuming the point estimate follows a normal distribution,
Beta distribution, for proportions.
[2]:
# wald confidence interval
res = wald_confidence_interval(0.2, 0.01)
print(f'point estimate: {res.point_estimate} with {res.coverage}% confidence interval: '
f'{res.interval_values[0]:.2f}; {res.interval_values[1]:.2f}.')
point estimate: 0.2 with 0.95% confidence interval: 0.18; 0.22.
[3]:
# the class instance
res
[3]:
wald_confidence_interval
point_estimate=0.200
coverage=0.950
interval_values=(0.180, 0.220)
standard_error=0.010
[4]:
# confidence interval for count data
res = univariable_poisson_standard_normal([1, 2, 3, 4, 5, 10,11, 1.5, 2.5], alpha=0.001)
print(f'point estimate: {res.point_estimate:.2f} with {res.coverage}% confidence interval: '
f'{res.interval_values[0]:.2f}; {res.interval_values[1]:.2f}.')
point estimate: 4.44 with 0.999% confidence interval: 2.13; 6.76.
[5]:
# confidence interval for proportions
res = beta_confidence_interval(0.2, 20, alpha=0.25)
print(f'point estimate: {res.point_estimate:.2f} with {res.coverage}% confidence interval: '
f'{res.interval_values[0]:.2f}; {res.interval_values[1]:.2f}.')
point estimate: 0.20 with 0.75% confidence interval: 0.10; 0.35.
Confidence intervals for a quantile/percentage
The following example calculate a confidence interval for a quantile betweeen 0 and 100% of the data. The calculations are based on assuming the propbability of obtaining a value below or above the requested quantile follows a binomial distribution. The function will return confiden interval limits based on the observed data, selecting limits which are closest to the requested coverage (e.g., 95%). The obtained coverage is guaranteed to be equal or larger than the requested level. The procedure is essentially non-parametric and can be applied without making strong distributional assumptions about the sample or population.
[6]:
data = [1, 2, 3, 4, 5, 10,11, 1.5, 2.5, 2, 2.2, 3.5, 13.1, 0.5, 0.2, 0.15, 0.01, 7, 5, 8, 9.2, 4.8]
res1 = univariable_quantiles_exact(data, quantile=0.5, alpha=0.10)
res2 = univariable_quantiles_exact(data, quantile=0.15, alpha=0.25)
print(f'The {res1.coverage:.2f}% confidence interval for qunantile 0.50 is : {res1.interval_values[0]:.2f}; {res1.interval_values[1]:.2f}.\n'
f'The {res2.coverage:.2f}% confidence interval for qunantile 0.15 is : {res2.interval_values[0]:.2f}; {res2.interval_values[1]:.2f} ')
res1
The 0.91% confidence interval for qunantile 0.50 is : 2.00; 5.00.
The 0.76% confidence interval for qunantile 0.15 is : 0.20; 2.00
[6]:
univariable_quantiles_exact
point_estimate=3.250
coverage=0.907
interval_values=[2.000, 5.000]
interval_indices=[7.000, 15.000]
matrix_coverage=array([[0.848 0.73 0.925 0.965 0.983]
[0.855 0.736 0.931 0.972 0.989]
[0.831 0.712 0.907 0.948 0.965]
[0.79 0.671 0.866 0.907 0.925]
[0.714 0.595 0.79 0.831 0.848]], shape=(5, 5), dtype=float64)
matrix_columns=list([14, 13, 15, 16, 17])
matrix_rows=list([6, 5, 7, 8, 9])
The window parameter
window controls the size of the candidate search grid: the function evaluates all (lower, upper) order-statistic index pairs within ±window steps of the binom.ppf starting bounds, producing a (2w+1) × (2w+1) coverage matrix stored in matrix_coverage.
Because coverage is monotone with interval width, the selected pair is almost always identical regardless of window — binom.ppf already places the starting bounds near-optimally. What changes is only the size of the evaluated grid.
The main reason to increase window is defensive: if the function raises an error (no valid pair found), a larger window expands the search. This can occur with very small samples and extreme quantiles where the binomial distribution is highly discrete.
[7]:
for w in [1, 2, 3]:
res = univariable_quantiles_exact(data, quantile=0.5, alpha=0.10, window=w)
print(f'window={w}: matrix_coverage shape={res.matrix_coverage.shape}, '
f'indices={res.interval_indices}, coverage={res.coverage:.3f}')
# Inspect the full 3x3 coverage matrix at window=1
res_w1 = univariable_quantiles_exact(data, quantile=0.5, alpha=0.10, window=1)
res_w1
window=1: matrix_coverage shape=(3, 3), indices=[7, 15], coverage=0.907
window=2: matrix_coverage shape=(5, 5), indices=[7, 15], coverage=0.907
window=3: matrix_coverage shape=(7, 7), indices=[7, 15], coverage=0.907
[7]:
univariable_quantiles_exact
point_estimate=3.250
coverage=0.907
interval_values=[2.000, 5.000]
interval_indices=[7.000, 15.000]
matrix_coverage=array([[0.848 0.925 0.965]
[0.831 0.907 0.948]
[0.79 0.866 0.907]], shape=(3, 3), dtype=float64)
matrix_columns=list([14, 15, 16])
matrix_rows=list([6, 7, 8])
Linear contrasts of regression coefficients
linear_contrast propagates uncertainty through arbitrary linear combinations of regression coefficients:
Var(L) = D' * V * D
where D is the contrast matrix (one column per contrast, shape n_coef × n_contrasts), V is the variance–covariance (VCV) matrix of the coefficients, and c is the coefficient vector. The function returns a DataFrame with one row per contrast containing the estimate, standard error, and confidence bounds.
This works with any model that exposes a coefficient vector and VCV matrix: OLS, GLM, Cox, mixed-effects, and penalised regression. Estimates and intervals are on the link scale - back-transform externally if needed (e.g. np.exp() for a log link).
The example below uses a three-group regression (intercept + two indicator coefficients, with group A as the reference) to compute pairwise contrasts.
[8]:
c_vec = np.array([1.0, 0.3, 0.5]) # [intercept, coef_B, coef_C]; group A is the reference
v_mat = np.array([
[0.04, 0.01, 0.01],
[0.01, 0.02, 0.00],
[0.01, 0.00, 0.03],
])
# d_mat shape: (n_coef=3, n_contrasts=3)
# Column 0: B vs A → [0, 1, 0] => estimate = coef_B
# Column 1: C vs A → [0, 0, 1] => estimate = coef_C
# Column 2: C vs B → [0, -1, 1] => estimate = coef_C - coef_B
d_mat = np.array([
[0, 0, 0],
[1, 0, -1],
[0, 1, 1],
])
res = linear_contrast(d_mat, v_mat, c_vec, contrast_names=['B_vs_A', 'C_vs_A', 'C_vs_B'])
res
[8]:
| contrast | estimate | se | lower | upper | |
|---|---|---|---|---|---|
| 0 | B_vs_A | 0.3 | 0.141421 | 0.022819 | 0.577181 |
| 1 | C_vs_A | 0.5 | 0.173205 | 0.160524 | 0.839476 |
| 2 | C_vs_B | 0.2 | 0.223607 | -0.238261 | 0.638261 |