Includes a mixture of functions to estimate intervals from a collection of
data. Types of intervals include confidence intervals, as well as
prediction intervals.
Better confidence intervals for proportions, based on a classic
contribution by SA Julious [1]_.
Parameters:
proportion (float) – A proportion between 0 and 1.
total_sample (int) – The total sample size the proportion was derived from.
alpha (float, default 0.05) – A float between 0 and 1, representing the type 1 error rate. Used
to define the confidence interval coverage: (1-alpha)*100%.
integer (bool, default True) – Whether the function should fail when the proportion times total_sample
does not result in an integer number of events. Set to False to
ignore the ValueError, which will replace the product with its closest
integer value.
Returns:
results – A results class containing the lower and upper confidence interval
bounds.
Return type:
IntervalResults
References
.. [1] SA Julious, "Two-sided confidence intervals for the single
proportion: comparison of seven methods by Robert G. Newcombe, Statistics
in Medicine 1998; 17:857-872", Statistics in Medicine 2005.
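The package's own function is not reproduced here; as an illustration of the kind of interval discussed above, a Wilson score interval (one of the seven methods compared by Newcombe) can be sketched as follows. The function name wilson_interval is hypothetical:

```python
import numpy as np
from scipy.stats import norm

def wilson_interval(proportion, total_sample, alpha=0.05):
    # Wilson score interval for a single proportion
    z = norm.ppf(1 - alpha / 2)
    p, n = proportion, total_sample
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half
```

For proportion=0.5 and total_sample=100 at alpha=0.05 this yields an interval of roughly (0.404, 0.596).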
Estimates the locations of the 100(1-alpha)% lower bound and upper bound
for any desired quantile, and will return the values of these lower and
upper bounds.
Parameters:
values (list [float]) – A list of floats representing the available data.
quantile (float) – A float between 0 and 1, e.g., use 0.5 for the median, 0.25 for
quartile 1, or 0.025 for the 2.5th percentile.
alpha (float, default 0.05) – The 100(1-alpha)% confidence level.
Returns:
results – An interval class results object, including the interval indices,
bounds, and exact coverage.
Return type:
QuantilesIntervalResults
Notes
The algorithm will attempt to find a near-symmetric confidence interval,
which is guaranteed to have coverage equal to or larger than requested.
Internally this uses the binomial distribution to calculate the probability
of a value being smaller than or equal to the requested quantile, hence
resulting in exact confidence interval limits, irrespective of the
underlying distribution. These intervals will typically be wider than
intervals based on (semi-)parametric solutions.
The code is based on a Stack Exchange answer by whuber, which in turn is
based on Chapter 5 of the book by Meeker, Hahn, and Escobar.
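The order-statistic approach described above can be sketched as follows. This is a simplified illustration, not the package's implementation; the name quantile_ci and the widening heuristic are assumptions:

```python
import numpy as np
from scipy.stats import binom

def quantile_ci(values, quantile, alpha=0.05):
    # Distribution-free CI for a quantile, built from order statistics.
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size

    def coverage(l, u):
        # P(x_(l) <= true quantile <= x_(u)) with 1-based order statistics,
        # exact under any continuous distribution
        return binom.cdf(u - 1, n, quantile) - binom.cdf(l - 1, n, quantile)

    # start at the order statistic closest to the quantile, then widen
    l = u = min(max(int(round(quantile * (n + 1))), 1), n)
    while coverage(l, u) < 1 - alpha and (l > 1 or u < n):
        # grow the side that adds the most probability mass
        gain_low = coverage(l - 1, u) if l > 1 else -1.0
        gain_high = coverage(l, u + 1) if u < n else -1.0
        if gain_low >= gain_high:
            l -= 1
        else:
            u += 1
    return x[l - 1], x[u - 1], coverage(l, u)
```

The returned exact coverage is at least 1 - alpha whenever the sample is large enough to reach it, matching the guarantee stated in the Notes.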
A module to calculate weighted averages of point estimates, using fixed
effect or random effects methods. Includes various estimates of heterogeneity.
Provides heterogeneity estimates (Q-test, I-squared, and tau-squared) [2]_,
determining to what extent the individual estimates are distinct from
the overall_estimate, accounting for differences in precision based on
the supplied estimate-specific standard_errors.
Parameters:
estimates (list [float]) – A list of point estimates (e.g., mean differences or log odds ratios).
standard_errors (list [float]) – A list of standard errors of equal length as estimates.
overall_estimate (float or int) – The overall estimate, for example based on a meta-analysis of the
point estimates of estimates.
alpha (float, default 0.05) – The alpha for the (1-alpha/2)*100% confidence interval.
tau2 (float, optional) – A possible external estimate of the tau-squared used in the random effects
estimator. If None, a non-iterative method-of-moments estimate will be used.
Returns:
results – An instance of the heterogeneity results class.
Heterogeneity will be internally evaluated using a Chi-square distribution
with len(estimates) - 1 degrees of freedom.
The heterogeneity estimates are based on [2]_; specifically, the tau-squared
is estimated using the DerSimonian and Laird method of moments
(without iteration).
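The non-iterative DerSimonian–Laird estimator can be sketched as follows. This is illustrative only (the module's actual function also accepts an overall_estimate; here the fixed-effect pooled mean is computed internally, and the name dersimonian_laird is hypothetical):

```python
import numpy as np
from scipy.stats import chi2

def dersimonian_laird(estimates, standard_errors):
    y = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(standard_errors, dtype=float) ** 2
    mu = np.sum(w * y) / np.sum(w)              # fixed-effect pooled mean
    q = np.sum(w * (y - mu) ** 2)               # Cochran's Q statistic
    df = y.size - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)               # non-iterative MM tau-squared
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    p_value = chi2.sf(q, df)                    # Q-test, chi-square with df
    return {"Q": q, "p": p_value, "I2": i2, "tau2": tau2}
```

With identical estimates and standard errors Q is zero, so tau-squared and I-squared are zero, as expected for a homogeneous set.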
Calculates an average, with the contribution of each element in estimates
weighted by the inverse of the squared standard error of the estimate plus
an estimate of the between-estimate variance.
When the between-estimate variance is set to zero, this is equivalent to a
fixed effect meta-analysis, which assumes the between study variance is zero.
Parameters:
estimates (list [float]) – A list of point estimates (e.g., mean differences or log odds ratios).
standard_errors (list [float]) – A list of standard errors of equal length as estimates.
between_estimate_variance (float) – The between-estimate variance, often referred to as the tau-squared.
Can be estimated using the Heterogeneity function.
Returns:
estimate (float) – The average estimate.
standard_error (float) – The standard error of the estimate.
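A minimal sketch of the inverse-variance weighting described above (the name weighted_average is hypothetical):

```python
import numpy as np

def weighted_average(estimates, standard_errors, between_estimate_variance=0.0):
    y = np.asarray(estimates, dtype=float)
    # inverse-variance weights; adding the between-estimate variance gives
    # the random effects weighting, zero reduces to a fixed effect analysis
    w = 1.0 / (np.asarray(standard_errors, dtype=float) ** 2
               + between_estimate_variance)
    estimate = np.sum(w * y) / np.sum(w)
    standard_error = np.sqrt(1.0 / np.sum(w))
    return estimate, standard_error
```

For two estimates with equal standard errors the result is the simple mean, with a standard error shrunk by a factor of sqrt(2).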
Will loop over the unique group values to perform overall null-hypothesis
tests, comparing sets of values against a null-distribution using the
Kolmogorov-Smirnov test.
Parameters:
data (pd.DataFrame) – A data table.
group (str) – A column name in data which will be used to group the values.
values (str) – A column name in data to which you want to apply the
Kolmogorov-Smirnov test.
nulldistribution (str, default uniform) – The null-distribution the values should be compared against. This
maps to the distributions available in scipy.stats.
Returns:
results – A dictionary with group values as keys and a KstestResults class as items.
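The grouping logic can be sketched with a pandas groupby over scipy's kstest (an illustration, not the module's code; grouped_kstest is a hypothetical name, and scipy returns its own KstestResult objects):

```python
import numpy as np
import pandas as pd
from scipy.stats import kstest

def grouped_kstest(data, group, values, nulldistribution="uniform"):
    # one Kolmogorov-Smirnov test per unique group value, against the
    # named scipy.stats distribution
    return {g: kstest(sub[values].to_numpy(), nulldistribution)
            for g, sub in data.groupby(group)}
```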
Statistical test of whether the difference between two point estimates is
distinct from the null-hypothesis value (null_value).
The test simply calculates the difference between the two point estimates
and derives the standard error of this difference by taking the square
root of the sum of the squared standard errors of the point estimates. The
ratio of the difference to its standard error is compared to a standard
normal distribution.
Parameters:
point (tuple [float, float]) – Two point estimates, for example the mean difference or log odds ratio.
se (tuple [float, float]) – Two standard errors of the point estimates.
null_value (float, default 0.0) – The null-hypothesis value of the difference between the point estimates.
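The calculation described above can be sketched directly (the name difference_test and the returned tuple are assumptions):

```python
import numpy as np
from scipy.stats import norm

def difference_test(point, se, null_value=0.0):
    diff = point[0] - point[1]
    # standard error of a difference of two independent estimates
    se_diff = np.sqrt(se[0] ** 2 + se[1] ** 2)
    z = (diff - null_value) / se_diff
    p_value = 2 * norm.sf(abs(z))   # two-sided, standard normal reference
    return diff, se_diff, z, p_value
```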
A module for resampling techniques such as the bootstrap, jackknife, or
permutation. Currently focussed on confidence interval estimation using
canonical bootstrap algorithms.
A tuple of numpy arrays with the same number of rows.
Type:
tuple [np.ndarray]
Notes
The BCa interval generally has better coverage (i.e., the smallest
confidence interval while retaining the advertised coverage) than most
alternative bootstrap confidence interval methods.
Compute the statistic(s) on the original (non-resampled) data.
Notes
This class handles the generation of bootstrap replicates and calculation
of original estimates from the provided dataset using a specified
statistical function.
Draws n_reps bootstrap samples and applies statsfunction to each sample,
returning the results as a numpy array.
Parameters:
data (tuple [np.ndarray]) – A tuple of numpy arrays with the same number of rows.
statsfunction (Callable) – A function which can unpack a tuple of arrays and perform analyses on
these: statsfunction(*data). The function can return the estimate(s)
as a single float/int, list, tuple or numpy array.
n_estimates (int) – The number of estimates statsfunction will return.
n_reps (int, default 999) – The number of bootstrap samples.
**kwargs (any) – Any keyword arguments supplied to statsfunctions.
Returns:
boots – A 2d numpy array with dims equal to n_reps by n_estimates.
Return type:
np.ndarray
Notes
The helper functions have been optimised through numba.njit.
Will internally map 1d arrays to 2d to deal with numba requirements.
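A plain-numpy sketch of the behaviour described above (the package's own version is numba-optimised; this illustration is not, and its resampling details are assumptions):

```python
import numpy as np

def bootstrap_replicates(data, statsfunction, n_estimates, n_reps=999, **kwargs):
    # draw row indices with replacement; apply statsfunction per replicate
    n = data[0].shape[0]
    boots = np.empty((n_reps, n_estimates))
    for b in range(n_reps):
        idx = np.random.randint(0, n, size=n)
        sample = tuple(arr[idx] for arr in data)
        boots[b] = np.atleast_1d(statsfunction(*sample, **kwargs))
    return boots
```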
Examples
>>> import numpy as np
>>> np.random.seed(42)
>>>
>>> # Example statsfunction that returns a list of statistics
>>> def stats_list(*data):
...     return [np.mean(data[0]), np.std(data[0])]
>>>
>>> result = bootstrap_replicates(
...     (np.random.randn(10), np.random.randn(10)),
...     stats_list, n_estimates=2, n_reps=10
... )
>>> print(result)
Performs jackknife resampling procedure which systematically leaves out
one observation at a time and applies statsfunction.
Parameters:
data (tuple [np.ndarray]) – A tuple of numpy arrays with the same number of rows.
statsfunction (Callable) – A function which can unpack a tuple of arrays and perform analyses on
these: statsfunction(*data). The function can return the estimate(s)
as a single float/int, list, tuple or numpy array.
n_estimates (int) – The number of estimates statsfunction will return.
**kwargs (any) – Any keyword arguments supplied to statsfunctions.
Returns:
jacked – A 2d numpy array with dims equal to data[0].shape[0] by n_estimates.
Return type:
np.ndarray
Notes
The helper functions have been optimised through numba.njit.
Will internally map 1d arrays to 2d to deal with numba requirements.
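The leave-one-out procedure can be sketched in plain numpy (again, not the numba-optimised implementation; the name jackknife_replicates mirrors the description above):

```python
import numpy as np

def jackknife_replicates(data, statsfunction, n_estimates, **kwargs):
    # systematically leave out one row at a time from every array
    n = data[0].shape[0]
    jacked = np.empty((n, n_estimates))
    for i in range(n):
        loo = tuple(np.delete(arr, i, axis=0) for arr in data)
        jacked[i] = np.atleast_1d(statsfunction(*loo, **kwargs))
    return jacked
```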
Predictions (the predicted column) can range between plus and minus
infinity, allowing for evaluation of non-probability scores.
The standard errors are derived based on:
.. [1] JA Hanley and BJ McNeil, "The meaning and use of the area under a
receiver operating characteristic (ROC) curve", Radiology 1982; 143:29-36
(see Table 2).
Takes a binary observed column vector and a continuous predicted
column vector, and returns a pd.DataFrame with the columns
false_positive, sensitivity and threshold.
Parameters:
observed (numpy array) – A column vector of the observed binary outcomes.
predicted (numpy array) – A column vector of the predicted outcome (should be continuous), e.g.,
representing the predicted probability.
kwargs (Any) – Supplied to sklearn.metrics.roc_curve.
Returns:
A table with columns: false_positive, sensitivity and threshold.
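The module delegates to sklearn.metrics.roc_curve; a simplified pure-numpy equivalent of the returned table can be sketched as follows (assuming distinct thresholds and at least one event and one non-event; roc_table is a hypothetical name):

```python
import numpy as np
import pandas as pd

def roc_table(observed, predicted):
    # sort rows by predicted score, highest first
    order = np.argsort(-np.asarray(predicted, dtype=float))
    y = np.asarray(observed, dtype=float)[order]
    tps = np.cumsum(y)        # true positives when cutting after each row
    fps = np.cumsum(1 - y)    # false positives at the same cuts
    return pd.DataFrame({
        "false_positive": fps / fps[-1],
        "sensitivity": tps / tps[-1],
        "threshold": np.asarray(predicted, dtype=float)[order],
    })
```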
Estimate the calibration slope and calibration-in-the-large assuming a
binomial data generating model. A binomial model is generally appropriate
if the predicted risk reflects an event occurring at a fixed moment in time,
e.g. after 1 year or 1 hour.
Parameters:
data (pd.DataFrame) – A table with the columns observed and predicted.
observed (str) – a column name in data referencing the binary outcome column.
predicted (str) – a column name in data referencing the logit predicted risk.
bins (int, optional, default None) – An optional integer used to create equally sized bins of the predicted
logit risk, returning the average predicted and observed risk in
each bin (an observed versus expected table).
alpha (float, default 0.05) – The (1-alpha)% confidence interval. Used for the observed risk when
bins is supplied.
Returns:
A calibration intercept and slope estimates, and an optional
observed and expected table.
The function does NOT presuppose that the predicted risk is derived from a
logistic (binomial) regression model. ANY model predicting risk at a fixed
moment in time is acceptable, including models that typically provide
interval predictions, such as classification trees. Some models or rules
may only provide the predicted risk, not the logit risk; in such cases simply
call the validation.logit function to derive the appropriate variable.
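A minimal sketch of how the calibration-in-the-large (intercept) and calibration slope can be obtained, by fitting a logistic regression of the observed outcome on the logit of the predicted risk. This is not the module's implementation; the name calibration_fit and the Newton-Raphson loop are assumptions:

```python
import numpy as np

def calibration_fit(observed, predicted_logit, n_iter=25):
    y = np.asarray(observed, dtype=float)
    x = np.column_stack([np.ones(y.size),
                         np.asarray(predicted_logit, dtype=float)])
    beta = np.zeros(2)
    for _ in range(n_iter):                    # Newton-Raphson (IRLS)
        p = 1.0 / (1.0 + np.exp(-x @ beta))    # current fitted risks
        w = p * (1.0 - p)                      # logistic working weights
        grad = x.T @ (y - p)
        hess = x.T @ (x * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1]  # calibration-in-the-large, calibration slope
```

Perfectly calibrated predictions yield an intercept near 0 and a slope near 1.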
score (str) – A column name in data. Note that, contrary to the calibration function,
the score can be in any format depending on the intended use. For a binomial
model one would typically supply a logit risk.
model (str, default binomial) – Which model should be used (default: ‘binomial’; alternatively ‘gaussian’).
Returns:
The recalibration intercept and slope, as well as a table with the
recalibrated predictions.
A collection of utils for scikit-survival. Currently, the module focusses on
downstream extraction of predictions and outcomes, as well as on tools to help
with model validation.
The code can likely be generalised further to work with non-sksurv models as
well.
Evaluates a sklearn-survival model using the integrated Brier score,
and computes the baseline (non-informative) Brier score based on the event
incidence in supplied test data.
Parameters:
model (Callable) – A fitted sklearn-survival model that implements a
predict_survival_function method returning survival functions for
individuals.
times (np.ndarray) – A sequence of time points over which the integrated Brier score is
computed.
Creates groups based on the predicted survival and compares the predicted
event rate to the non-parametric event rate.
Parameters:
model (Callable) – A fitted scikit-survival model with a predict_survival_function
method.
n_groups (int, default 5) – Number of equally sized participant groups, created based on the
predicted survival.
nonparametric_estimator (Callable, default KaplanMeierEstimator) – A nonparametric estimator class from sksurv.nonparametric, e.g.
KaplanMeierEstimator or CumulativeIncidenceEstimator.
Computes time-dependent AUCs for survival models using cumulative
dynamic AUC.
Parameters:
model (callable or NoneType) – A fitted survival model with a predict(X) method. If None,
assumes d_tup[1] already contains predicted risk scores.
times (list [float]) – Time points at which to evaluate the time-dependent AUC.
data (tuple [str, np.ndarray, np.ndarray]) – The tuple should contain (label, test_y, test_x or predicted_risk),
where test_y is a structured array of survival data
(e.g., from sksurv.util.Surv.from_arrays), and label is a
descriptor of the data split (e.g., ‘test’, ‘validation’).
train_y (np.ndarray) – Structured array of survival outcomes for training data, used to
construct the risk sets.
Returns:
A long-format DataFrame with columns:
- “Data split”: label of the data subset
- “Time”: time points evaluated
- “AUC by time”: corresponding AUC values
- “Mean AUC”: mean of AUCs across time points (NaNs ignored)
Return type:
pd.DataFrame
Notes
This function evaluates the performance of a survival model across
multiple time points using the cumulative/dynamic AUC approach
(as implemented in cumulative_dynamic_auc from sksurv.metrics).
It handles multiple data splits or datasets and aggregates the results
into a tidy DataFrame.
A module implementing Firth’s logistic regression.
Firth’s regression is a penalised likelihood method that addresses small-sample
bias and perfect separation in logistic regression. It adjusts the score
function to yield more reliable parameter estimates.
Note this is essentially a fork of the GitHub firthlogist
repo, with minor tweaks to work on python 3.10+.
Logistic regression with Firth’s bias reduction method.
This is based on the implementation in the logistf R package. Please see
the logistf references [1]_ and [2]_ for details about the method.
Parameters:
max_iter (int, default 25) – The maximum number of Newton-Raphson iterations.
max_halfstep (int, default 0) – The maximum number of step-halvings in one Newton-Raphson iteration.
max_stepsize (int, default 5) – The maximum step size - for each coefficient, the step size is forced
to be less than max_stepsize.
pl_max_iter (int, default 100) – The maximum number of Newton-Raphson iterations for finding profile
likelihood confidence intervals.
pl_max_halfstep (int, default 0) – The maximum number of step-halvings in one iteration for finding profile
likelihood confidence intervals.
pl_max_stepsize (int, default 5) – The maximum step size while finding PL confidence intervals.
tol (float, default 0.0001) – Convergence tolerance for stopping.
fit_intercept (bool, default True) – Specifies if intercept should be added.
skip_pvals (bool, default False) – If True, p-values will not be calculated. Calculating the p-values can
be time-consuming if wald=False since the fitting procedure is
repeated for each coefficient.
skip_ci (bool, default False) – If True, confidence intervals will not be calculated. Calculating the
confidence intervals via profile likelihood is time-consuming.
wald (bool, default False) – If True, uses Wald method to calculate p-values and confidence
intervals.
test_vars (int, list [int], or None, default None) – Index or list of indices of the variables for which to calculate
confidence intervals and p-values. If None, calculate for all variables.
This option has no effect if wald=True.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Compute one-sided observed power for the ‘below’ (left) side.
Notes
Observed power is equivalent to the p-value evaluated under the
alternative hypothesis instead of the null-hypothesis. For example, if the
p-value is 0.05 and one is performing a test against an alpha of 0.05, the
observed power is 50% (i.e., if we were to repeat the exact same experiment
under the alternative hypothesis, there would be a 50% probability of
observing a p-value smaller than 0.05). Because of this equivalence between
the p-value and observed power, it has very limited application.
Computes observed power for a given statistic, using the non-central
chi-square distribution.
Parameters:
statistic (float) – The chi-square statistic (e.g., sum of squares of standardised
residuals).
df (float) – Degrees of freedom for the chi-square distribution.
alpha (float, default 0.05) – The type I error rate.
Notes
Inherits from ObservedPower but overrides the distribution
to accommodate its strictly positive domain.
In a chi-square context, one generally considers whether the observed
statistic is ‘above’ a critical threshold. Hence, compute_power()
internally calls compute_power_above() rather than summing two tails.
Because chi-square is not defined below zero, compute_power_below()
is not meaningful and thus raises a NotImplementedError.
The procedure:
- First, compute the critical value (ppf) under the central chi-square
(nc=0).
- Then, evaluate the cdf at x = self.crit under the non-central
chi-square with nc = self.statistic.
- Subtract this probability from 1 to calculate the observed power.
This approach can be easily verified using the normal distribution:
>>> from scipy.stats import norm
>>> # use a test statistic of 4, and a critical value of 1.96
>>> ncnorm = norm(loc=4, scale=1.0)
>>> 1 - ncnorm.cdf(1.96)
0.9793
>>> norm.cdf(4 - 1.96)
0.9793
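The chi-square procedure above can be sketched with scipy's non-central chi-square distribution (chi2_observed_power is a hypothetical name for this illustration):

```python
from scipy.stats import chi2, ncx2

def chi2_observed_power(statistic, df, alpha=0.05):
    # critical value under the central chi-square (nc = 0)
    crit = chi2.ppf(1 - alpha, df)
    # probability above the critical value when nc equals the observed
    # statistic, i.e. the observed power
    return 1.0 - ncx2.cdf(crit, df, statistic)
```

The observed power grows monotonically with the observed statistic, as expected.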
Calculate the P-value based on the Z-statistic and the standard normal
distribution.
Parameters:
z_statistic (float) – Typically the ratio of the point estimate and the standard error,
representing the standardized difference from the value under the
null-hypothesis.
side ({‘two’, ‘left’, ‘right’, ‘below’, ‘above’}, default ‘two’) – ‘left’ will perform a left-sided, one-sided z-test; ‘right’ will
perform a right-sided, one-sided z-test. ‘below’ is a synonym for
‘left’, and ‘above’ is a synonym for ‘right’.
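The side handling described above can be sketched as follows (p_value_z is a hypothetical name for this illustration):

```python
from scipy.stats import norm

def p_value_z(z_statistic, side="two"):
    if side == "two":
        return 2.0 * norm.sf(abs(z_statistic))  # mass in both tails
    if side in ("left", "below"):
        return norm.cdf(z_statistic)            # mass below the statistic
    if side in ("right", "above"):
        return norm.sf(z_statistic)             # mass above the statistic
    raise ValueError(f"unknown side: {side!r}")
```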
Calculate the P-value based on the t-statistic and the t distribution.
By default the function performs a two-sided test.
Parameters:
t_statistic (float) – Typically the ratio of the point estimate and the standard error,
representing the standardized difference from the value under the
null-hypothesis.
degrees (int) – The degrees of freedom.
side ({‘two’, ‘left’, ‘right’, ‘below’, ‘above’}, default ‘two’) – ‘left’ will perform a left-sided, one-sided t-test; ‘right’ will
perform a right-sided, one-sided t-test. ‘below’ is a synonym for
‘left’, and ‘above’ is a synonym for ‘right’.