stats_misc.intervals

Includes a mixture of functions to estimate intervals from a collection of data. Types of intervals include confidence intervals as well as prediction intervals.

class stats_misc.intervals.IntervalResults(**kwargs)[source]

A results object containing interval estimates.

interval_values

The lower and upper bounds of the interval.

Type:

tuple [float]

point_estimate

The estimate with the highest data-support.

Type:

float

standard_errors

The estimated sampling variability (standard error) of the point estimate.

Type:

float

coverage

The interval coverage.

Type:

float

class stats_misc.intervals.QuantilesIntervalResults(**kwargs)[source]

The results class for univariable_quantiles_exact.

interval_indices

The list indices of the interval lower and upper bounds.

Type:

list [int]

interval_values

The lower and upper bounds of the interval.

Type:

list [float]

point_estimate

The estimate with the highest data-support.

Type:

float

coverage

The interval coverage.

Type:

float

matrix_coverage

A 2d array of coverage probabilities.

Type:

numpy.array

matrix_columns

The column names of matrix_coverage representing the indices of the upper limit.

Type:

list [float]

matrix_rows

The row names of matrix_coverage representing the indices of the lower limit.

Type:

list [float]

stats_misc.intervals.beta_confidence_interval(proportion: float, total_sample: int, alpha: float = 0.05, integer: bool = True) IntervalResults[source]

Better confidence intervals for proportions, based on a classic contribution by SA Julious [1]_.

Parameters:
  • proportion (float) – A proportion between 0 and 1.

  • total_sample (int) – The total sample size the proportion was derived from.

  • alpha (float, default 0.05) – A float between 0 and 1, representing the type 1 error rate. Is used to define the confidence interval coverage: (1-alpha)*100.

  • integer (bool, default True) – Whether the function should fail when the proportion times total_sample does not result in an integer number of events. Set to False to ignore the ValueError, which will replace the product with its closest integer value.

Returns:

results – A results class returning the lower and upper confidence interval bounds.

Return type:

IntervalResults

References

[1] Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in Medicine 1998; 17:857-872.
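Examples

A minimal sketch of the underlying idea: exact (Clopper-Pearson style) bounds for a proportion can be obtained from quantiles of the beta distribution. Whether beta_confidence_interval uses precisely this formulation is an assumption; the helper name below is illustrative.

```python
from scipy.stats import beta

def beta_proportion_ci(proportion, total_sample, alpha=0.05):
    # Exact beta-quantile bounds for a proportion (Clopper-Pearson style);
    # an illustrative sketch, not the package's verified internals.
    events = round(proportion * total_sample)
    lower = 0.0 if events == 0 else beta.ppf(alpha / 2, events, total_sample - events + 1)
    upper = 1.0 if events == total_sample else beta.ppf(1 - alpha / 2, events + 1, total_sample - events)
    return lower, upper

low, high = beta_proportion_ci(0.2, 50)  # 10 events out of 50
```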

stats_misc.intervals.univariable_poisson_standard_normal(counts: List[int], alpha: float = 0.05) IntervalResults[source]

Calculates Poisson confidence intervals based on the standard normal approximation to the Poisson distribution.

Parameters:
  • counts (list [int]) – A list of counts.

  • alpha (float, default 0.05) – The type 1 error rate; the interval has (1-alpha)*100% coverage.

Returns:

results – A results class returning the lower and upper confidence interval bounds.

Return type:

IntervalResults
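Examples

As a sketch of the standard normal approximation named in the docstring, a Poisson count k has approximate bounds k ± z·sqrt(k). Whether the function applies this to the summed counts or element-wise is an assumption here; the helper name is illustrative.

```python
import numpy as np
from scipy.stats import norm

def poisson_normal_ci(counts, alpha=0.05):
    # Normal approximation to the Poisson: total +/- z * sqrt(total).
    # Illustrative sketch only; summing the counts is an assumption.
    total = float(np.sum(counts))
    z = norm.ppf(1 - alpha / 2)
    return total - z * np.sqrt(total), total + z * np.sqrt(total)

low, high = poisson_normal_ci([3, 5, 2])  # total of 10 events
```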

stats_misc.intervals.univariable_quantiles_exact(values: List[float], quantile: float, alpha: float = 0.05, window: int = 2) QuantilesIntervalResults[source]

Estimates the locations of the 100(1-alpha)% confidence bounds for any desired quantile, and returns the values of these lower and upper bounds.

Parameters:
  • values (list [float]) – A list of floats representing the available data.

  • quantile (float) – A float between 0 and 1, e.g., use 0.5 for the median, 0.25 for quartile 1, or 0.025 for the 2.5% percentile.

  • alpha (float, default 0.05) – The 100(1-alpha)% confidence level.

Returns:

results – An interval class results object, including the interval indices, bounds, and exact coverage.

Return type:

QuantilesIntervalResults

Notes

The algorithm will attempt to find a near-symmetric confidence interval, which is guaranteed to have coverage equal to or larger than requested.

Internally this uses the binomial distribution to calculate the probability of a value being smaller than or equal to the requested quantile, hence resulting in exact confidence interval limits, irrespective of the underlying distribution. These intervals will typically be larger than intervals based on (semi-)parametric solutions.

The code is based on the Stack Exchange answer by whuber, which in turn is based on Chapter 5 of the book by Meeker, Hahn, and Escobar.
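Examples

The binomial argument in the notes can be sketched directly: the probability that the true quantile lies between order statistics l and u is binom.cdf(u-1, n, q) - binom.cdf(l-1, n, q). The conservative rank choice below is illustrative, not the package's exact search algorithm.

```python
import numpy as np
from scipy.stats import binom

def exact_quantile_ci(values, quantile, alpha=0.05):
    # Distribution-free interval from order statistics; a sketch of
    # the binomial argument, not univariable_quantiles_exact itself.
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    lo = max(int(binom.ppf(alpha / 2, n, quantile)), 1)          # lower rank (1-indexed)
    hi = min(int(binom.ppf(1 - alpha / 2, n, quantile)) + 1, n)  # upper rank
    coverage = binom.cdf(hi - 1, n, quantile) - binom.cdf(lo - 1, n, quantile)
    return x[lo - 1], x[hi - 1], coverage

low, high, cov = exact_quantile_ci(list(range(1, 101)), 0.5)
```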

stats_misc.intervals.wald_confidence_interval(point: float, se: float, alpha: float = 0.05) IntervalResults[source]

Calculates a (1-alpha)*100% confidence interval based on the quantiles of a standard normal distribution.

Parameters:
  • point (float) – The point estimate

  • se (float) – The standard error of the point estimate

  • alpha (float, default 0.05) – The type 1 error rate; the interval has (1-alpha)*100% coverage.

Returns:

results – A results class returning the lower and upper confidence interval bounds.

Return type:

IntervalResults
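Examples

The Wald interval is simply point ± z·se, with z the standard normal quantile; a sketch with scipy (the helper name is illustrative):

```python
from scipy.stats import norm

def wald_ci(point, se, alpha=0.05):
    # point +/- z * se, with z the standard normal quantile.
    z = norm.ppf(1 - alpha / 2)
    return point - z * se, point + z * se

low, high = wald_ci(1.2, 0.3)  # roughly (0.61, 1.79)
```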

stats_misc.meta_analysis

A module to calculate weighted averages of point estimates, using fixed effect or random effects methods. Includes various estimates of heterogeneity.

class stats_misc.meta_analysis.HeterogeneityResults(**kwargs)[source]

A results object for the heterogeneity function.

q_statistic

The Q-statistic.

Type:

float

q_pvalue

The Q-statistic p-value.

Type:

float

i_squared

The I-squared heterogeneity statistic.

Type:

float

i_squared_ci

The I-squared confidence interval.

Type:

tuple [float]

i_squared_ci_coverage

The coverage of the I-squared confidence interval.

Type:

float

tau_squared

The Tau-squared heterogeneity statistic.

Type:

float

stats_misc.meta_analysis.estimate_tau(estimates: List[float], standard_errors: List[float], method: str = 'mm-it', **kwargs: Any | None) float[source]

Calculate the between study/estimate variance.

Parameters:
  • estimates (list [float]) – A list of point estimates (e.g., mean differences or log odds ratios).

  • standard_errors (list [float]) – A list of standard errors of equal length as estimates.

  • method (str, default mm-it) –

    Which methods of moments (MM) estimator to use:

    • dl : DerSimonian and Laird (DL) estimator,

    • ca : Cochran ANOVA (CA) estimator,

    • dl2 : The two-stage DL estimator,

    • ca2 : The two-stage CA estimator,

    • mm-it : The general MM iterative estimator,

    • pm-it : The Paule Mandel iterative estimator.

  • **kwargs (Optional) – keyword argument supplied to _tau_iter_mm or _tau_iter_pm .

Returns:

tausquared – The tau-squared estimate.

Return type:

float

stats_misc.meta_analysis.fixed_effect(estimates: List[float], standard_errors: List[float]) Tuple[float, float][source]

Calculates an average, with the contribution of each element in estimates weighted by the inverse of the squared standard error of the estimate.

Parameters:
  • estimates (list [float]) – A list of point estimates (e.g., mean differences or log odds ratios).

  • standard_errors (list [float]) – A list of standard errors of equal length as estimates.

Returns:

  • estimate (float) – The average estimate,

  • standard_error (float) – The standard error of estimate
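Examples

The inverse-variance weighting described above can be sketched in a few lines of numpy; fixed_effect's verified code path may differ.

```python
import numpy as np

def fixed_effect_sketch(estimates, standard_errors):
    # Inverse-variance weights: w_i = 1 / se_i^2.
    w = 1.0 / np.asarray(standard_errors, dtype=float) ** 2
    estimate = np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)
    standard_error = np.sqrt(1.0 / np.sum(w))
    return estimate, standard_error

est, se = fixed_effect_sketch([0.2, 0.4], [0.1, 0.1])  # equal weights -> mean 0.3
```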

stats_misc.meta_analysis.heterogeneity(estimates: List[float], standard_errors: List[float], overall_estimate: float | int, alpha: float = 0.05, tau2: Any | None = None) HeterogeneityResults[source]

Provides heterogeneity estimates (Q-test, I-squared, and Tau-squared) [2]_, determining to what extent the individual estimates are distinct from the overall_estimate, accounting for differences in precision based on the supplied estimate-specific standard_errors.

Parameters:
  • estimates (list [float]) – A list of point estimates (e.g., mean differences or log odds ratios).

  • standard_errors (list [float]) – A list of standard errors of equal length as estimates.

  • overall_estimate (float or int) – The overall estimate, for example based on a meta-analysis of the point estimates of estimates.

  • alpha (float, default 0.05) – The alpha for the (1-alpha)*100% confidence interval.

  • tau2 (float, optional) – An optional external estimate of the tau-squared used in the random effects estimator. If None, a non-iterative MM estimate will be used.

Returns:

results – An instance of the heterogeneity results class.

Return type:

HeterogeneityResults

Notes

Heterogeneity will be internally evaluated using a Chi-square distribution with len(estimates) - 1 degrees of freedom.

The heterogeneity estimates are based on [2]_; specifically, the tau-squared is estimated using the DerSimonian and Laird method of moments (without iteration).
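Examples

The statistics described in the notes reduce to a few formulas: Q = Σ w_i (θ_i − θ)², I² = (Q − df)/Q, and the non-iterative DerSimonian-Laird tau² = (Q − df) / (Σw − Σw²/Σw). A sketch of these textbook formulas (the package's exact implementation is assumed, not verified):

```python
import numpy as np

def heterogeneity_sketch(estimates, standard_errors, overall_estimate):
    # Q, I-squared and DerSimonian-Laird tau-squared (no iteration).
    est = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(standard_errors, dtype=float) ** 2
    q = np.sum(w * (est - overall_estimate) ** 2)
    df = est.size - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    return q, i2, tau2

q, i2, tau2 = heterogeneity_sketch([0.0, 2.0], [1.0, 1.0], 1.0)
```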

stats_misc.meta_analysis.random_effects(estimates: List[float], standard_errors: List[float], between_estimate_variance: float) Tuple[float, float][source]

Calculates an average, with the contribution of each element in estimates weighted by the inverse of the squared standard error of the estimate plus an estimate of the between estimate variance.

When the between-study variance is zero, this reduces to a fixed effect meta-analysis.

Parameters:
  • estimates (list [float]) – A list of point estimates (e.g., mean differences or log odds ratios).

  • standard_errors (list [float]) – A list of standard errors of equal length as estimates.

  • between_estimate_variance (float) – The between estimate variance, often referred to as tau-squared. Can be estimated using the heterogeneity function.

Returns:

  • estimate (float) – The average estimate,

  • standard_error (float) – The standard error of estimate

stats_misc.tests

A collection of null-hypothesis tests.

class stats_misc.tests.AnovaTestResults(**kwargs)[source]

A results object specifically for ANOVA tests.

test_statistic

The F-statistic.

Type:

float

p_value

The p-value associated with the F-statistic.

Type:

float

explained_sum_squares

The sum of squares explained by the model (between-groups sum of squares).

Type:

float

residual_sum_squares

The sum of squares of residuals (within-groups sum of squares).

Type:

float

df_numerator

The numerator degrees of freedom.

Type:

int

df_denominator

The denominator degrees of freedom.

Type:

int

class stats_misc.tests.TestResults(**kwargs)[source]

A results object containing test values.

point_estimate

The point estimate.

Type:

float

standard_error

The standard error of the point estimate.

Type:

float

test_statistic

The test statistic, evaluated against the null hypothesis value.

Type:

float

p_value

The p-value of the test evaluated against the null hypothesis value

Type:

float

null_value

The null hypothesis value the point estimate is evaluated against.

Type:

float

stats_misc.tests.anova_one_way(means: list[int | float], variances: list[int | float], sizes: list[int]) AnovaTestResults[source]

Performs a one-way ANOVA based on aggregate data.

Parameters:
  • means (list [int | float]) – The group means.

  • variances (list [int | float]) – The group variances.

  • sizes (list [int]) – The group sample sizes.

Returns:

A results class.

Return type:

AnovaTestResults

Notes

This is an alternative to scipy’s one-way ANOVA implementation, which requires access to the individual observations (i.e. one row per observation in a table).
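Examples

The aggregate-data computation can be sketched from the standard sums of squares (between groups: Σ n_i (m_i − grand)²; within groups: Σ (n_i − 1) v_i); the helper name is illustrative, not the package's internals.

```python
import numpy as np
from scipy.stats import f

def anova_from_aggregates(means, variances, sizes):
    # One-way ANOVA F-test from group means, variances and sizes.
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    n = np.asarray(sizes, dtype=float)
    grand = np.sum(n * m) / np.sum(n)
    ss_between = np.sum(n * (m - grand) ** 2)
    ss_within = np.sum((n - 1) * v)
    df1, df2 = m.size - 1, np.sum(n) - m.size
    f_stat = (ss_between / df1) / (ss_within / df2)
    return f_stat, f.sf(f_stat, df1, df2)

f_stat, p = anova_from_aggregates([0.0, 1.0], [1.0, 1.0], [10, 10])
```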

stats_misc.tests.ks_test(data: DataFrame, group: str, values: str, nulldistribution: str = 'uniform') Dict[str, str][source]

Loops over the unique group values and performs null-hypothesis tests comparing each group’s values against a null-distribution using the Kolmogorov-Smirnov test.

Parameters:
  • data (pd.DataFrame) – A data table.

  • group (str) – A column name in data which will be used to group the values.

  • values (str) – A column name in data to which the Kolmogorov-Smirnov test will be applied.

  • nulldistribution (str, default uniform) – The null-distribution the values should be compared against. This maps to the Scipy.stats available distributions.

Returns:

results – A dictionary with the group values as keys and KstestResult objects as items.

Return type:

dict

stats_misc.tests.wald_interaction_test(point: Tuple[float, float] | List, se: Tuple[float, float] | List, null_value: float = 0.0) TestResults[source]

Statistical test of whether the difference between two point estimates is distinct from the null hypothesis value (null_value).

The test simply calculates the difference between two point estimates, and calculates the standard error of this difference by taking the square root of the sum of the squared standard errors of the point estimates. The ratio of the difference to its standard error is compared to a standard normal distribution.

Parameters:
  • point (tuple [float, float]) – Two point estimates, for example the mean difference or log odds ratio.

  • se (tuple [float, float]) – Two standard errors of the point estimates.

  • null_value (float, default 0.0) – The null-hypothesis value of the difference between the point estimates.

Returns:

results – A results class.

Return type:

TestResults
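Examples

The arithmetic described above (difference, pooled standard error, standard normal reference) fits in a few lines; a sketch with an illustrative name:

```python
import numpy as np
from scipy.stats import norm

def wald_interaction_sketch(point, se, null_value=0.0):
    # Difference of two estimates with SE = sqrt(se1^2 + se2^2),
    # compared to a standard normal distribution.
    diff = point[0] - point[1]
    se_diff = np.sqrt(se[0] ** 2 + se[1] ** 2)
    z = (diff - null_value) / se_diff
    return diff, se_diff, 2 * norm.sf(abs(z))

diff, se_diff, p = wald_interaction_sketch((1.0, 0.0), (1.0, 1.0))
```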

stats_misc.resampling

A module for resampling techniques such as bootstrap, jackknife, or permutation. Currently focussed on confidence interval estimation using canonical bootstrap algorithms.

class stats_misc.resampling.BCaCI(data: tuple[ndarray[Any, dtype[Any]], ...])[source]

A Bias-Corrected and Accelerated (BCa) bootstrap confidence interval.

Parameters:

data (tuple [np.ndarray]) – A tuple of numpy arrays with the same number of rows.

data

A tuple of numpy arrays with the same number of rows.

Type:

tuple [np.ndarray]

Notes

The BCa interval generally has better coverage properties (i.e. the smallest confidence interval while retaining the advertised coverage) than most alternative bootstrap confidence interval methods.

property boot_sample: tuple[ndarray[Any, dtype[Any]]]

Getter function for boot_sample

property data: tuple[ndarray[Any, dtype[Any]]]

Getter function for data

property jack_sample: tuple[ndarray[Any, dtype[Any]]]

Getter function for jack_sample

class stats_misc.resampling.BCaCIResults(**kwargs: Any)[source]

The results class for BCaCI.

interval_indices

The bootstrap indices of the interval lower and upper bounds.

Type:

list [int] or np.ndarray

interval_values

The lower and upper bounds of the interval.

Type:

list [float] or np.ndarray

coverage

The confidence interval coverage.

Type:

float

class stats_misc.resampling.BaseBootstrap(data: tuple[ndarray[Any, dtype[Any]], ...], verbose: bool = True)[source]

Base class for generating bootstrap estimates.

Parameters:
  • data (tuple [np.ndarray, …]) – A tuple containing one or more numpy arrays representing the input data to be used for bootstrapping.

  • verbose (bool, default True) – Whether warnings should be returned.

_data

The input data arrays from which bootstrap samples are drawn.

Type:

tuple [np.ndarray, …]

bootstrap_samples(n_estimates, statsfunction, n_reps, **kwargs)[source]

Generate bootstrap replicates and return the resulting statistics.

original_estimate(statsfunction, **kwargs)[source]

Compute the statistic(s) on the original (non-resampled) data.

Notes

This class handles the generation of bootstrap replicates and calculation of original estimates from the provided dataset using a specified statistical function.

bootstrap_samples(n_estimates: int, statsfunction: Callable, n_reps: int = 999, verbose: bool = True, **kwargs: Any | None) ndarray[Any, dtype[Any]][source]

Generates a bootstrap sample, applies a function, and removes missing values.

Parameters:
  • n_estimates (int) – The number of estimates (statistics) to compute for each bootstrap sample.

  • statsfunction (Callable) – The statistical function to apply to each bootstrap sample.

  • n_reps (int, default 999) – The number of bootstrap repetitions.

  • verbose (bool, default True) – Whether warnings should be returned.

  • **kwargs (dict [str, any]) – Additional keyword arguments passed to statsfunction.

Returns:

An array of bootstrap estimates, with any samples containing missing values removed.

Return type:

np.ndarray

original_estimate(statsfunction: Callable, **kwargs: Any | None) ndarray[source]

Calculate the estimate based on the non-resampled data.

Parameters:
  • statsfunction (Callable) – The statistical function which will be applied to data.

  • **kwargs (dict [str, any]) – Additional keyword arguments passed to statsfunction.

Returns:

An array containing the original statistic(s).

Return type:

np.ndarray

class stats_misc.resampling.BasicCI(data: tuple[ndarray[Any, dtype[Any]], ...], verbose: bool = True)[source]

Basic bootstrap confidence interval estimator.

_data

The input data arrays from which bootstrap samples are drawn.

Type:

tuple [np.ndarray, …]

Notes

The basic bootstrap confidence interval is calculated by reflecting the standard percentile intervals around the original estimate.
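The reflection described in the notes can be sketched with plain numpy (a hypothetical mean example, not BasicCI's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=1.0, size=50)
theta_hat = data.mean()

# percentile bounds of the bootstrap distribution of the mean
boots = np.array([rng.choice(data, size=data.size, replace=True).mean()
                  for _ in range(999)])
p_lo, p_hi = np.percentile(boots, [2.5, 97.5])

# basic interval: reflect the percentile bounds around the estimate
basic_lo, basic_hi = 2 * theta_hat - p_hi, 2 * theta_hat - p_lo
```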

class stats_misc.resampling.BootCIResults(*, set_args: list[str] | None = None, **kwargs: Any)[source]

The results class for bootstrapped confidence intervals.

interval_indices

The bootstrap indices of the interval lower and upper bounds.

Type:

list [int]

interval_values

The lower and upper bounds of the interval.

Type:

list [float]

coverage

The interval coverage.

Type:

float

class stats_misc.resampling.PercentileCI(data: tuple[ndarray[Any, dtype[Any]], ...], verbose: bool = True)[source]

Percentile bootstrap confidence interval estimator.

_data

The input data arrays from which bootstrap samples are drawn.

Type:

tuple [np.ndarray, …]

class stats_misc.resampling.StudentisedCI(data: tuple[ndarray[Any, dtype[Any]], ...], verbose: bool = True)[source]

Studentised bootstrap confidence interval estimator.

_data

The input data arrays from which bootstrap samples are drawn.

Type:

tuple of NDArray

Notes

This class performs a nested bootstrap to estimate the standard error of the bootstrapped statistic(s).

stats_misc.resampling.bootstrap_replicates(data: tuple[ndarray[Any, dtype[Any]], ...], statsfunction: Callable, n_estimates: int, n_reps: int = 999, **kwargs: Any | None) ndarray[Any, dtype[Any]][source]

Draws n_reps bootstrap samples, applies statsfunction to each sample, and returns the results as a numpy array.

Parameters:
  • data (tuple [np.ndarray]) – A tuple of numpy arrays with the same number of rows.

  • statsfunction (Callable) – A function which can unpack a tuple of arrays and perform analyses on these: statsfunction(*data). The function can return the estimate(s) as a single float/int, list, tuple or numpy array.

  • n_estimates (int) – The number of estimates statsfunction will return.

  • n_reps (int, default 999) – The number of bootstrap samples.

  • **kwargs (any) – Any keyword arguments supplied to statsfunction.

Returns:

boots – A 2d numpy array with dims equal to n_reps by n_estimates.

Return type:

np.ndarray

Notes

The helper functions have been optimised through numba.njit.

Will internally map 1d arrays to 2d to deal with numba requirements.

Examples

>>> import numpy as np
>>> np.random.seed(42)
>>>
>>> # Example statsfunction that returns a list of statistics
>>> def stats_list(*data):
...     return [np.mean(data[0]), np.std(data[0])]
>>>
>>> result = bootstrap_replicates(
...     (np.random.randn(10), np.random.randn(10)),
...     stats_list, n_estimates=2, n_reps=10
... )
>>> print(result)
stats_misc.resampling.jackknife_replicates(data: tuple[ndarray[Any, dtype[Any]], ...], statsfunction: Callable, n_estimates: int, **kwargs: Any | None) ndarray[Any, dtype[Any]][source]

Performs jackknife resampling procedure which systematically leaves out one observation at a time and applies statsfunction.

Parameters:
  • data (tuple [np.ndarray]) – A tuple of numpy arrays with the same number of rows.

  • statsfunction (Callable) – A function which can unpack a tuple of arrays and perform analyses on these: statsfunction(*data). The function can return the estimate(s) as a single float/int, list, tuple or numpy array.

  • n_estimates (int) – The number of estimates statsfunction will return.

  • **kwargs (any) – Any keyword arguments supplied to statsfunction.

Returns:

jacked – A 2d numpy array with dims equal to data[0].shape[0] by n_estimates.

Return type:

np.ndarray

Notes

The helper functions have been optimised through numba.njit.

Will internally map 1d arrays to 2d to deal with numba requirements.
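Examples

A minimal single-array sketch of the leave-one-out procedure (the numba-optimised internals handle tuples of arrays; the helper name is illustrative):

```python
import numpy as np

def jackknife_sketch(values, statsfunction):
    # Apply statsfunction to each leave-one-out sample.
    values = np.asarray(values, dtype=float)
    return np.array([statsfunction(np.delete(values, i))
                     for i in range(values.size)])

jacked = jackknife_sketch([1.0, 2.0, 3.0], np.mean)  # [2.5, 2.0, 1.5]
```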

stats_misc.machine_learning.validation

A collection of metrics used to evaluate a prediction model, covering discrimination and calibration statistics.

Currently the module is focussed on binary/classification tasks.

class stats_misc.machine_learning.validation.CalibrationResults(**kwargs)[source]

A results objects containing calibration estimates.

calibration_slope

The calibration slope.

Type:

float

calibration_slope_se

The standard error of the calibration slope.

Type:

float

calibration_intercept

Calibration-in-the-large estimate.

Type:

float

calibration_intercept_se

The standard error of the calibration-in-the-large.

Type:

float

observed_predict_table

an optional table with observed and predicted risks.

Type:

pd.DataFrame

class stats_misc.machine_learning.validation.CstatisticResults(**kwargs)[source]

A results objects containing c-statistic estimates.

c_statistic

The c-statistic.

Type:

float

lower_bound

The lower bound of the confidence interval.

Type:

float

upper_bound

The upper bound of the confidence interval.

Type:

float

confidence_interval

The lower and upper bounds of the confidence interval.

coverage

The interval coverage.

Type:

float

standard_error

The standard error of the c-statistic, based on an asymptotic approximation to the normal distribution.

Type:

float

class stats_misc.machine_learning.validation.RecalibrationResults(**kwargs)[source]

A results objects containing recalibration estimates.

slope

The slope estimate

Type:

float

intercept

The intercept estimate

Type:

float

table_recalibrated

A table with the recalibrated estimates appended.

Type:

pd.DataFrame

stats_misc.machine_learning.validation.cstatistic(observed: Series | ndarray, predicted: Series | ndarray, alpha: None | float = None) CstatisticResults[source]

Calculates Harrell’s “c-statistic” for models predicting a binary outcome.

Parameters:
  • observed (pd.Series or np.array) – a column vector with the observed class.

  • predicted (pd.Series or np.array) – a column vector with the predicted class or predicted probability.

  • alpha (float, optional, default None) – the (1-alpha)*100% confidence interval coverage. Default None for no confidence interval.

Returns:

Harrell’s c-statistic with optional confidence interval.

Return type:

CstatisticResults

Notes

Predictions (the predicted column) can range between ± infinity, allowing for evaluation of non-probability scores.

The standard error is derived based on [1] Hanley JA, McNeil BJ. “The meaning and use of the area under a receiver operating characteristic (ROC) curve”, Radiology 1982; 29-36 (see Table 2).
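Examples

The c-statistic is the probability that a randomly chosen event receives a higher prediction than a randomly chosen non-event, with ties counting one half. A brute-force sketch of that definition (the package's implementation is presumably faster):

```python
import numpy as np

def cstat_sketch(observed, predicted):
    # All event/non-event pairs; ties contribute 0.5.
    observed = np.asarray(observed)
    predicted = np.asarray(predicted, dtype=float)
    pos = predicted[observed == 1]
    neg = predicted[observed == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

c = cstat_sketch([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs concordant
```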

stats_misc.machine_learning.validation.format_roc(observed: Series | ndarray, predicted: Series | ndarray, **kwargs: Any | None) DataFrame[source]

Takes a binary observed column vector and a continuous predicted column vector, and returns a pd.DataFrame with the columns false_positive, sensitivity and threshold.

Parameters:
  • observed (numpy array) – A column vector of the observed binary outcomes.

  • predicted (numpy array) – A column vector of the predicted outcome (should be continuous), e.g., representing the predicted probability.

  • kwargs (Any) – Supplied to sklearn.metrics.roc_curve.

Returns:

A table with columns: false_positive, sensitivity and threshold.

Return type:

pd.DataFrame

stats_misc.machine_learning.validation.get_calibration(data: DataFrame, observed: str, predicted: str, alpha: float = 0.05, bins: None | int = None, kwargs_ci: None | dict = None) CalibrationResults[source]

Estimate the calibration slope and calibration-in-the-large assuming a binomial data generating model. A binomial model is generally appropriate if the predicted risk reflects an event occurring at a fixed moment in time, e.g. after 1 year or 1 hour.

Parameters:
  • data (pd.DataFrame) – A table with the columns observed and predicted.

  • observed (str) – a column name in data referencing the binary outcome column.

  • predicted (str) – a column name in data referencing the logit predicted risk.

  • bins (int, optional, default None) – an optional integer used to create equally sized bins of the predicted logit risk, returning the average predicted and observed risk in each bin (an observed vs expected table).

  • alpha (float, default 0.05) – the (1-alpha)*100% confidence interval coverage. Used for the observed risk when bins is supplied.

Returns:

A calibration intercept and slope estimates, and an optional observed and expected table.

Return type:

CalibrationResults

Notes

The function does NOT presuppose the predicted risk is derived from a logistic (binomial) regression model. ANY model predicting risk at a fixed moment in time is acceptable, including models that typically provide interval predictions such as classification trees. Some models or rules may only provide the predicted risk rather than the logit risk; in such cases simply call the validation.logit function to derive the appropriate variable.
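The conversion the notes refer to is the log-odds (logit) transform; a sketch with numpy (the package's own validation.logit helper is assumed to behave equivalently, which is not verified here):

```python
import numpy as np

def logit(p):
    # Predicted risk p -> log-odds scale: log(p / (1 - p)).
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

lp = logit([0.5, 0.73105858])  # roughly [0.0, 1.0]
```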

stats_misc.machine_learning.validation.recalibrate(data: DataFrame, score: str, observed: str, model: str = 'binomial') RecalibrationResults[source]

The original score will be re-trained on observed outcome data, improving the agreement between the observed and predicted risk/outcome.

Parameters:
  • data (pd.DataFrame) – A table with columns score and observed.

  • observed (str) – a column name in data

  • score (str) – a column name in data. Note that contrary to the calibration function the score can be in any format depending on intended use. For binomial model one would typically want to supply a logit risk.

  • model (str, default binomial) – which model should be used (default: ‘binomial’, or ‘gaussian’)

Returns:

The recalibration intercept and slope, as well as a table with the recalibrated predictions.

Return type:

RecalibrationResults

stats_misc.machine_learning.sksurv_utils

A collection of utils for scikit-survival. Currently, the module is focussed on downstream extraction of predictions and outcomes, as well as on tools to help with model validation.

The code can likely be generalised further to work with non-sksurv models as well.

class stats_misc.machine_learning.sksurv_utils.SurvivalModelBrierScore(model: Callable, times: ndarray)[source]

Evaluates a sklearn-survival model using the integrated Brier score, and computes the baseline (non-informative) Brier score based on the event incidence in supplied test data.

Parameters:
  • model (Callable) – A fitted sklearn-survival model that implements a predict_survival_function method returning survival functions for individuals.

  • times (np.ndarray) – A sequence of time points over which the integrated Brier score is computed.

model

The survival model used for prediction including the method predict_survival_function

Type:

Callable

times

An array of time points used to evaluate the Brier score.

Type:

np.ndarray

baseline_brier_score

The Brier score of a non-informative model that always predicts the event incidence (π(1 - π)), calculated from the test data.

Type:

float or NoneType

ibs

The integrated Brier score of the model on the test data.

Type:

float or NoneType

bss

The Brier skill score, representing the improvement of the ibs relative to the baseline_brier_score.

Type:

float or NoneType

evaluate(Y_train, Y_test, X_test)

Computes the integrated Brier score and baseline Brier score based on predictions over the specified time grid.

get_brier_skill_score()[source]

Returns the Brier skill score (BSS), which quantifies improvement over the baseline Brier score.

get_brier_skill_score()[source]

Computes the Brier Skill Score (BSS), which represents the improvement of the model over the baseline (non-informative) predictor.

Raises:

AttributeError – Raises an error if the __call__ method has not run.

Returns:

The Brier Skill Score

Return type:

float
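A common definition of the Brier skill score is one minus the ratio of the model's integrated Brier score to the baseline score; whether SurvivalModelBrierScore uses exactly this definition is an assumption, not verified here:

```python
def brier_skill_score(ibs, baseline_brier_score):
    # 1 means perfect, 0 means no better than the baseline predictor.
    # Assumed definition, not verified against SurvivalModelBrierScore.
    return 1.0 - ibs / baseline_brier_score

bss = brier_skill_score(0.10, 0.25)  # 0.6
```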

class stats_misc.machine_learning.sksurv_utils.SurvivalModelCalibration(model: ~typing.Callable, n_groups: int = 5, nonparametric_estimator: ~typing.Callable = <function kaplan_meier_estimator>)[source]

Creates groups based on the predicted survival and compares the predicted event rate to the non-parametric event rate.

Parameters:
  • model (Callable) – A fitted scikit-survival model with a predict_survival_function method.

  • n_groups (int, default 5) – Number of equally sized participant groups, created based on the predicted survival.

  • nonparametric_estimator (Callable, default KaplanMeierEstimator) – A nonparametric estimator class from sksurv.nonparametric, e.g. KaplanMeierEstimator or CumulativeIncidenceEstimator.

group_summary_

summary statistics for each group: predicted survival and observed estimate.

Type:

dict [int, dict]

stats_misc.machine_learning.sksurv_utils.event_by_time(Y: ndarray, times: list[float], event_name: str = 'event', time_name: str = 'time') DataFrame[source]

Determines whether an event has occurred by specified time points.

Parameters:
  • Y (np.ndarray) – Structured array with fields (‘event’, ‘time’), typically created using sksurv.util.Surv.from_arrays.

  • times (Sequence[float]) – Time points at which to assess event occurrence.

  • event_name (str, default event) – Name for the event column in y.

  • time_name (str, default time) – Name for the time column in y.

Returns:

A DataFrame of shape (n_samples, len(times)) with boolean values indicating whether the event had occurred by each time point.

Return type:

pd.DataFrame
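Examples

The behaviour can be sketched with plain arrays (the real function consumes a structured sksurv array; the helper name and flat-array inputs here are illustrative):

```python
import numpy as np
import pandas as pd

def event_by_time_sketch(events, event_times, times):
    # An event has occurred by time t if it was observed
    # and its event time is <= t.
    events = np.asarray(events, dtype=bool)
    event_times = np.asarray(event_times, dtype=float)
    return pd.DataFrame({t: events & (event_times <= t) for t in times})

tab = event_by_time_sketch([True, False, True], [5.0, 8.0, 12.0], [6, 10, 15])
```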

stats_misc.machine_learning.sksurv_utils.surv_model_auc(model: Callable | None, times: list[float], data: tuple[str, ndarray, ndarray], train_y: ndarray) DataFrame[source]

Computes time-dependent AUCs for survival models using cumulative dynamic AUC.

Parameters:
  • model (callable or NoneType) – A fitted survival model with a predict(X) method. If None, the final element of data is assumed to already contain predicted risk scores.

  • times (list [float]) – Time points at which to evaluate the time-dependent AUC.

  • data (tuple [str, np.ndarray, np.ndarray]) – The tuple should contain (label, test_y, test_x or predicted_risk), where test_y is a structured array of survival data (e.g., from sksurv.util.Surv.from_arrays), and label is a descriptor of the data split (e.g., ‘test’, ‘validation’).

  • train_y (np.ndarray) – Structured array of survival outcomes for training data, used to construct the risk sets.

Returns:

A long-format DataFrame with the columns:

  • “Data split”: label of the data subset

  • “Time”: time points evaluated

  • “AUC by time”: corresponding AUC values

  • “Mean AUC”: mean of AUCs across time points (NaNs ignored)

Return type:

pd.DataFrame

Notes

This function evaluates the performance of a survival model across multiple time points using the cumulative/dynamic AUC approach (as implemented in cumulative_dynamic_auc from sksurv.metrics). It handles multiple data splits or datasets and aggregates the results into a tidy DataFrame.

Examples

>>> from sksurv.util import Surv
>>> import numpy as np
>>> train_y = Surv.from_arrays([True, False], [10, 20])
>>> test_y = Surv.from_arrays([True, False], [5, 25])
>>> pred = np.array([0.1, 0.4])
>>> data = ("test", test_y, pred)
>>> times = [5, 10, 15]
>>> surv_model_auc(None, times, data, train_y)
stats_misc.machine_learning.sksurv_utils.surv_model_calibration_table(model: Callable, X: ndarray, Y: ndarray, times: list[float]) DataFrame[source]

Creates a table of predicted event probability and actual observed outcomes by time.

Parameters:
  • model (callable) – A fitted scikit-survival model supporting predict_survival_function(X).

  • X (pd.DataFrame) – Feature data for prediction.

  • Y (np.ndarray) – Structured array with fields (‘event’, ‘time’), typically created using sksurv.util.Surv.from_arrays.

  • times (list [float]) – Time points at which to compute risks and determine event status.

Returns:

A DataFrame combining predicted risk probabilities and binary indicators of event occurrence by each time point.

Return type:

pd.DataFrame
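The observed-outcome half of this table reduces to a binary "event by time t" indicator per time point. A minimal sketch of that step (event_by_time is illustrative, not part of the module, and it ignores censoring before t for simplicity):

```python
import numpy as np

def event_by_time(event, time, t):
    # 1 if an event was observed at or before t, else 0
    # (censoring before t is ignored in this simplified sketch)
    return ((time <= t) & event).astype(int)

event = np.array([True, False, True])   # event indicator
time = np.array([5.0, 12.0, 20.0])      # observed follow-up times
observed = event_by_time(event, time, t=10)  # -> array([1, 0, 0])
```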

stats_misc.machine_learning.sksurv_utils.surv_model_predict(model: Callable, X: ndarray, times: list[float]) DataFrame[source]

Predicts event probabilities at specific follow-up times using a scikit-survival model.

Parameters:
  • model (BaseSurvivalModel) – A fitted scikit-survival model (e.g., CoxPHSurvivalAnalysis, RandomSurvivalForest).

  • X (np.ndarray) – Feature data for prediction.

  • times (list [float]) – Time points at which to estimate risk, in the same units used for training the model.

Returns:

A DataFrame with shape (n_samples, len(times)), where each column corresponds to the predicted risk (1 - survival probability) at a given time.

Return type:

pd.DataFrame
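The conversion from survival probability to risk that this function performs is simply 1 - S(t). A sketch under toy exponential survival curves (risks_at_times and the curves are illustrative; scikit-survival's predict_survival_function returns StepFunction objects instead):

```python
import numpy as np

def risks_at_times(surv_funcs, times):
    # evaluate predicted risk = 1 - S(t) for each sample and time point
    return np.array([[1.0 - fn(t) for t in times] for fn in surv_funcs])

# toy exponential survival curves S(t) = exp(-lam * t)
curves = [lambda t, lam=lam: np.exp(-lam * t) for lam in (0.05, 0.2)]
risk = risks_at_times(curves, times=[5, 10])  # shape (2, 2)
```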

stats_misc.machine_learning.firthlogist

A module implementing Firth’s logistic regression.

Firth’s regression is a penalised likelihood method that addresses small-sample bias and perfect separation in logistic regression. It adjusts the score function to yield more reliable parameter estimates.

Note this is essentially a fork of the GitHub firthlogist repo, with minor tweaks to work on Python 3.10+.

class stats_misc.machine_learning.firthlogist.FirthLogisticRegression(max_iter=25, max_halfstep=0, max_stepsize=5, pl_max_iter=100, pl_max_halfstep=0, pl_max_stepsize=5, tol=0.0001, fit_intercept=True, skip_pvals=False, skip_ci=False, alpha=0.05, wald=False, test_vars=None)[source]

Logistic regression with Firth’s bias reduction method.

This is based on the implementation in the logistf R package. Please see the logistf references [1]_ and [2]_ for details about the method.

Parameters:
  • max_iter (int, default 25) – The maximum number of Newton-Raphson iterations.

  • max_halfstep (int, default 0) – The maximum number of step-halvings in one Newton-Raphson iteration.

  • max_stepsize (int, default 5) – The maximum step size - for each coefficient, the step size is forced to be less than max_stepsize.

  • pl_max_iter (int, default 100) – The maximum number of Newton-Raphson iterations for finding profile likelihood confidence intervals.

  • pl_max_halfstep (int, default 0) – The maximum number of step-halvings in one iteration for finding profile likelihood confidence intervals.

  • pl_max_stepsize (int, default 5) – The maximum step size while finding PL confidence intervals.

  • tol (float, default 0.0001) – Convergence tolerance for stopping.

  • fit_intercept (bool, default True) – Specifies if intercept should be added.

  • skip_pvals (bool, default False) – If True, p-values will not be calculated. Calculating the p-values can be time-consuming if wald=False since the fitting procedure is repeated for each coefficient.

  • skip_ci (bool, default False) – If True, confidence intervals will not be calculated. Calculating the confidence intervals via profile likelihood is time-consuming.

  • alpha (float, default 0.05) – Significance level (confidence interval = 1-alpha).

  • wald (bool, default False) – If True, uses Wald method to calculate p-values and confidence intervals.

  • test_vars (int, list [int], or None, default None) – Index or list of indices of the variables for which to calculate confidence intervals and p-values. If None, calculate for all variables. This option has no effect if wald=True.

bse_

Standard errors of the coefficients.

classes_

A list of the class labels.

ci_

The fitted profile likelihood confidence intervals.

coef_

The coefficients of the features.

intercept_

Fitted intercept. If fit_intercept = False, the intercept is set to zero.

loglik_

Fitted penalized log-likelihood.

llf_

A statsmodels-compliant alias for loglik_.

n_iter_

Number of Newton-Raphson iterations performed.

pvals_

p-values calculated by penalized likelihood ratio tests.

References

[1] Firth D (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.

[2] Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21: 2409–2419.
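The penalised likelihood that Firth's method maximises is the ordinary logistic log-likelihood plus half the log-determinant of the Fisher information (the Jeffreys-prior penalty). A minimal numerical sketch (firth_penalized_loglik is illustrative and is not the class's actual fitting routine):

```python
import numpy as np

def firth_penalized_loglik(beta, X, y):
    # logistic log-likelihood plus the Jeffreys penalty 0.5 * log|I(beta)|
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    W = np.diag(p * (1 - p))          # Fisher weights
    info = X.T @ W @ X                # Fisher information matrix
    _, logdet = np.linalg.slogdet(info)
    return loglik + 0.5 * logdet

X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)  # intercept + x
y = np.array([0, 0, 1, 1], dtype=float)
pll = firth_penalized_loglik(np.zeros(2), X, y)
```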

property bse

The standard errors of the regression coefficients

decision_function(X)[source]

Returns the linear predictors (including the intercept).

Parameters:

X (np.ndarray) – A user-supplied design matrix; it should have the same number of variables as the data used to derive the fitted model.

Notes

The returned value is on the logit(risk) scale.

fit(X, y)[source]

Fits a Firth logistic regression model.

Parameters:
  • X (np.ndarray) – A 2d numpy array with dimension n (subjects) by k (variables). The array represents the design matrix of independent variables.

  • y (np.ndarray) – A 1d or 2d numpy array with dimension n by 1. The array represents the dependent variable, typically coded as 0 and 1.

property llf

The loglikelihood, a statsmodels alias for loglik

property loglik

The loglikelihood

property params

The regression coefficients

predict(X)[source]

Returns predicted values based on a design matrix.

Parameters:

X (np.ndarray) – A user-supplied design matrix; it should have the same number of variables as the data used to derive the fitted model.

Notes

Returns 0 or 1 depending on whether the logit(risk) is larger than zero.

predict_proba(X)[source]

Returns the predicted probability.

Parameters:

X (np.ndarray) – A user-supplied design matrix; it should have the same number of variables as the data used to derive the fitted model.

property pvalues

The p-values of the regression coefficients

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') FirthLogisticRegression

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

summary(xname=None, tablefmt='simple')[source]

Prints a summary table.

Parameters:
  • xname (list [str] or None, default None) – Names for the X variables. Default is x1, x2, … Must match the number of parameters in the model.

  • tablefmt (str) – tabulate table format for output. Please see the documentation for tabulate for options.

stats_misc.utils.general

Functions used in multiple stat-miscs modules which may be of general utility.

class stats_misc.utils.general.ObservedPower(statistic: float, alpha: float = 0.05, **kwargs: Any)[source]

Computes observed power (two-sided, above, below) for a given z-statistic, based on the normal distribution.

Parameters:
  • statistic (float) – The z-statistic (e.g., estimate / standard error).

  • alpha (float, default 0.05) – The type I error rate.

  • **kwargs – Optional keyword arguments passed to the distribution.

compute_power()[source]

Compute two-sided observed power using the configured distribution.

compute_power_above()[source]

Compute one-sided observed power for the ‘above’ (right) side.

compute_power_below()[source]

Compute one-sided observed power for the ‘below’ (left) side.

Notes

Observed power is equivalent to the p-value evaluated under the alternative hypothesis instead of the null hypothesis. For example, if the p-value is 0.05 and one is performing a test against an alpha of 0.05, the observed power is 50% (i.e., if we were to repeat the exact same experiment, under the alternative hypothesis there would be a 50% probability of observing a p-value smaller than 0.05). Because of this equivalence between the p-value and observed power, observed power has very limited application.
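The 50% figure in the note above can be checked directly (a sketch with scipy; the class itself may implement this differently):

```python
from scipy.stats import norm

alpha = 0.05
crit = norm.ppf(1 - alpha / 2)  # two-sided critical value, ~1.96

# a z-statistic exactly at the critical value means p-value == alpha;
# the two-sided observed power is then essentially 50%
z = crit
power = (1 - norm.cdf(crit - z)) + norm.cdf(-crit - z)
```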

compute_power() float[source]

Compute two-sided observed power using the configured distribution.

crit

The critical value beyond which the null-hypothesis would be rejected.

Type:

float

Returns:

The observed power.

Return type:

float

compute_power_above() float[source]

Compute one-sided observed power for the ‘above’ (right) side.

crit

The critical value beyond which the null-hypothesis would be rejected.

Type:

float

Returns:

The observed power.

Return type:

float

compute_power_below() float[source]

Compute one-sided observed power for the ‘below’ (left) side.

crit

The critical value beyond which the null-hypothesis would be rejected.

Type:

float

Returns:

The observed power.

Return type:

float

class stats_misc.utils.general.ObservedPowerChiSquare(statistic: float, df: float, alpha: float = 0.05)[source]

Computes observed power for a given statistic, using the non-central chi-square distribution.

Parameters:
  • statistic (float) – The chi-square statistic (e.g., sum of squares of standardised residuals).

  • df (float) – Degrees of freedom for the chi-square distribution.

  • alpha (float, default 0.05) – The type I error rate.

Notes

Inherits from ObservedPower but overrides the distribution to accommodate its strictly positive domain.

In a chi-square context, one generally considers whether the observed statistic is ‘above’ a critical threshold. Hence, compute_power() internally calls compute_power_above() rather than summing two tails. Because chi-square is not defined below zero, compute_power_below() is not meaningful and thus raises a NotImplementedError.

compute_power() float[source]

Compute one-sided observed power for the ‘above’ (right) side.

crit

The critical value beyond which the null-hypothesis would be rejected.

Type:

float

Returns:

The observed power.

Return type:

float

compute_power_above() float[source]

Compute observed power for the right tail using the non-central chi-square distribution.

crit

The critical value (from the central distribution) beyond which the null hypothesis would be rejected.

Type:

float

Returns:

The critical value beyond which the null-hypothesis would be rejected.

Return type:

float

Notes

The procedure:

  • First, compute the critical value (ppf) under the central chi-square (nc=0).

  • Then, evaluate the cdf at x = self.crit under the non-central chi-square with nc = self.statistic.

  • Subtract this cdf value from 1 to obtain the observed power.

This approach can be easily verified using the normal distribution

>>> from scipy.stats import norm
>>> # use a test statistic of 4, and critical value of 1.96
>>> ncnorm = norm(loc=4, scale=1.0)
>>> 1-ncnorm.cdf(1.96)
0.9793
>>> norm.cdf(4-1.96)
0.9793
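The same verification works on the chi-square scale: squaring z = 4 gives a chi-square statistic of 16 on 1 degree of freedom, and the non-central chi-square recovers the same power (a sketch with scipy):

```python
from scipy.stats import chi2, ncx2

df, alpha = 1, 0.05
stat = 4.0 ** 2                          # z = 4, squared

crit = chi2.ppf(1 - alpha, df)           # central critical value, ~3.84
power = 1 - ncx2.cdf(crit, df, stat)     # non-centrality nc = statistic
```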
compute_power_below() float[source]

Raises a NotImplementedError because a chi-square distribution is only defined on [0, ∞), so a ‘below’ tail does not correspond to a meaningful test.

Raises:

NotImplementedError – Because ‘below’ power is not defined for chi-square.

class stats_misc.utils.general.ObservedPowerT(statistic: float, df: float, alpha: float = 0.05)[source]

Computes observed power (two-sided, above, below) for a given test-statistic, using the t-distribution.

Parameters:
  • statistic (float) – The t-statistic (e.g., estimate / standard error).

  • df (float) – The degrees of freedom for the t-distribution.

  • alpha (float, default 0.05) – The type I error rate.

compute_power()

Compute two-sided observed power using the configured distribution.

compute_power_above()

Compute one-sided observed power for the ‘above’ (right) side.

compute_power_below()

Compute one-sided observed power for the ‘below’ (left) side.

Notes

Inherits from ObservedPower and adds degrees of freedom via keyword arguments.

stats_misc.utils.general.calculate_pvalue(z_statistic, side: Literal['two', 'left', 'right'] = 'two') float[source]

Calculate the P-value based on the Z-statistic and the standard normal distribution.

Parameters:
  • z_statistic (float) – Typically the ratio of the point estimate and the standard error, representing the standardized difference from the value under the null-hypothesis.

  • side ({‘two’, ‘left’, ‘right’, ‘below’, ‘above’}, default ‘two’) – left performs a left-sided (one-sided) z-test; right performs a right-sided (one-sided) z-test. below is a synonym for left, and above is a synonym for right.

Returns:

P-value.

Return type:

float

Notes

Use left if your null-hypothesis is μ ≥ μ₀ and the alternative hypothesis is μ < μ₀.

Use right if your null-hypothesis is μ ≤ μ₀ and the alternative hypothesis is μ > μ₀.

So for left the alternative hypothesis is assumed to lie to the left of the null value, while for right it is assumed to lie to the right.
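The tail arithmetic described above, sketched with scipy (the function's internals may differ):

```python
from scipy.stats import norm

z = 1.96

p_two = 2 * (1 - norm.cdf(abs(z)))   # two-sided
p_left = norm.cdf(z)                 # H1: mu < mu_0
p_right = 1 - norm.cdf(z)            # H1: mu > mu_0
```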

stats_misc.utils.general.calculate_pvalue_tdist(t_statistic, degrees, side: Literal['two', 'left', 'right'] = 'two')[source]

Calculate the P-value based on the t-statistic and the t distribution. By default the function performs a two-sided test.

Parameters:
  • t_statistic (float) – Typically the ratio of the point estimate and the standard error, representing the standardized difference from the value under the null-hypothesis.

  • degrees (int) – The degrees of freedom.

  • side ({‘two’, ‘left’, ‘right’, ‘below’, ‘above’}, default ‘two’) – left performs a left-sided (one-sided) t-test; right performs a right-sided (one-sided) t-test. below is a synonym for left, and above is a synonym for right.

Returns:

P-value.

Return type:

float
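A sketch of the equivalent computation on the t distribution (illustrative; the function's internals may differ):

```python
from scipy.stats import t

t_stat, df = 2.5, 10
p_two = 2 * t.sf(abs(t_stat), df)   # two-sided p-value on t with 10 df
```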

stats_misc.utils.general.invlogit(l: ndarray)[source]

The inverse of the logit function.

Parameters:

l (np.array, float or int) – array of logits (log-odds) to convert back to probabilities.

Returns:

An np.array of probabilities with the same length as l.

Return type:

np.array

stats_misc.utils.general.logit(p: ndarray, add: float = 0.0001, sub: float = 0.0001)[source]

The logit function.

Parameters:
  • p (np.array) – array of probabilities.

  • add (float, default 0.0001) – constant to be added if p is equal to 0.

  • sub (float, default 0.0001) – constant to be subtracted if p is equal to 1.

Returns:

An np.array of logits with the same length as p.

Return type:

np.array
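A minimal sketch of the logit/invlogit pair with the boundary adjustment described above (the *_sketch names are illustrative, not the module's actual implementations):

```python
import numpy as np

def logit_sketch(p, add=1e-4, sub=1e-4):
    # nudge exact 0s and 1s so log(p / (1 - p)) stays finite
    p = np.asarray(p, dtype=float)
    p = np.where(p == 0, p + add, p)
    p = np.where(p == 1, p - sub, p)
    return np.log(p / (1 - p))

def invlogit_sketch(l):
    # maps logits back onto (0, 1)
    return 1.0 / (1.0 + np.exp(-np.asarray(l, dtype=float)))

p = np.array([0.0, 0.25, 0.5, 1.0])
round_trip = invlogit_sketch(logit_sketch(p))  # interior values recovered
```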