Includes a mixture of functions to estimate intervals from a collection of
data. Types of intervals include confidence intervals, as well as
prediction intervals.
Better confidence intervals for proportions, based on a classic
contribution by SA Julious [1]_.
Parameters:
proportion (float) – A proportion between 0 and 1.
total_sample (int) – The total sample size the proportion was derived from.
alpha (float, default 0.05) – A float between 0 and 1, representing the type 1 error rate. Used
to define the confidence interval coverage: (1-alpha)*100%.
integer (bool, default True) – Whether the function should fail when the proportion times total_sample
does not result in an integer number of events. Set to False to
ignore the ValueError, which will replace the product with its closest
integer value.
Returns:
results – A results class containing the lower and upper confidence interval
bounds.
Return type:
IntervalResults
References
.. [1] SA Julious, "Two-sided confidence intervals for the single
proportion: comparison of seven methods by Robert G. Newcombe, Statistics
in Medicine 1998; 17:857-872", Statistics in Medicine 2005.
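The package's own function is not reproduced here; as an illustration of the kind of interval discussed above, a Wilson score interval (one of the seven methods compared by Newcombe) can be sketched as follows. The function name wilson_interval is hypothetical:

```python
import numpy as np
from scipy.stats import norm

def wilson_interval(proportion, total_sample, alpha=0.05):
    # Wilson score interval for a single proportion
    z = norm.ppf(1 - alpha / 2)
    p, n = proportion, total_sample
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return centre - half, centre + half
```

For proportion=0.5 and total_sample=100 at alpha=0.05 this yields an interval of roughly (0.404, 0.596).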
Estimates the locations of the 100(1-alpha)% lower bound and upper bound
for any desired quantile, and will return the values of these lower and
upper bounds.
Parameters:
values (list [float]) – A list of floats representing the available data.
quantile (float) – A float between 0 and 1, e.g., use 0.5 for the median, 0.25 for
quartile 1, or 0.025 for the 2.5th percentile.
alpha (float, default 0.05) – The 100(1-alpha)% confidence level.
Returns:
results – An interval class results object, including the interval indices,
bounds, and exact coverage.
Return type:
QuantilesIntervalResults
Notes
The algorithm will attempt to find a near-symmetric confidence interval,
which is guaranteed to have coverage equal to or larger than requested.
Internally this uses the binomial distribution to calculate the probability
of a value being smaller than or equal to the requested quantile, hence
resulting in exact confidence interval limits, irrespective of the
underlying distribution. These intervals will typically be wider than
intervals based on (semi-)parametric solutions.
The code is based on a Stack Exchange answer by whuber, which in turn is
based on Chapter 5 of the book by Meeker, Hahn, and Escobar.
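The order-statistic approach described above can be sketched as follows. This is a simplified illustration, not the package's implementation; the name quantile_ci and the widening heuristic are assumptions:

```python
import numpy as np
from scipy.stats import binom

def quantile_ci(values, quantile, alpha=0.05):
    # Distribution-free CI for a quantile, built from order statistics.
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size

    def coverage(l, u):
        # P(x_(l) <= true quantile <= x_(u)) with 1-based order statistics,
        # exact under any continuous distribution
        return binom.cdf(u - 1, n, quantile) - binom.cdf(l - 1, n, quantile)

    # start at the order statistic closest to the quantile, then widen
    l = u = min(max(int(round(quantile * (n + 1))), 1), n)
    while coverage(l, u) < 1 - alpha and (l > 1 or u < n):
        # grow the side that adds the most probability mass
        gain_low = coverage(l - 1, u) if l > 1 else -1.0
        gain_high = coverage(l, u + 1) if u < n else -1.0
        if gain_low >= gain_high:
            l -= 1
        else:
            u += 1
    return x[l - 1], x[u - 1], coverage(l, u)
```

The returned exact coverage is at least 1 - alpha whenever the sample is large enough to reach it, matching the guarantee stated in the Notes.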
A module to calculate weighted averages of point estimates, using fixed
effect or random effects methods. Includes various estimates of heterogeneity.
Provides heterogeneity estimates (Q-test, I-squared, and tau-squared) [2]_,
determining to what extent the individual estimates are distinct from
the overall_estimate, accounting for differences in precision based on
the supplied estimate-specific standard_errors.
Parameters:
estimates (list [float]) – A list of point estimates (e.g., mean differences or log odds ratios).
standard_errors (list [float]) – A list of standard errors of equal length as estimates.
overall_estimate (float or int) – The overall estimate, for example based on a meta-analysis of the
point estimates of estimates.
alpha (float, default 0.05) – The alpha for the (1-alpha/2)*100% confidence interval.
tau2 (float, optional) – A possible external estimate of the tau-squared used in the random effects
estimator. If None, a non-iterative method-of-moments estimate will be used.
Returns:
results – An instance of the heterogeneity results class.
Heterogeneity will be internally evaluated using a Chi-square distribution
with len(estimates) - 1 degrees of freedom.
The heterogeneity estimates are based on [2]_; specifically, the tau-squared
is estimated using the DerSimonian and Laird method of moments
(without iteration).
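The non-iterative DerSimonian–Laird estimator can be sketched as follows. This is illustrative only (the module's actual function also accepts an overall_estimate; here the fixed-effect pooled mean is computed internally, and the name dersimonian_laird is hypothetical):

```python
import numpy as np
from scipy.stats import chi2

def dersimonian_laird(estimates, standard_errors):
    y = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(standard_errors, dtype=float) ** 2
    mu = np.sum(w * y) / np.sum(w)              # fixed-effect pooled mean
    q = np.sum(w * (y - mu) ** 2)               # Cochran's Q statistic
    df = y.size - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)               # non-iterative MM tau-squared
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    p_value = chi2.sf(q, df)                    # Q-test, chi-square with df
    return {"Q": q, "p": p_value, "I2": i2, "tau2": tau2}
```

With identical estimates and standard errors Q is zero, so tau-squared and I-squared are zero, as expected for a homogeneous set.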
Calculates an average, with the contribution of each element in estimates
weighted by the inverse of the squared standard error of the estimate plus
an estimate of the between-estimate variance.
When the between-estimate variance is set to zero, this is equivalent to a
fixed effect meta-analysis, which assumes the between study variance is zero.
Parameters:
estimates (list [float]) – A list of point estimates (e.g., mean differences or log odds ratios).
standard_errors (list [float]) – A list of standard errors of equal length as estimates.
between_estimate_variance (float) – The between-estimate variance, often referred to as the tau-squared.
Can be estimated using the Heterogeneity function.
Returns:
estimate (float) – The average estimate.
standard_error (float) – The standard error of the estimate.
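A minimal sketch of the inverse-variance weighting described above (the name weighted_average is hypothetical):

```python
import numpy as np

def weighted_average(estimates, standard_errors, between_estimate_variance=0.0):
    y = np.asarray(estimates, dtype=float)
    # inverse-variance weights; adding the between-estimate variance gives
    # the random effects weighting, zero reduces to a fixed effect analysis
    w = 1.0 / (np.asarray(standard_errors, dtype=float) ** 2
               + between_estimate_variance)
    estimate = np.sum(w * y) / np.sum(w)
    standard_error = np.sqrt(1.0 / np.sum(w))
    return estimate, standard_error
```

For two estimates with equal standard errors the result is the simple mean, with a standard error shrunk by a factor of sqrt(2).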
Will loop over the unique group values to perform overall null-hypothesis
tests, comparing sets of values against a null-distribution using the
Kolmogorov-Smirnov test.
Parameters:
data (pd.DataFrame) – A data table.
group (str) – A column name in data which will be used to group the values.
values (str) – A column name in data to which you want to apply the
Kolmogorov-Smirnov test.
nulldistribution (str, default uniform) – The null-distribution the values should be compared against. This
maps to the distributions available in scipy.stats.
Returns:
results – A dictionary with group values as keys and a KstestResults class as items.
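The grouping logic can be sketched with a pandas groupby over scipy's kstest (an illustration, not the module's code; grouped_kstest is a hypothetical name, and scipy returns its own KstestResult objects):

```python
import numpy as np
import pandas as pd
from scipy.stats import kstest

def grouped_kstest(data, group, values, nulldistribution="uniform"):
    # one Kolmogorov-Smirnov test per unique group value, against the
    # named scipy.stats distribution
    return {g: kstest(sub[values].to_numpy(), nulldistribution)
            for g, sub in data.groupby(group)}
```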
Statistical test of whether the difference between two point estimates is
distinct from the null-hypothesis value (null_value).
The test simply calculates the difference between the two point estimates
and derives the standard error of this difference by taking the square
root of the sum of the squared standard errors of the point estimates. The
ratio of the difference to its standard error is compared to a standard
normal distribution.
Parameters:
point (tuple [float, float]) – Two point estimates, for example the mean difference or log odds ratio.
se (tuple [float, float]) – Two standard errors of the point estimates.
null_value (float, default 0.0) – The null-hypothesis value of the difference between the point estimates.
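The calculation described above can be sketched directly (the name difference_test and the returned tuple are assumptions):

```python
import numpy as np
from scipy.stats import norm

def difference_test(point, se, null_value=0.0):
    diff = point[0] - point[1]
    # standard error of a difference of two independent estimates
    se_diff = np.sqrt(se[0] ** 2 + se[1] ** 2)
    z = (diff - null_value) / se_diff
    p_value = 2 * norm.sf(abs(z))   # two-sided, standard normal reference
    return diff, se_diff, z, p_value
```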
A module for resampling techniques such as the bootstrap, jackknife, or
permutation. Currently focussed on confidence interval estimation using
canonical bootstrap algorithms.
A tuple of numpy arrays with the same number of rows.
Type:
tuple [np.ndarray]
Notes
The BCa interval generally has better coverage (i.e., the smallest
confidence interval while retaining the advertised coverage) than most
alternative bootstrap confidence interval methods.
Compute the statistic(s) on the original (non-resampled) data.
Notes
This class handles the generation of bootstrap replicates and calculation
of original estimates from the provided dataset using a specified
statistical function.
Draws n_reps bootstrap samples and applies statsfunction to each sample,
returning the results as a numpy array.
Parameters:
data (tuple [np.ndarray]) – A tuple of numpy arrays with the same number of rows.
statsfunction (Callable) – A function which can unpack a tuple of arrays and perform analyses on
these: statsfunction(*data). The function can return the estimate(s)
as a single float/int, list, tuple or numpy array.
n_estimates (int) – The number of estimates statsfunction will return.
n_reps (int, default 999) – The number of bootstrap samples.
**kwargs (any) – Any keyword arguments supplied to statsfunctions.
Returns:
boots – A 2d numpy array with dims equal to n_reps by n_estimates.
Return type:
np.ndarray
Notes
The helper functions have been optimised through numba.njit.
Will internally map 1d arrays to 2d to deal with numba requirements.
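A plain-numpy sketch of the behaviour described above (the package's own version is numba-optimised; this illustration is not, and its resampling details are assumptions):

```python
import numpy as np

def bootstrap_replicates(data, statsfunction, n_estimates, n_reps=999, **kwargs):
    # draw row indices with replacement; apply statsfunction per replicate
    n = data[0].shape[0]
    boots = np.empty((n_reps, n_estimates))
    for b in range(n_reps):
        idx = np.random.randint(0, n, size=n)
        sample = tuple(arr[idx] for arr in data)
        boots[b] = np.atleast_1d(statsfunction(*sample, **kwargs))
    return boots
```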
Examples
>>> import numpy as np
>>> np.random.seed(42)
>>>
>>> # Example statsfunction that returns a list of statistics
>>> def stats_list(*data):
...     return [np.mean(data[0]), np.std(data[0])]
>>>
>>> result = bootstrap_replicates(
...     (np.random.randn(10), np.random.randn(10)),
...     stats_list, n_estimates=2, n_reps=10
... )
>>> print(result)
Performs jackknife resampling procedure which systematically leaves out
one observation at a time and applies statsfunction.
Parameters:
data (tuple [np.ndarray]) – A tuple of numpy arrays with the same number of rows.
statsfunction (Callable) – A function which can unpack a tuple of arrays and perform analyses on
these: statsfunction(*data). The function can return the estimate(s)
as a single float/int, list, tuple or numpy array.
n_estimates (int) – The number of estimates statsfunction will return.
**kwargs (any) – Any keyword arguments supplied to statsfunctions.
Returns:
jacked – A 2d numpy array with dims equal to data[0].shape[0] by n_estimates.
Return type:
np.ndarray
Notes
The helper functions have been optimised through numba.njit.
Will internally map 1d arrays to 2d to deal with numba requirements.
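The leave-one-out procedure can be sketched in plain numpy (again, not the numba-optimised implementation; the name jackknife_replicates mirrors the description above):

```python
import numpy as np

def jackknife_replicates(data, statsfunction, n_estimates, **kwargs):
    # systematically leave out one row at a time from every array
    n = data[0].shape[0]
    jacked = np.empty((n, n_estimates))
    for i in range(n):
        loo = tuple(np.delete(arr, i, axis=0) for arr in data)
        jacked[i] = np.atleast_1d(statsfunction(*loo, **kwargs))
    return jacked
```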
Predictions (the predicted column) can range between plus and minus
infinity, allowing for evaluation of non-probability scores.
The standard errors are derived based on:
.. [1] JA Hanley and BJ McNeil, "The meaning and use of the area under a
receiver operating characteristic (ROC) curve", Radiology 1982; 143:29-36
(see Table 2).
Takes a binary observed column vector and a continuous predicted
column vector, and returns a pd.DataFrame with the columns
false_positive, sensitivity and threshold.
Parameters:
observed (numpy array) – A column vector of the observed binary outcomes.
predicted (numpy array) – A column vector of the predicted outcome (should be continuous), e.g.,
representing the predicted probability.
kwargs (Any) – Supplied to sklearn.metrics.roc_curve.
Returns:
A table with columns: false_positive, sensitivity and threshold.
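The module delegates to sklearn.metrics.roc_curve; a simplified pure-numpy equivalent of the returned table can be sketched as follows (assuming distinct thresholds and at least one event and one non-event; roc_table is a hypothetical name):

```python
import numpy as np
import pandas as pd

def roc_table(observed, predicted):
    # sort rows by predicted score, highest first
    order = np.argsort(-np.asarray(predicted, dtype=float))
    y = np.asarray(observed, dtype=float)[order]
    tps = np.cumsum(y)        # true positives when cutting after each row
    fps = np.cumsum(1 - y)    # false positives at the same cuts
    return pd.DataFrame({
        "false_positive": fps / fps[-1],
        "sensitivity": tps / tps[-1],
        "threshold": np.asarray(predicted, dtype=float)[order],
    })
```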
Estimate the calibration slope and calibration-in-the-large assuming a
binomial data generating model. A binomial model is generally appropriate
if the predicted risk reflects an event occurring at a fixed moment in time,
e.g. after 1 year or 1 hour.
Parameters:
data (pd.DataFrame) – A table with the columns observed and predicted.
observed (str) – a column name in data referencing the binary outcome column.
predicted (str) – a column name in data referencing the logit predicted risk.
bins (int, optional, default None) – An optional integer used to create equally sized bins of the predicted
logit risk, returning the average predicted and observed risk in
each bin (an observed versus expected table).
alpha (float, default 0.05) – The (1-alpha)% confidence interval. Used for the observed risk when
bins is supplied.
Returns:
A calibration intercept and slope estimates, and an optional
observed and expected table.
The function does NOT presuppose that the predicted risk is derived from a
logistic (binomial) regression model. ANY model predicting risk at a fixed
moment in time is acceptable, including models that typically provide
interval predictions, such as classification trees. Some models or rules
may only provide the predicted risk, not the logit risk; in such cases simply
call the validation.logit function to derive the appropriate variable.
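A minimal sketch of how the calibration-in-the-large (intercept) and calibration slope can be obtained, by fitting a logistic regression of the observed outcome on the logit of the predicted risk. This is not the module's implementation; the name calibration_fit and the Newton-Raphson loop are assumptions:

```python
import numpy as np

def calibration_fit(observed, predicted_logit, n_iter=25):
    y = np.asarray(observed, dtype=float)
    x = np.column_stack([np.ones(y.size),
                         np.asarray(predicted_logit, dtype=float)])
    beta = np.zeros(2)
    for _ in range(n_iter):                    # Newton-Raphson (IRLS)
        p = 1.0 / (1.0 + np.exp(-x @ beta))    # current fitted risks
        w = p * (1.0 - p)                      # logistic working weights
        grad = x.T @ (y - p)
        hess = x.T @ (x * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1]  # calibration-in-the-large, calibration slope
```

Perfectly calibrated predictions yield an intercept near 0 and a slope near 1.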
score (str) – A column name in data. Note that, contrary to the calibration function,
the score can be in any format depending on the intended use. For a binomial
model one would typically supply a logit risk.
model (str, default binomial) – Which model should be used (default: ‘binomial’; alternatively ‘gaussian’).
Returns:
The recalibration intercept and slope, as well as a table with the
recalibrated predictions.
A collection of utils for scikit-survival. Currently, the module focusses on
downstream extraction of predictions and outcomes, as well as on tools to help
with model validation.
The code can likely be generalised further to work with non-sksurv models as
well.
Evaluates a sklearn-survival model using the integrated Brier score,
and computes the baseline (non-informative) Brier score based on the event
incidence in supplied test data.
Parameters:
model (Callable) – A fitted sklearn-survival model that implements a
predict_survival_function method returning survival functions for
individuals.
times (np.ndarray) – A sequence of time points over which the integrated Brier score is
computed.
Creates groups based on the predicted survival and compares the predicted
event rate to the non-parametric event rate.
Parameters:
model (Callable) – A fitted scikit-survival model with a predict_survival_function
method.
n_groups (int, default 5) – Number of equally sized participant groups, created based on the
predicted survival.
nonparametric_estimator (Callable, default KaplanMeierEstimator) – A nonparametric estimator class from sksurv.nonparametric, e.g.
KaplanMeierEstimator or CumulativeIncidenceEstimator.
Computes time-dependent AUCs for survival models using cumulative
dynamic AUC.
Parameters:
model (callable or NoneType) – A fitted survival model with a predict(X) method. If None,
assumes d_tup[1] already contains predicted risk scores.
times (list [float]) – Time points at which to evaluate the time-dependent AUC.
data (tuple [str, np.ndarray, np.ndarray]) – The tuple should contain (label, test_y, test_x or predicted_risk),
where test_y is a structured array of survival data
(e.g., from sksurv.util.Surv.from_arrays), and label is a
descriptor of the data split (e.g., ‘test’, ‘validation’).
train_y (np.ndarray) – Structured array of survival outcomes for training data, used to
construct the risk sets.
Returns:
A long-format DataFrame with columns:
- “Data split”: label of the data subset
- “Time”: time points evaluated
- “AUC by time”: corresponding AUC values
- “Mean AUC”: mean of AUCs across time points (NaNs ignored)
Return type:
pd.DataFrame
Notes
This function evaluates the performance of a survival model across
multiple time points using the cumulative/dynamic AUC approach
(as implemented in cumulative_dynamic_auc from sksurv.metrics).
It handles multiple data splits or datasets and aggregates the results
into a tidy DataFrame.
A module implementing Firth’s logistic regression.
Firth’s regression is a penalised likelihood method that addresses small-sample
bias and perfect separation in logistic regression. It adjusts the score
function to yield more reliable parameter estimates.
Note this is essentially a fork of the GitHub firthlogist
repo, with minor tweaks to work on python 3.10+.
Logistic regression with Firth’s bias reduction method.
This is based on the implementation in the logistf R package. Please see
the logistf references [1]_ and [2]_ for details about the method.
Parameters:
max_iter (int, default 25) – The maximum number of Newton-Raphson iterations.
max_halfstep (int, default 0) – The maximum number of step-halvings in one Newton-Raphson iteration.
max_stepsize (int, default 5) – The maximum step size - for each coefficient, the step size is forced
to be less than max_stepsize.
pl_max_iter (int, default 100) – The maximum number of Newton-Raphson iterations for finding profile
likelihood confidence intervals.
pl_max_halfstep (int, default 0) – The maximum number of step-halvings in one iteration for finding profile
likelihood confidence intervals.
pl_max_stepsize (int, default 5) – The maximum step size while finding PL confidence intervals.
tol (float, default 0.0001) – Convergence tolerance for stopping.
fit_intercept (bool, default True) – Specifies if intercept should be added.
skip_pvals (bool, default False) – If True, p-values will not be calculated. Calculating the p-values can
be time-consuming if wald=False since the fitting procedure is
repeated for each coefficient.
skip_ci (bool, default False) – If True, confidence intervals will not be calculated. Calculating the
confidence intervals via profile likelihood is time-consuming.
wald (bool, default False) – If True, uses Wald method to calculate p-values and confidence
intervals.
test_vars (int, list [int], or None, default None) – Index or list of indices of the variables for which to calculate
confidence intervals and p-values. If None, calculate for all variables.
This option has no effect if wald=True.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Compute one-sided observed power for the ‘below’ (left) side.
Notes
Observed power is equivalent to the p-value evaluated under the
alternative hypothesis instead of the null-hypothesis. For example, if the
p-value is 0.05 and one is performing a test against an alpha of 0.05, the
observed power is 50% (i.e., if we were to repeat the exact same experiment
under the alternative hypothesis, there would be a 50% probability of
observing a p-value smaller than 0.05). Because of this equivalence between
the p-value and observed power, it has very limited application.
Computes observed power for a given statistic, using the non-central
chi-square distribution.
Parameters:
statistic (float) – The chi-square statistic (e.g., sum of squares of standardised
residuals).
df (float) – Degrees of freedom for the chi-square distribution.
alpha (float, default 0.05) – The type I error rate.
Notes
Inherits from ObservedPower but overrides the distribution
to accommodate its strictly positive domain.
In a chi-square context, one generally considers whether the observed
statistic is ‘above’ a critical threshold. Hence, compute_power()
internally calls compute_power_above() rather than summing two tails.
Because chi-square is not defined below zero, compute_power_below()
is not meaningful and thus raises a NotImplementedError.
The procedure:
- First, compute the critical value (ppf) under the central chi-square
(nc=0).
- Then, evaluate the cdf at x = self.crit under the non-central
chi-square with nc = self.statistic.
- Subtract this probability from 1 to calculate the observed power.
This approach can be easily verified using the normal distribution:
>>> from scipy.stats import norm
>>> # use a test statistic of 4, and a critical value of 1.96
>>> ncnorm = norm(loc=4, scale=1.0)
>>> 1 - ncnorm.cdf(1.96)
0.9793
>>> norm.cdf(4 - 1.96)
0.9793
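The chi-square procedure above can be sketched with scipy's non-central chi-square distribution (chi2_observed_power is a hypothetical name for this illustration):

```python
from scipy.stats import chi2, ncx2

def chi2_observed_power(statistic, df, alpha=0.05):
    # critical value under the central chi-square (nc = 0)
    crit = chi2.ppf(1 - alpha, df)
    # probability above the critical value when nc equals the observed
    # statistic, i.e. the observed power
    return 1.0 - ncx2.cdf(crit, df, statistic)
```

The observed power grows monotonically with the observed statistic, as expected.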
Calculate the P-value based on the Z-statistic and the standard normal
distribution.
Parameters:
z_statistic (float) – Typically the ratio of the point estimate and the standard error,
representing the standardized difference from the value under the
null-hypothesis.
side ({‘two’, ‘left’, ‘right’, ‘below’, ‘above’}, default ‘two’) – ‘left’ will perform a left-sided, one-sided z-test; ‘right’ will
perform a right-sided, one-sided z-test. ‘below’ is a synonym for
‘left’, and ‘above’ is a synonym for ‘right’.
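The side handling described above can be sketched as follows (p_value_z is a hypothetical name for this illustration):

```python
from scipy.stats import norm

def p_value_z(z_statistic, side="two"):
    if side == "two":
        return 2.0 * norm.sf(abs(z_statistic))  # mass in both tails
    if side in ("left", "below"):
        return norm.cdf(z_statistic)            # mass below the statistic
    if side in ("right", "above"):
        return norm.sf(z_statistic)             # mass above the statistic
    raise ValueError(f"unknown side: {side!r}")
```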
Calculate the P-value based on the t-statistic and the t distribution.
By default the function performs a two-sided test.
Parameters:
t_statistic (float) – Typically the ratio of the point estimate and the standard error,
representing the standardized difference from the value under the
null-hypothesis.
degrees (int) – The degrees of freedom.
side ({‘two’, ‘left’, ‘right’, ‘below’, ‘above’}, default ‘two’) – ‘left’ will perform a left-sided, one-sided t-test; ‘right’ will
perform a right-sided, one-sided t-test. ‘below’ is a synonym for
‘left’, and ‘above’ is a synonym for ‘right’.