Utility functions

The general module provides generic helper functions such as: logistic transformations, p-value calculators, and observed power utilities shared across stats-misc.

[1]:

# imports
import numpy as np
from stats_misc.utils import general

Logistic transformations

The logit and invlogit functions convert between probabilities and log-odds — the fundamental link used by logistic regression and related models.

logit

Transforms probabilities to log-odds. Values of exactly 0 or 1 are nudged by a small constant (controlled by add/sub) with a warning, since log-odds are undefined at these boundary.

[2]:

probs = np.array([0.0, 0.1, 0.25, 0.5, 0.75, 0.9])
log_odds = general.logit(probs.copy())
recovered = general.invlogit(log_odds)
for p, lo, r in zip(probs, log_odds, recovered):
    print(f"logit({p:.2f}) = {lo:+.4f}  ->  invlogit = {r:.4f}")

logit(0.00) = -9.2102  ->  invlogit = 0.0001
logit(0.10) = -2.1972  ->  invlogit = 0.1000
logit(0.25) = -1.0986  ->  invlogit = 0.2500
logit(0.50) = +0.0000  ->  invlogit = 0.5000
logit(0.75) = +1.0986  ->  invlogit = 0.7500
logit(0.90) = +2.1972  ->  invlogit = 0.9000

/tmp/ipykernel_4836/1514853904.py:2: UserWarning: Some values of p are 1 or 0. Adding or subtracting a small value, based on `add` and `sub`. Values of 1 or 0 may indicate the model is invalid for some subjects.
  log_odds = general.logit(probs.copy())

invlogit

Transforms log-odds to probabilities in (0, 1). A log-odds of 0 maps to a probability of 0.5; large positive values approach 1 and large negative values approach 0.

[3]:

log_odds = np.array([-5.0, -2.0, 0.0, 2.0, 5.0])
probs = general.invlogit(log_odds)
for lo, p in zip(log_odds, probs):
    print(f"invlogit({lo:+.1f}) = {p:.4f}")

invlogit(-5.0) = 0.0067
invlogit(-2.0) = 0.1192
invlogit(+0.0) = 0.5000
invlogit(+2.0) = 0.8808
invlogit(+5.0) = 0.9933

P-value calculation

Three functions cover the most common test statistics. All support side='two' (default), 'left'/'below', and 'right'/'above' for one-sided tests.

Z-statistic (normal distribution)

Use when the test statistic follows a standard normal distribution under the null-hypothesis. The canonical threshold of 1.96 yields a two-sided p-value of 0.05.

[4]:

z = 1.96
p_two = general.calculate_pvalue(z, side='two')
p_right = general.calculate_pvalue(z, side='right')
print(f"z = {z}: two-sided p = {p_two:.4f}, right-sided p = {p_right:.4f}")

z = 1.96: two-sided p = 0.0500, right-sided p = 0.0250

t-statistic

Use when the sample size is small and the t-distribution is more appropriate. As degrees of freedom increase, the result converges to the Z-test.

[5]:

t_stat = 2.0
for df in [5, 20, 100]:
    p = general.calculate_pvalue_tdist(t_stat, degrees=df)
    print(f"t = {t_stat}, df = {df:3d}: two-sided p = {p:.4f}")

t = 2.0, df =   5: two-sided p = 0.1019
t = 2.0, df =  20: two-sided p = 0.0593
t = 2.0, df = 100: two-sided p = 0.0482

F-statistic

The F-statistic is often used for variance-ratio tests (e.g. ANOVA, regression). The p-value is always one-sided (right tail) because the F-distribution is defined on [0, ∞).

[6]:

f_stat = 5.0
p = general.calculate_pvalue_fdist(f_stat, degrees_numerator=2, degrees_denominator=27)
print(f"F({f_stat}) with df 2/27: p = {p:.4f}")

F(5.0) with df 2/27: p = 0.0142

Observed power

Observed power is the probability of rejecting the null hypothesis at significance level alpha, evaluated at the observed test statistic. It is mathematically equivalent to the p-value evaluated under the alternative hypothesis; so a study that achieves p = alpha has exactly 50% observed power.

Please refer to this paper for inferential problems related to (miss) use of emperical power.

Normal distribution

ObservedPower uses the standard normal distribution. compute_power() is two-sided; compute_power_above()/compute_power_below() give one-sided variants.

[7]:

op = general.ObservedPower(statistic=1.96, alpha=0.05)
print(f"Two-sided power:   {op.compute_power():.4f}")
print(f"One-sided (above): {op.compute_power_above():.4f}")
print(f"One-sided (below): {op.compute_power_below():.4f}")

Two-sided power:   0.5001
One-sided (above): 0.6237
One-sided (below): 0.9998

t-distribution

ObservedPowerT requires an additional df argument. With small degrees of freedom the power is lower than the normal-based estimate for the same statistic.

[8]:

t_stat = 2.5
for df in [10, 50, 200]:
    opt = general.ObservedPowerT(statistic=t_stat, df=df, alpha=0.05)
    print(f"df = {df:3d}: two-sided power = {opt.compute_power():.4f}")

df =  10: two-sided power = 0.6048
df =  50: two-sided power = 0.6874
df = 200: two-sided power = 0.7010

Chi-square distribution

ObservedPowerChiSquare uses the non-central chi-square distribution. Because chi-square is defined only on [0, inf), compute_power_below() raises NotImplementedError.

[9]:

opcs = general.ObservedPowerChiSquare(statistic=6.0, df=2, alpha=0.05)
print(f"Chi-square power (above): {opcs.compute_power():.4f}")

# Raise the error
try:
    opcs.compute_power_below()
except NotImplementedError as e:
    print(f"compute_power_below: {e}")

Chi-square power (above): 0.5840
compute_power_below: compute_power_below() is not meaningful for a chi-square distribution, which is only defined on [0, ∞).