
R Validation

polars-statistics is validated against R's reference implementations at the Rust engine level. Every statistical test, regression model, and GLM is tested against exact values produced by R scripts using set.seed() for reproducibility.


Validation Architecture

polars-statistics (Python/Polars)
    ├── anofox-regression (Rust crate)   ← regression/GLM validation
    └── anofox-statistics (Rust crate)   ← statistical tests validation
            ├── R scripts with set.seed(42)
            ├── CSV reference data generated by R
            └── Rust tests assert against R values

The R validation lives in two upstream Rust crates:

  • anofox-regression — OLS, WLS, Ridge, Elastic Net, Quantile, GLMs (Poisson, Binomial, Gamma, Inverse Gaussian, Negative Binomial, Tweedie)
  • anofox-statistics — t-tests, ANOVA, Mann-Whitney, Wilcoxon, Kruskal-Wallis, Brunner-Munzel, Pearson/Spearman/Kendall, chi-square, Fisher, Shapiro-Wilk, and more

Reference values are generated by R scripts (R/generate_refs.R, generate_regression_validation.R, and generate_glm_validation.R, run via Rscript) and stored as CSV files or Rust constants. Rust tests then assert that the computed values match R within strict tolerances.


Statistical Tests

Parametric Tests

T-Tests

Validated against R's t.test() at tolerance 1e-10.

# Welch t-test (default)
t.test(g1, g2, var.equal = FALSE)

# Student t-test
t.test(g1, g2, var.equal = TRUE)

# Paired t-test
t.test(g1, g2, paired = TRUE)

# One-sided
t.test(g1, g2, alternative = "less")
t.test(g1, g2, alternative = "greater")

# With location shift
t.test(g1, g2, mu = 0.5)
import polars as pl
import polars_statistics as ps

df = pl.DataFrame({"g1": [...], "g2": [...]})

# Welch t-test (default)
df.select(ps.ttest_ind("g1", "g2").alias("t"))

# Paired t-test
df.select(ps.ttest_paired("g1", "g2").alias("t"))

# One-sided
df.select(ps.ttest_ind("g1", "g2", alternative="less").alias("t"))

# With location shift
df.select(ps.ttest_ind("g1", "g2", mu=0.5).alias("t"))

What is validated:

Output field R field Tolerance
statistic t statistic 1e-10
df degrees of freedom 1e-10
p_value p-value (two-sided, less, greater) 1e-10
mean_x, mean_y sample means 1e-10
ci_lower, ci_upper confidence intervals (90%, 95%, 99%) 1e-6

Yuen's Trimmed Means Test

Validated against R's WRS2::yuend() at tolerance 1e-10.

library(WRS2)
yuend(g1, g2, tr = 0.2)  # 20% trim
yuend(g1, g2, tr = 0.1)  # 10% trim

df.select(ps.yuen_test("g1", "g2", trim=0.2).alias("yuen"))
df.select(ps.yuen_test("g1", "g2", trim=0.1).alias("yuen"))

Output field Tolerance
statistic 1e-10
df 1e-10
p_value (two-sided, less, greater) 1e-10
estimate (trimmed mean difference) 1e-10
ci_lower, ci_upper (95% CI) 1e-6

Brown-Forsythe Test

Validated against R's car::leveneTest(center="median") at tolerance 1e-10.

library(car)
leveneTest(y ~ group, data = df, center = "median")

df.select(ps.brown_forsythe("g1", "g2").alias("bf"))
df.select(ps.brown_forsythe("g1", "g2", "g3").alias("bf"))

Output field Tolerance
statistic (F) 1e-10
df1, df2 1e-10
p_value 1e-10

ANOVA

Validated against R's aov(), oneway.test(), and mauchly.test().

# One-way (Fisher, assumes equal variances)
oneway.test(y ~ group, data = df, var.equal = TRUE)

# One-way (Welch, no equal-variance assumption)
oneway.test(y ~ group, data = df, var.equal = FALSE)

# Two-way with interaction
aov(y ~ A * B, data = df)

# Repeated measures
aov(value ~ condition + Error(subject/condition), data = df)
mauchly.test(fit)  # sphericity

# One-way ANOVA
df.select(ps.one_way_anova("g1", "g2", "g3").alias("anova"))

# Two-way ANOVA
df.select(ps.two_way_anova("value", "factor_a", "factor_b").alias("anova"))

# Repeated measures ANOVA
df.select(ps.repeated_measures_anova("value", "condition", "subject").alias("rm"))

One-way ANOVA (Fisher) validated fields:

Output field R field Tolerance
statistic F 1e-10
df_between, df_within numerator/denominator df 1e-10
p_value p-value 1e-10
ss_between, ss_within sum of squares 1e-10
ms_between, ms_within mean squares 1e-10
grand_mean overall mean 1e-10

Algebraic identity verified: With 2 groups, F = t² (ANOVA F-statistic equals squared t-statistic).
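
This identity is easy to confirm numerically; the SciPy sketch below (a standalone check, not part of the validation suite) runs a Student t-test and a Fisher one-way ANOVA on the same two groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
g1 = rng.normal(0.0, 1.0, 30)
g2 = rng.normal(0.5, 1.0, 30)

# Student t-test (equal variances) and one-way ANOVA on the same two groups
t, _ = stats.ttest_ind(g1, g2, equal_var=True)
f, _ = stats.f_oneway(g1, g2)

assert np.isclose(t**2, f)  # with exactly 2 groups, F = t^2
```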

Two-way ANOVA validates SS, df, MS, F, and p-value for Factor A, Factor B, and Interaction (Type I for balanced, Type III for unbalanced designs).

Repeated measures ANOVA additionally validates Mauchly's W (sphericity), Greenhouse-Geisser epsilon, and Huynh-Feldt epsilon.


Non-Parametric Tests

Mann-Whitney U

Validated against R's wilcox.test() at tolerance 1e-10 (statistic) and 1e-6 (p-value).

# Asymptotic
wilcox.test(x, y, alternative = "two.sided", correct = FALSE)
wilcox.test(x, y, alternative = "two.sided", correct = TRUE)

# Exact (small samples)
wilcox.test(x, y, exact = TRUE, conf.int = TRUE)

df.select(ps.mann_whitney_u("x", "y").alias("mwu"))
df.select(ps.mann_whitney_u("x", "y", alternative="less").alias("mwu"))

Output field Tolerance
statistic (W) 1e-10
p_value (two-sided, less, greater) 1e-6
p_value with continuity correction 1e-6
estimate (Hodges-Lehmann, exact mode) 1e-10
ci_lower, ci_upper (exact, 90%/95%) 1e-10

Wilcoxon Signed-Rank

Validated against R's wilcox.test(paired=TRUE) at tolerance 1e-10 / 1e-6.

wilcox.test(x, y, paired = TRUE, alternative = "two.sided")
wilcox.test(x, y, paired = TRUE, exact = TRUE, conf.int = TRUE)

df.select(ps.wilcoxon_signed_rank("x", "y").alias("wsr"))

Kruskal-Wallis

Validated against R's kruskal.test() at tolerance 1e-10 (statistic) / 1e-6 (p-value).

kruskal.test(list(a, b, c))

df.select(ps.kruskal_wallis("a", "b", "c").alias("kw"))

Brunner-Munzel

Validated against R's brunnermunzel::brunnermunzel.test() at tolerance 1e-6.

library(brunnermunzel)
brunnermunzel.test(x, y, alpha = 0.05)

df.select(ps.brunner_munzel("x", "y").alias("bm"))

Output field Tolerance
statistic 1e-6
df 1e-6
p_value (two-sided, less, greater) 1e-6
estimate (P(X < Y) win probability) 1e-6
ci_lower, ci_upper (90%, 95%, 99%) 1e-6

Correlation

Pearson

Validated against R's cor.test(method="pearson") at tolerance 1e-10.

cor.test(x, y, method = "pearson")

df.select(ps.pearson("x", "y").alias("r"))

Output field Tolerance
estimate (r) 1e-10
statistic (t) 1e-10
df 1e-10
p_value 1e-6
ci_lower, ci_upper (90%, 95%, 99%) 1e-4

Special cases validated: perfect positive (r=1), perfect negative (r=-1), zero correlation.
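
The t statistic and p-value in the table follow from r via the standard transformation t = r·sqrt((n−2)/(1−r²)), the same relationship cor.test uses. A quick numerical check against SciPy (illustrative data, not the validation fixtures):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

n = len(x)
r = np.corrcoef(x, y)[0, 1]
t = r * np.sqrt((n - 2) / (1 - r**2))     # t statistic with n-2 df
p = 2 * stats.t.sf(abs(t), df=n - 2)      # two-sided p-value

r_ref, p_ref = stats.pearsonr(x, y)
assert np.isclose(r, r_ref) and np.isclose(p, p_ref)
```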


Spearman

Validated against R's cor.test(method="spearman") at tolerance 1e-10 (rho).

cor.test(x, y, method = "spearman")

df.select(ps.spearman("x", "y").alias("rho"))

Special cases validated: perfect, negative, zero correlation, tied ranks.


Kendall

Validated against R's cor.test(method="kendall") at tolerance 1e-4 (tau).

cor.test(x, y, method = "kendall")

df.select(ps.kendall("x", "y").alias("tau"))

Categorical Tests

Chi-Square Test

Validated against R's chisq.test() at tolerance 1e-10.

# Test of independence (2x2)
chisq.test(matrix(c(10, 20, 30, 40), nrow = 2), correct = FALSE)

# With Yates' correction
chisq.test(matrix(c(10, 20, 30, 40), nrow = 2), correct = TRUE)

# Goodness of fit (uniform)
chisq.test(c(20, 25, 22, 18, 15))

# Goodness of fit (custom proportions)
chisq.test(c(50, 30, 20), p = c(0.5, 0.3, 0.2))

# Test of independence
df.select(ps.chisq_test("counts", n_rows=2, n_cols=2).alias("chi"))

# Goodness of fit
df.select(ps.chisq_goodness_of_fit("observed").alias("gof"))

Fisher's Exact Test

Validated against R's fisher.test() at tolerance 0.01 (p-value).

fisher.test(matrix(c(3, 1, 1, 3), nrow = 2))

pl.select(ps.fisher_exact(a=3, b=1, c=1, d=3).alias("f"))

Effect Size Measures

Measure R Package Tolerance
Cramer's V rcompanion::cramerV() 1e-4
Phi coefficient psych::phi() 1e-4
Contingency coefficient vcd::assocstats() 1e-4
Cohen's kappa irr::kappa2() 0.01

Cohen's kappa example (manually verified):

Confusion matrix:  [[20, 5, 0], [10, 30, 5], [0, 5, 25]]
Po = (20+30+25)/100 = 0.75
Pe = (25×30 + 45×40 + 30×30) / 10000 = 0.345
kappa = (0.75 - 0.345) / (1 - 0.345) = 0.618
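
The same arithmetic in NumPy (a sketch of the standard kappa formula, not the library's internals):

```python
import numpy as np

cm = np.array([[20, 5, 0],
               [10, 30, 5],
               [0, 5, 25]])

n = cm.sum()                                    # 100 ratings
po = np.trace(cm) / n                           # observed agreement: 0.75
pe = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2   # chance agreement from marginals: 0.345
kappa = (po - pe) / (1 - pe)

assert round(kappa, 3) == 0.618
```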

Distributional Tests

Shapiro-Wilk

Validated against R's shapiro.test().

shapiro.test(x)

df.select(ps.shapiro_wilk("x").alias("sw"))

Scenario W tolerance p-value tolerance
Normal data (small/medium/large n) 1e-3 0.05
Non-normal (uniform, exponential) 1e-3 0.05
Edge cases (n=3) 1e-3 0.05
Edge cases (n=4, n=5) 0.02 directional only
Constant data 1e-10 (W=1.0) 1e-10 (p=1.0)

Tolerance rationale

Shapiro-Wilk has known implementation variations across software packages. The W statistic typically matches to 3-4 decimal places between R and other implementations. polars-statistics uses the same algorithm but may differ slightly in coefficient tables.


D'Agostino K-Squared

Validated against R's moments::agostino.test().

Scenario z-score tolerance p-value
Normal data 0.15 > 0.05 (not significant)
Skewed data 0.15 < 0.05 (significant)

Regression

OLS and WLS

Weighted Least Squares

Validated against R's lm() with weights at tolerance 0.01 (relative).

set.seed(42)
x <- 1:20
y <- 3 + 1.8 * x + rnorm(20, sd = x)  # heteroskedastic
w <- 1 / x^2                           # inverse-variance weights
fit <- lm(y ~ x, weights = w)
summary(fit)

df.select(
    ps.wls("y", "x", weights="w").alias("model")
)

Output field R value Tolerance
intercept 3.1055 0.01
coefficients[0] 1.7870 0.01
se_intercept 0.6773 0.01
se_coefficients[0] 0.1913 0.01
r_squared 0.8289 0.01
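
WLS is equivalent to OLS on rows rescaled by sqrt(w). A NumPy sketch of that reduction on hypothetical data (the R seed and draws above are not reproduced here, so no reference values are asserted):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 21, dtype=float)
y = 3 + 1.8 * x + rng.normal(scale=x)   # heteroskedastic noise, sd grows with x
w = 1 / x**2                            # inverse-variance weights

X = np.column_stack([np.ones_like(x), x])
sw = np.sqrt(w)
# Weighted least squares = OLS on sqrt(w)-scaled rows
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
intercept, slope = beta
```

With unit weights this collapses to plain OLS, which is a handy sanity check.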

HC Standard Errors

Validated against R's sandwich::vcovHC() and lmtest::coeftest() at tolerance 0.01 (relative).

library(sandwich)
library(lmtest)

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 4.3, 5.8, 8.2, 9.9)
m <- lm(y ~ x)

# HC0 through HC3
sqrt(diag(vcovHC(m, type = "HC0")))  # [0.12237, 0.03178]
sqrt(diag(vcovHC(m, type = "HC1")))  # [0.15796, 0.04103]
sqrt(diag(vcovHC(m, type = "HC2")))  # [0.15489, 0.04185]
sqrt(diag(vcovHC(m, type = "HC3")))  # [0.20343, 0.05722]

df.select(
    ps.ols_summary("y", "x", hc_type="hc0").alias("coef"),
    # Also: hc_type="hc1", "hc2", "hc3"
)

Exact R reference values (with-intercept model):

HC Type Intercept SE Slope SE
HC0 0.12237 0.03178
HC1 0.15796 0.04103
HC2 0.15489 0.04185
HC3 0.20343 0.05722

No-intercept model (validated separately):

HC Type Slope SE
HC0 0.02377
HC1 0.02658
HC2 0.02817
HC3 0.03408

Properties verified:

  • HC3 >= HC2 >= HC0 ordering for standard errors
  • t-statistic = coefficient / SE identity (tolerance 0.001)
  • p-values in [0, 1]
  • Confidence intervals bracket the coefficient
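
The HC estimators follow directly from their sandwich definitions. This NumPy sketch (not the crate's code) reproduces the with-intercept reference values above to roughly four decimals:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.8, 8.2, 9.9])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                          # OLS residuals
A = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, A, X)     # leverages (hat-matrix diagonal)

omega = {
    "HC0": e**2,
    "HC1": e**2 * n / (n - k),
    "HC2": e**2 / (1 - h),
    "HC3": e**2 / (1 - h) ** 2,
}
# Sandwich: (X'X)^-1 X' diag(omega) X (X'X)^-1
se = {name: np.sqrt(np.diag(A @ (X.T * w) @ X @ A)) for name, w in omega.items()}

assert np.allclose(se["HC0"], [0.12237, 0.03178], atol=1e-3)
assert np.allclose(se["HC3"], [0.20343, 0.05722], atol=1e-3)
```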

Ridge Regression

Validated against R's glmnet with alpha=0 at tolerance 0.01 (relative).

library(glmnet)
set.seed(42)

n <- 50; p <- 3
X <- matrix(rnorm(n * p), n, p)
beta <- c(1.5, -2.0, 0.5)
y <- X %*% beta + rnorm(n)

fit <- glmnet(X, y, alpha = 0, lambda = 0.5,
              standardize = FALSE, intercept = TRUE)
coef(fit)

df.select(
    ps.ridge("y", "x0", "x1", "x2", lambda_=0.5).alias("model")
)

Output field R value True value
intercept -0.0145 0.0
coefficients[0] 1.5357 1.5
coefficients[1] -2.0012 -2.0
coefficients[2] 0.4734 0.5

Elastic Net

Validated against R's glmnet with alpha=0.5 at tolerance 0.2 (relative).

fit <- glmnet(X, y, alpha = 0.5, lambda = 0.3,
              standardize = FALSE, intercept = TRUE)
coef(fit)

df.select(
    ps.elastic_net("y", "x0", "x1", "x2",
                   lambda_=0.3, alpha=0.5).alias("model")
)

Output field R value True value
intercept -0.0339 0.0
coefficients[0] 1.4658 1.5
coefficients[1] -1.9413 -2.0
coefficients[2] 0.4016 0.5

Elastic Net tolerance

Elastic Net uses coordinate descent, and implementations may vary slightly in convergence criteria. A relative tolerance of 0.2 accommodates these differences.
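
The coordinate-descent update itself is simple; the divergence between implementations lies in convergence criteria and sweep order. A minimal sketch of the glmnet-style objective (illustrative only, no intercept, and not the crate's solver):

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def elastic_net_cd(X, y, lam, alpha, n_iter=500):
    # Objective: (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2)
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual excluding x_j
            z = X[:, j] @ r / n
            b[j] = soft_threshold(z, lam * alpha) / (
                X[:, j] @ X[:, j] / n + lam * (1 - alpha)
            )
    return b

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=50)

# With lam = 0 the update reduces to OLS (a useful sanity check)
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(elastic_net_cd(X, y, lam=0.0, alpha=0.5), b_ols, atol=1e-6)
```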


Quantile Regression

Validated against R's quantreg::rq() (version 6.1).

library(quantreg)
set.seed(42)

x <- seq(1, 10, length.out = 50)
y <- 3 + 1.2 * x + rnorm(50, sd = x * 0.5)

rq(y ~ x, tau = 0.5)   # median
rq(y ~ x, tau = 0.25)  # lower quartile
rq(y ~ x, tau = 0.75)  # upper quartile

df.select(
    ps.quantile("y", "x", tau=0.50).alias("q50"),
    ps.quantile("y", "x", tau=0.25).alias("q25"),
    ps.quantile("y", "x", tau=0.75).alias("q75"),
)

Median regression (tau=0.5), n=50:

Output field R value Tolerance
intercept 2.6995 1.0
coefficients[0] 1.2375 0.2

Multiple quantiles, n=100:

Tau R intercept R slope Slope tolerance
0.25 -0.3861 1.9995 0.2
0.50 0.8117 2.0102 0.2
0.75 1.5250 2.2169 0.3

Quantile regression tolerance

Quantile regression solutions are not unique when the number of observations on the solution hyperplane exceeds the number of parameters. Different simplex implementations may find different valid solutions, so intercept tolerance is wider (1.0) while slope tolerance is tighter (0.2).
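
All valid solutions minimize the same check (pinball) loss; a small sketch of that criterion (the objective, not the simplex solver):

```python
import numpy as np

def check_loss(residuals, tau):
    # rho_tau(u) = u * (tau - 1{u < 0}); tau*u for u >= 0, (1-tau)*|u| for u < 0
    u = np.asarray(residuals, dtype=float)
    return np.mean(u * (tau - (u < 0)))

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=1001)

# For an intercept-only model, the tau-th sample quantile minimizes the loss;
# the mean does not (for skewed data it sits away from the median)
med = np.median(y)
assert check_loss(y - med, 0.5) <= check_loss(y - np.mean(y), 0.5)
```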


Prediction and Confidence Intervals

Validated against R's predict.lm() with interval="prediction" and interval="confidence".

m <- lm(y ~ x)
predict(m, newdata = data.frame(x = 100),
        interval = "prediction", level = 0.95)
predict(m, newdata = data.frame(x = 100),
        interval = "confidence", level = 0.95)

df.with_columns(
    ps.ols_predict("y", "x", interval="prediction", level=0.95).alias("pred")
).unnest("pred")

Large extrapolation test (n=20, predicting at x=100, 500, 1000):

x Fitted PI lower PI upper Tolerance (fit) Tolerance (bounds)
100 205.578 200.856 210.301 0.1 0.2
500 1007.984 983.179 1032.790 0.1 0.5
1000 2010.992 1960.904 2061.080 0.1 1.0

Multiple confidence levels validated at x=35 (fit=80.159):

Level PI lower PI upper
80% 78.916 81.401
90% 78.548 81.769
95% 78.219 82.098
99% 77.542 82.775
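
For simple regression these intervals follow the textbook formulas; a NumPy/SciPy sketch on hypothetical data (the R fixtures above are not reproduced, so no reference values are asserted):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = np.arange(1.0, 21.0)
y = 5.0 + 2.0 * x + rng.normal(scale=1.0, size=20)

n = len(x)
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)   # residual variance

x0, level = 100.0, 0.95                          # large extrapolation point
fit = b0 + b1 * x0
tq = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
lever = 1 / n + (x0 - x.mean()) ** 2 / sxx

ci_half = tq * np.sqrt(s2 * lever)         # CI half-width (mean response)
pi_half = tq * np.sqrt(s2 * (1 + lever))   # PI half-width (new observation)

assert pi_half > ci_half  # prediction intervals are always wider
```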

Generalized Linear Models

Poisson GLM

Validated against R's glm(family=poisson()).

set.seed(42)
x <- seq(0.1, 3.0, length.out = 30)
y <- rpois(30, lambda = exp(0.8 + 0.7 * x))
fit <- glm(y ~ x, family = poisson(link = "log"))
summary(fit)

x <- 1:20
y <- rpois(20, lambda = 4 + 3 * x)
fit <- glm(y ~ x, family = poisson(link = "identity"))

# Log link (default)
df.select(ps.poisson("y", "x").alias("model"))

# Identity link
df.select(ps.poisson("y", "x", link="identity").alias("model"))

Poisson log link (n=30):

Output field R value Tolerance
intercept 0.7992 0.01
coefficients[0] 0.7108 0.01
deviance 28.755 0.1
se_intercept 0.1813 0.01
se_coefficients[0] 0.0828 0.01

Poisson identity link (n=20):

Output field R value Tolerance
intercept 4.0250 0.01
coefficients[0] 3.0167 0.01
deviance 2.063 0.1
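
Poisson GLMs are conventionally fit by iteratively reweighted least squares (IRLS); the NumPy sketch below shows the log-link case (the standard algorithm, not necessarily the crate's exact implementation, and on freshly simulated rather than the R-seeded data):

```python
import numpy as np

def poisson_irls(X, y, n_iter=25, tol=1e-10):
    # Start from a rough log-scale fit, then apply Newton/IRLS updates
    beta, *_ = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu     # working response
        WX = X * mu[:, None]             # working weights W = diag(mu)
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(42)
x = np.linspace(0.1, 3.0, 30)
y = rng.poisson(np.exp(0.8 + 0.7 * x)).astype(float)
X = np.column_stack([np.ones_like(x), x])
beta = poisson_irls(X, y)

# At convergence the score equations hold: X'(y - mu) = 0
assert np.allclose(X.T @ (y - np.exp(X @ beta)), 0.0, atol=1e-6)
```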

Binomial GLM

Validated against R's glm(family=binomial()) across three link functions.

set.seed(42)
x <- seq(-3, 3, length.out = 30)
p <- plogis(0 + 0.8 * x)
y <- rbinom(30, 1, p)
fit <- glm(y ~ x, family = binomial(link = "logit"))
summary(fit)
fit <- glm(y ~ x, family = binomial(link = "probit"))
fit <- glm(y ~ x, family = binomial(link = "cloglog"))

Logit link (n=30):

Output field R value Tolerance
intercept 0.0000 0.1
coefficients[0] 0.8483 0.1
deviance 30.110 0.5
se_intercept 0.4502 0.05
se_coefficients[0] 0.3062 0.05

Probit link (n=30):

Output field R value Tolerance
intercept 0.3428 0.1
coefficients[0] 0.9199 0.1
deviance 18.956 0.5

Negative Binomial GLM

Validated against R's MASS::glm.nb().

library(MASS)
set.seed(42)
x <- seq(0.5, 3.0, length.out = 30)
y <- rnbinom(30, size = 5, mu = exp(1.0 + 0.05 * x))
fit <- glm.nb(y ~ x)
summary(fit)

df.select(ps.negative_binomial("y", "x").alias("model"))

Output field R value Tolerance
intercept 1.0157 0.05
coefficients[0] 0.0532 0.05
theta 5.5678 sign/positivity check

Gamma GLM

Validated against R's glm(family=Gamma(link="log")).

set.seed(42)
x <- seq(0.5, 3.0, length.out = 30)
y <- rgamma(30, shape = 2, rate = 2 / exp(0.6 + 0.3 * x))
fit <- glm(y ~ x, family = Gamma(link = "log"))
summary(fit)

df.select(ps.gamma_glm("y", "x").alias("model"))

Output field R value Tolerance
intercept 0.6335 0.1
coefficients[0] 0.2899 0.1
deviance 19.153 1.0
se_intercept 0.3433 0.05
se_coefficients[0] 0.1805 0.05

Inverse Gaussian GLM

Validated against R's glm(family=inverse.gaussian(link="log")).

Output field R value Tolerance
intercept 0.5425 0.2
coefficients[0] 0.0365 0.2
deviance 19.642 2.0

Poisson with Offset

Validated against R's glm() with offset term for exposure adjustment.

fit <- glm(y ~ x + offset(log(exposure)),
           family = poisson(link = "log"))

df.select(
    ps.poisson("y", "x", offset="log_exposure").alias("model")
)

Output field R value Tolerance
intercept 0.2382 0.05
coefficients[0] 0.1003 0.01
deviance 21.571 0.5

Tweedie GLMs

Validated against R's statmod::tweedie() family across four variance powers.

library(statmod)
fit <- glm(y ~ x, family = tweedie(var.power = P, link.power = 0))

Variance Power Distribution Intercept Slope Deviance
1.0 Poisson 0.7992 0.7108 28.755
1.5 Compound Poisson-Gamma 0.1820 0.2193 47.805
2.0 Gamma 0.6335 0.2899 19.153
3.0 Inverse Gaussian 0.5425 0.0365 19.642

TOST Equivalence Tests

Validated against R's TOSTER package (TOSTone(), TOSTtwo(), TOSTpaired(), TOSTr(), TOSTtwo.prop(), wilcox_TOST(), boot_t_TOST()).

Tests verify both equivalence detection and non-equivalence detection:

Scenario Bound Expected result
One-sample, mean near 0 ±0.5 equivalent = true
One-sample, mean near 2 ±0.5 equivalent = false
Two-sample, similar means ±0.5 equivalent = true
Two-sample, different means ±0.5 equivalent = false
Proportion, 52/100 vs 0.5 ±0.1 estimate = 0.02 (exact)
Cohen's d bounds d = 0.5 bounds computed correctly
Bootstrap TOST ±0.5 reproducible with same seed
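
The one-sample case reduces to two one-sided t-tests against the bounds, with equivalence declared only when both reject (a sketch of the TOST logic, not TOSTER's or the library's API):

```python
import numpy as np
from scipy import stats

def tost_one_sample(x, low, high, alpha=0.05):
    # H01: mean <= low  (reject if mean is significantly above the lower bound)
    # H02: mean >= high (reject if mean is significantly below the upper bound)
    x = np.asarray(x, dtype=float)
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)
    p_low = stats.t.sf((x.mean() - low) / se, n - 1)
    p_high = stats.t.cdf((x.mean() - high) / se, n - 1)
    return max(p_low, p_high) < alpha   # equivalent iff both one-sided tests reject

rng = np.random.default_rng(42)
assert tost_one_sample(rng.normal(0.0, 1.0, 200), -0.5, 0.5)       # mean near 0
assert not tost_one_sample(rng.normal(2.0, 1.0, 200), -0.5, 0.5)   # mean near 2
```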

R Packages Used

The full list of R packages referenced in validation:

R Package What it validates
stats (base R) t.test(), aov(), oneway.test(), lm(), glm(), predict(), shapiro.test(), chisq.test(), fisher.test(), cor.test(), wilcox.test(), kruskal.test()
sandwich HC0-HC3 heteroskedasticity-consistent standard errors
lmtest coeftest() for robust inference
glmnet Ridge (alpha=0) and Elastic Net regression
quantreg Quantile regression (rq())
MASS Negative Binomial GLM (glm.nb())
statmod Tweedie GLM family
car Brown-Forsythe / Levene's test
WRS2 Yuen's trimmed means test
brunnermunzel Brunner-Munzel test
TOSTER TOST equivalence tests
rcompanion Cramer's V
vcd Contingency coefficient
psych Phi coefficient, Cohen's kappa
irr Cohen's kappa
moments D'Agostino K-squared test
coin Permutation tests

Summary

Category Tests validated Typical tolerance
Parametric tests t-tests (Welch, Student, paired), Yuen, Brown-Forsythe, ANOVA (one-way, two-way, repeated measures, Welch) 1e-10
Non-parametric tests Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis, Brunner-Munzel 1e-6 to 1e-10
Correlation Pearson, Spearman, Kendall 1e-4 to 1e-10
Categorical tests Chi-square, Fisher exact, Cramer's V, Phi, Contingency coefficient, Cohen's kappa, G-test 1e-4 to 1e-10
Distributional tests Shapiro-Wilk, D'Agostino K-squared 1e-3
Linear regression OLS, WLS, Ridge, Elastic Net, Quantile 0.01 to 0.2
HC standard errors HC0, HC1, HC2, HC3 0.01
Prediction intervals Prediction and confidence intervals at multiple levels 0.01 to 1.0
GLMs Poisson (log/identity/sqrt), Binomial (logit/probit/cloglog), Gamma, Inverse Gaussian, Negative Binomial, Tweedie (4 powers) 0.01 to 0.5
Equivalence tests TOST one-sample, two-sample, paired, proportion, correlation, Wilcoxon, bootstrap 1e-6