API Conventions¶

Common patterns and conventions used throughout the polars-statistics API.

Expression API¶

All functions work as Polars expressions and integrate with group_by, over, and lazy evaluation:

import polars as pl
import polars_statistics as ps

# With group_by (aggregation)
df.group_by("group").agg(ps.ols("y", "x1", "x2").alias("model"))

# With over (window function)
df.with_columns(ps.ols("y", "x1", "x2").over("group").alias("model"))

# Lazy evaluation
df.lazy().group_by("group").agg(ps.ttest_ind("x", "y")).collect()

Column References¶

All functions accept column names as strings or pl.Expr:

ps.ols("y", "x1", "x2")              # String column names
ps.ols(pl.col("y"), pl.col("x1"))   # Polars expressions
ps.ols("y", pl.col("x1") * 2)       # Mixed / transformed

Return Types¶

Statistical Tests¶

Return a struct with statistic and p_value fields:

result = df.select(ps.ttest_ind("x", "y").alias("test"))
result.with_columns(
    pl.col("test").struct.field("statistic"),
    pl.col("test").struct.field("p_value"),
)

Regression Models¶

Return a struct with model-specific fields. See Output Structures for details.

result = df.group_by("group").agg(ps.ols("y", "x1").alias("model"))
result.with_columns(
    pl.col("model").struct.field("r_squared"),
    pl.col("model").struct.field("coefficients"),
)

Summary Functions¶

Return List[Struct] with coefficient statistics (like R's broom::tidy):

df.group_by("group").agg(
    ps.ols_summary("y", "x1", "x2").alias("coef")
).explode("coef").unnest("coef")
# Columns: term, estimate, std_error, statistic, p_value

Prediction Functions¶

Return Struct{prediction, lower, upper} per row:

df.with_columns(
    ps.ols_predict("y", "x1", "x2", interval="prediction", level=0.95)
        .over("group").alias("pred")
).unnest("pred")
# Columns: prediction, lower, upper

Common Parameters¶

Parameter	Description	Default
`with_intercept`	Include intercept term	`True`
`alternative`	Test alternative: "two-sided", "less", "greater"	"two-sided"
`alpha`	Significance level	`0.05`
`conf_level`	Confidence level for intervals	`0.95`
`lambda_`	L2 (Ridge) regularization strength	`0.0`