Skip to content

Regularized & Specialized Linear Models

Beyond ordinary least squares, real-world data often calls for regularized or constrained estimators. Correlated predictors inflate OLS coefficient variance, heteroskedastic errors violate constant-variance assumptions, and streaming data demands online updates. This page walks through Lasso, Ridge, Elastic Net, Weighted Least Squares, Recursive Least Squares, Bounded/Non-Negative regression, and Isotonic regression — all as native Polars expressions.

Setup

Energy consumption prediction for commercial buildings with correlated features (square footage, insulation rating, and temperature are all interrelated):

import polars as pl
import polars_statistics as ps

df = pl.DataFrame({
    "kwh": [245, 312, 198, 367, 289, 401, 176, 334, 256, 423,
            210, 345, 278, 389, 301, 412, 189, 356, 267, 434,
            223, 323, 234, 378, 295, 398, 184, 345, 262, 445,
            215, 330, 247, 385, 305, 420, 192, 362, 274, 440,
            228, 318, 241, 372, 290, 405, 180, 350, 258, 430,
            220, 338, 252, 392, 298, 415, 195, 358, 270, 442],
    "temp": [32, 28, 35, 25, 30, 22, 38, 26, 33, 20,
             36, 27, 31, 24, 29, 21, 37, 25, 32, 19,
             34, 28, 35, 24, 30, 23, 38, 26, 33, 18,
             36, 27, 34, 23, 29, 21, 37, 25, 31, 19,
             33, 28, 35, 24, 30, 22, 39, 26, 33, 20,
             34, 27, 34, 23, 29, 21, 37, 25, 32, 18],
    "humidity": [45, 55, 40, 60, 50, 65, 35, 58, 42, 70,
                 38, 57, 48, 62, 52, 68, 36, 59, 44, 72,
                 43, 54, 39, 61, 49, 64, 34, 56, 41, 73,
                 37, 55, 46, 63, 51, 67, 35, 58, 43, 71,
                 44, 53, 40, 60, 50, 66, 33, 57, 42, 70,
                 42, 56, 47, 64, 52, 69, 36, 59, 44, 74],
    "occupancy": [4, 3, 5, 2, 4, 2, 6, 3, 4, 1,
                  5, 3, 4, 2, 3, 1, 6, 2, 4, 1,
                  5, 3, 5, 2, 4, 2, 6, 3, 4, 1,
                  5, 3, 5, 2, 3, 1, 6, 2, 4, 1,
                  4, 3, 5, 2, 4, 2, 6, 3, 4, 1,
                  5, 3, 4, 2, 3, 1, 6, 2, 4, 1],
    "sqft": [1200, 1800, 900, 2200, 1500, 2500, 800, 2000, 1300, 2800,
             950, 2100, 1400, 2300, 1600, 2600, 850, 2100, 1350, 2900,
             1100, 1900, 1000, 2200, 1500, 2400, 780, 2000, 1250, 3000,
             1000, 1850, 1150, 2250, 1550, 2700, 870, 2150, 1400, 2850,
             1150, 1800, 1050, 2150, 1500, 2500, 750, 2050, 1300, 2750,
             1050, 1950, 1200, 2350, 1550, 2650, 880, 2100, 1350, 2950],
    "insulation": [3, 5, 2, 7, 4, 8, 1, 6, 3, 9,
                   2, 6, 4, 7, 5, 8, 1, 6, 3, 9,
                   3, 5, 2, 7, 4, 8, 1, 6, 3, 10,
                   2, 5, 3, 7, 5, 9, 1, 6, 4, 9,
                   3, 5, 2, 7, 4, 8, 1, 6, 3, 9,
                   2, 5, 3, 7, 5, 8, 1, 6, 3, 10],
})

Cast to Float64 and add a weight column for Weighted Least Squares:

df = df.cast({c: pl.Float64 for c in df.columns})

# Weight column for WLS (inverse variance — larger units have more stable consumption)
df = df.with_columns(
    (1.0 / (pl.col("sqft") / 1000.0)).alias("weight")
)

Lasso for Feature Selection

Lasso (L1 regularization) shrinks some coefficients exactly to zero, performing automatic feature selection. With correlated predictors like sqft and insulation, Lasso picks one and drops the other:

result = df.select(
    ps.lasso("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
             lambda_=0.1).alias("model")
)

model = result["model"][0]
print(f"R²:           {model['r_squared']:.4f}")
print(f"RMSE:         {model['rmse']:.2f}")
print(f"Intercept:    {model['intercept']:.4f}")
print(f"Coefficients: {model['coefficients']}")

Expected output:

R²:           0.7697
RMSE:         39.92
Intercept:    502.5261
Coefficients: [-5.611, -0.168, -7.0765, 0.0, 0.2145]

The sqft coefficient is zero -- Lasso eliminated it as redundant given the other predictors (especially insulation, which is highly correlated with building size).

Coefficient Table

lasso_coefs = (
    df.select(
        ps.lasso_summary("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
                         lambda_=0.1).alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

print(lasso_coefs)
# ┌───────────┬───────────┬───────────┬───────────┬─────────┐
# │ term      ┆ estimate  ┆ std_error ┆ statistic ┆ p_value │
# ╞═══════════╪═══════════╪═══════════╪═══════════╪═════════╡
# │ intercept ┆ 502.5261  ┆ NaN       ┆ NaN       ┆ NaN     │
# │ x1        ┆ -5.6110   ┆ NaN       ┆ NaN       ┆ NaN     │
# │ x2        ┆ -0.1680   ┆ NaN       ┆ NaN       ┆ NaN     │
# │ x3        ┆ -7.0765   ┆ NaN       ┆ NaN       ┆ NaN     │
# │ x4        ┆ 0.0000    ┆ NaN       ┆ NaN       ┆ NaN     │
# │ x5        ┆ 0.2145    ┆ NaN       ┆ NaN       ┆ NaN     │
# └───────────┴───────────┴───────────┴───────────┴─────────┘

Standard errors and p-values are NaN for Lasso -- the L1 penalty makes the usual OLS standard errors invalid.

Predictions with Intervals

lasso_preds = (
    df.with_columns(
        ps.lasso_predict("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
                         lambda_=0.1, interval="prediction").alias("pred")
    )
    .unnest("pred")
)

print(lasso_preds.select("kwh", "lasso_prediction", "lasso_lower", "lasso_upper").head(5))
# ┌───────┬──────────────────┬─────────────┬─────────────┐
# │ kwh   ┆ lasso_prediction ┆ lasso_lower ┆ lasso_upper │
# ╞═══════╪══════════════════╪═════════════╪═════════════╡
# │ 245.0 ┆ 287.75           ┆ 203.76      ┆ 371.75      │
# │ 312.0 ┆ 316.02           ┆ 233.93      ┆ 398.12      │
# │ 198.0 ┆ 264.47           ┆ 180.94      ┆ 348.00      │
# │ 367.0 ┆ 339.52           ┆ 256.36      ┆ 422.69      │
# │ 289.0 ┆ 298.35           ┆ 215.59      ┆ 381.11      │
# └───────┴──────────────────┴─────────────┴─────────────┘

Formula Syntax

lasso_f = df.select(
    ps.lasso_formula("kwh ~ temp + humidity + occupancy + sqft + insulation",
                     lambda_=0.1).alias("model")
)

model_f = lasso_f["model"][0]
print(f"R² (formula): {model_f['r_squared']:.4f}")

# Formula summary and predict work the same way
lasso_f_coefs = (
    df.select(
        ps.lasso_formula_summary("kwh ~ temp + humidity + occupancy + sqft + insulation",
                                 lambda_=0.1).alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

lasso_f_preds = (
    df.with_columns(
        ps.lasso_formula_predict("kwh ~ temp + humidity + occupancy + sqft + insulation",
                                 lambda_=0.1, interval="prediction").alias("pred")
    )
    .unnest("pred")
)

Expected output:

R² (formula): 0.7697

Lasso coefficient shrinkage

Plot code
import matplotlib.pyplot as plt
import numpy as np

terms = ["temp", "humidity", "occupancy", "sqft", "insulation"]
coefs = model["coefficients"]

fig, ax = plt.subplots(figsize=(7, 4))
colors = ["#4C72B0" if c != 0 else "#C44E52" for c in coefs]
y_pos = np.arange(len(terms))
ax.barh(y_pos, coefs, color=colors, height=0.5)
ax.set_yticks(y_pos)
ax.set_yticklabels(terms)
ax.axvline(0, color="#999", ls="--", lw=1)
ax.set_xlabel("Coefficient Value")
ax.set_title("Lasso Coefficients (red = shrunk to zero)")
ax.invert_yaxis()
plt.tight_layout()
plt.savefig("reg2_lasso_coefs.png", dpi=150)

Weighted Least Squares

When observation variance is not constant, WLS down-weights noisy observations. Here, larger buildings have more stable energy consumption, so we weight by inverse floor area:

result = df.select(
    ps.wls("kwh", "weight", "temp", "humidity", "sqft").alias("model")
)

model = result["model"][0]
print(f"R²:           {model['r_squared']:.4f}")
print(f"RMSE:         {model['rmse']:.2f}")
print(f"Intercept:    {model['intercept']:.4f}")
print(f"Coefficients: {model['coefficients']}")

Expected output:

R²:           0.9886
RMSE:         6.97
Intercept:    413.7000
Coefficients: [-7.0846, -0.262, 0.0649]

Coefficient Table

wls_coefs = (
    df.select(
        ps.wls_summary("kwh", "weight", "temp", "humidity", "sqft").alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

print(wls_coefs)
# ┌───────────┬───────────┬───────────┬───────────┬──────────┐
# │ term      ┆ estimate  ┆ std_error ┆ statistic ┆ p_value  │
# ╞═══════════╪═══════════╪═══════════╪═══════════╪══════════╡
# │ intercept ┆ 413.7000  ┆ 95.41     ┆ 4.34      ┆ 0.000061 │
# │ x1        ┆ -7.0846   ┆ 1.90      ┆ -3.72     ┆ 0.000461 │
# │ x2        ┆ -0.2620   ┆ 0.77      ┆ -0.34     ┆ 0.735667 │
# │ x3        ┆ 0.0649    ┆ 0.01      ┆ 4.57      ┆ 0.000027 │
# └───────────┴───────────┴───────────┴───────────┴──────────┘

Temperature and square footage are highly significant; humidity is not (p = 0.74).

Predictions

wls_preds = (
    df.with_columns(
        ps.wls_predict("kwh", "weight", "temp", "humidity", "sqft",
                       interval="prediction").alias("pred")
    )
    .unnest("pred")
)

print(wls_preds.select("kwh", "wls_prediction", "wls_lower", "wls_upper").head(5))
# ┌───────┬────────────────┬───────────┬───────────┐
# │ kwh   ┆ wls_prediction ┆ wls_lower ┆ wls_upper │
# ╞═══════╪════════════════╪═══════════╪═══════════╡
# │ 245.0 ┆ 253.05         ┆ 238.17    ┆ 267.94    │
# │ 312.0 ┆ 317.70         ┆ 303.34    ┆ 332.05    │
# │ 198.0 ┆ 213.65         ┆ 198.85    ┆ 228.44    │
# │ 367.0 ┆ 363.59         ┆ 349.14    ┆ 378.04    │
# │ 289.0 ┆ 285.38         ┆ 271.00    ┆ 299.75    │
# └───────┴────────────────┴───────────┴───────────┘

Formula Syntax

wls_f = df.select(
    ps.wls_formula("kwh ~ temp + humidity + sqft", weights="weight").alias("model")
)

model_f = wls_f["model"][0]
print(f"R² (formula): {model_f['r_squared']:.4f}")

Expected output:

R² (formula): 0.9886

WLS residuals

Plot code
import matplotlib.pyplot as plt

residuals = wls_preds["kwh"] - wls_preds["wls_prediction"]
fig, ax = plt.subplots(figsize=(6, 4.5))
ax.scatter(wls_preds["wls_prediction"], residuals,
           s=50, color="#4C72B0", edgecolor="white")
ax.axhline(0, color="#C44E52", ls="--", lw=1.5)
ax.set_xlabel("WLS Fitted Values")
ax.set_ylabel("Residuals")
ax.set_title("WLS Residuals vs Fitted")
plt.tight_layout()
plt.savefig("reg2_wls_residuals.png", dpi=150)

Recursive Least Squares

RLS updates coefficients one observation at a time, useful for streaming data or detecting parameter drift. The forgetting_factor (0 < ff <= 1) controls how quickly old observations are down-weighted:

result = df.select(
    ps.rls("kwh", "temp", "sqft", forgetting_factor=0.98).alias("model")
)

model = result["model"][0]
print(f"R²:           {model['r_squared']:.4f}")
print(f"RMSE:         {model['rmse']:.2f}")
print(f"Intercept:    {model['intercept']:.4f}")
print(f"Coefficients: {model['coefficients']}")

Expected output:

R²:           0.9902
RMSE:         8.07
Intercept:    439.6408
Coefficients: [-7.6797, 0.0521]

Coefficient Table

rls_coefs = (
    df.select(
        ps.rls_summary("kwh", "temp", "sqft", forgetting_factor=0.98).alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

print(rls_coefs)
# ┌───────────┬───────────┬───────────┬───────────┬─────────┐
# │ term      ┆ estimate  ┆ std_error ┆ statistic ┆ p_value │
# ╞═══════════╪═══════════╪═══════════╪═══════════╪═════════╡
# │ intercept ┆ 439.6408  ┆ NaN       ┆ NaN       ┆ NaN     │
# │ x1        ┆ -7.6797   ┆ NaN       ┆ NaN       ┆ NaN     │
# │ x2        ┆ 0.0521    ┆ NaN       ┆ NaN       ┆ NaN     │
# └───────────┴───────────┴───────────┴───────────┴─────────┘

Standard errors are NaN for RLS because the recursive estimator does not produce the usual OLS variance-covariance matrix.

Predictions

rls_preds = (
    df.with_columns(
        ps.rls_predict("kwh", "temp", "sqft", forgetting_factor=0.98).alias("pred")
    )
    .unnest("pred")
)

print(rls_preds.select("kwh", "rls_prediction").head(5))
# ┌───────┬────────────────┐
# │ kwh   ┆ rls_prediction │
# ╞═══════╪════════════════╡
# │ 245.0 ┆ 256.45         │
# │ 312.0 ┆ 318.44         │
# │ 198.0 ┆ 217.77         │
# │ 367.0 ┆ 362.33         │
# │ 289.0 ┆ 287.44         │
# └───────┴────────────────┘

Formula Syntax

rls_f = df.select(
    ps.rls_formula("kwh ~ temp + sqft", forgetting_factor=0.98).alias("model")
)

model_f = rls_f["model"][0]
print(f"R² (formula): {model_f['r_squared']:.4f}")

Expected output:

R² (formula): 0.9902

Expanding OLS

When forgetting_factor=1.0, RLS is equivalent to expanding-window OLS -- all past observations are weighted equally. The expanding_ols function is a convenient shorthand:

result = df.select(
    ps.expanding_ols("kwh", "temp", "sqft").alias("model")
)

model = result["model"][0]
print(f"R² (expanding): {model['r_squared']:.4f}")

Expected output:

R² (expanding): 0.9903

Bounded & Non-Negative Regression

Bounded Least Squares

BLS constrains coefficients to lie within a specified range. This is useful when domain knowledge dictates that effects must be bounded:

result = df.select(
    ps.bls("kwh", "temp", "sqft",
           lower_bound=-10.0, upper_bound=1.0).alias("model")
)

model = result["model"][0]
print(f"R²:           {model['r_squared']:.4f}")
print(f"RMSE:         {model['rmse']:.2f}")
print(f"Intercept:    {model['intercept']:.4f}")
print(f"Coefficients: {model['coefficients']}")

Expected output:

R²:           0.8270
RMSE:         33.99
Intercept:    0.0000
Coefficients: [1.0, 0.1708]

The intercept and temperature coefficient hit the bounds, showing the constraint is active.

bls_coefs = (
    df.select(
        ps.bls_summary("kwh", "temp", "sqft",
                       lower_bound=-10.0, upper_bound=1.0).alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

bls_preds = (
    df.with_columns(
        ps.bls_predict("kwh", "temp", "sqft",
                       lower_bound=-10.0, upper_bound=1.0).alias("pred")
    )
    .unnest("pred")
)

print(bls_preds.select("kwh", "bls_prediction").head(5))
# ┌───────┬────────────────┐
# │ kwh   ┆ bls_prediction │
# ╞═══════╪════════════════╡
# │ 245.0 ┆ 237.0          │
# │ 312.0 ┆ 335.5          │
# │ 198.0 ┆ 188.7          │
# │ 367.0 ┆ 400.8          │
# │ 289.0 ┆ 286.2          │
# └───────┴────────────────┘

Formula Syntax

bls_f = df.select(
    ps.bls_formula("kwh ~ temp + sqft",
                   lower_bound=-10.0, upper_bound=1.0).alias("model")
)

Non-Negative Least Squares

NNLS constrains all coefficients to be non-negative (equivalent to bls with lower_bound=0). This makes sense when you know all predictor effects should be positive:

result = df.select(
    ps.nnls("kwh", "sqft", "insulation").alias("model")
)

model = result["model"][0]
print(f"R²:           {model['r_squared']:.4f}")
print(f"Coefficients: {model['coefficients']}")

Expected output:

R²:           0.0000
Coefficients: [0.0, 0.0]

Non-negative constraint conflict

R² = 0 indicates the non-negative constraint is too restrictive here. The dominant predictor (temperature) has a negative relationship with energy consumption (colder weather increases heating demand), but NNLS forces all coefficients non-negative. Use NNLS only when the positive-coefficient assumption is valid for your features.

nnls_preds = (
    df.with_columns(
        ps.nnls_predict("kwh", "sqft", "insulation").alias("pred")
    )
    .unnest("pred")
)

nnls_f = df.select(
    ps.nnls_formula("kwh ~ sqft + insulation").alias("model")
)

Isotonic Regression

Isotonic regression fits a monotone (non-decreasing or non-increasing) step function. It is non-parametric and makes no assumptions about functional form -- only that the relationship is monotonic:

result = df.select(
    ps.isotonic("kwh", "sqft").alias("model")
)

model = result["model"][0]
print(f"R²:        {model['r_squared']:.4f}")
print(f"Increasing: {model['increasing']}")
print(f"First 5 fitted: {model['fitted'][:5]}")

Expected output:

R²:        0.9971
Increasing: True
First 5 fitted: [248.5, 315.0, 198.0, 372.5, 291.33]

The near-perfect R² reflects isotonic regression's flexibility -- it can fit any monotone pattern in the data.

Isotonic fit

Plot code
import matplotlib.pyplot as plt
import numpy as np

sqft = df["sqft"].to_numpy()
kwh = df["kwh"].to_numpy()
fitted = model["fitted"]

order = np.argsort(sqft)
fig, ax = plt.subplots(figsize=(7, 5))
ax.scatter(sqft, kwh, s=40, color="#4C72B0", edgecolor="white",
           alpha=0.7, label="Observed")
ax.step(sqft[order], [fitted[i] for i in order],
        where="post", color="#C44E52", lw=2, label="Isotonic fit")
ax.set_xlabel("Square Footage")
ax.set_ylabel("Energy Consumption (kWh)")
ax.legend()
plt.tight_layout()
plt.savefig("reg2_isotonic_fit.png", dpi=150)

Ridge & Elastic Net Variants

Ridge Regression

Ridge (L2 regularization) shrinks coefficients toward zero without setting any exactly to zero. It handles multicollinearity well and always retains all predictors:

result = df.select(
    ps.ridge("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
             lambda_=5.0).alias("model")
)

model = result["model"][0]
print(f"R²:           {model['r_squared']:.4f}")
print(f"RMSE:         {model['rmse']:.2f}")
print(f"Intercept:    {model['intercept']:.4f}")
print(f"Coefficients: {model['coefficients']}")

Expected output:

R²:           0.9909
RMSE:         7.99
Intercept:    356.5200
Coefficients: [-5.0626, 0.0373, -4.553, 0.0621, 0.7414]

Coefficient Table

ridge_coefs = (
    df.select(
        ps.ridge_summary("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
                         lambda_=5.0).alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

print(ridge_coefs)
# ┌───────────┬───────────┬───────────┬───────────┬───────────┐
# │ term      ┆ estimate  ┆ std_error ┆ statistic ┆ p_value   │
# ╞═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
# │ intercept ┆ 356.5200  ┆ 1.03      ┆ 345.50    ┆ 0.0       │
# │ x1        ┆ -5.0626   ┆ 0.54      ┆ -9.44     ┆ 0.000000  │
# │ x2        ┆ 0.0373    ┆ 0.57      ┆ 0.07      ┆ 0.948000  │
# │ x3        ┆ -4.5530   ┆ 2.39      ┆ -1.90     ┆ 0.062000  │
# │ x4        ┆ 0.0621    ┆ 0.01      ┆ 4.89      ┆ 0.000010  │
# │ x5        ┆ 0.7414    ┆ 2.45      ┆ 0.30      ┆ 0.763000  │
# └───────────┴───────────┴───────────┴───────────┴───────────┘

Unlike Lasso, Ridge provides valid standard errors and p-values. Temperature (x1) and square footage (x4) are significant; humidity (x2) and insulation (x5) are not.

Predictions

ridge_preds = (
    df.with_columns(
        ps.ridge_predict("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
                         lambda_=5.0, interval="prediction").alias("pred")
    )
    .unnest("pred")
)

print(ridge_preds.select("kwh", "ridge_prediction").head(5))
# ┌───────┬──────────────────┐
# │ kwh   ┆ ridge_prediction │
# ╞═══════╪══════════════════╡
# │ 245.0 ┆ 254.68           │
# │ 312.0 ┆ 318.57           │
# │ 198.0 ┆ 215.39           │
# │ 367.0 ┆ 364.80           │
# │ 289.0 ┆ 284.35           │
# └───────┴──────────────────┘

Formula Syntax

ridge_f = df.select(
    ps.ridge_formula("kwh ~ temp + humidity + occupancy + sqft + insulation",
                     lambda_=5.0).alias("model")
)

ridge_f_coefs = (
    df.select(
        ps.ridge_formula_summary("kwh ~ temp + humidity + occupancy + sqft + insulation",
                                 lambda_=5.0).alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

ridge_f_preds = (
    df.with_columns(
        ps.ridge_formula_predict("kwh ~ temp + humidity + occupancy + sqft + insulation",
                                 lambda_=5.0, interval="prediction").alias("pred")
    )
    .unnest("pred")
)

Elastic Net

Elastic Net combines L1 (Lasso) and L2 (Ridge) penalties. The alpha parameter controls the mix: alpha=1 is pure Lasso, alpha=0 is pure Ridge:

result = df.select(
    ps.elastic_net("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
                   lambda_=1.0, alpha=0.5).alias("model")
)

model = result["model"][0]
print(f"R²:           {model['r_squared']:.4f}")
print(f"RMSE:         {model['rmse']:.2f}")
print(f"Intercept:    {model['intercept']:.4f}")
print(f"Coefficients: {model['coefficients']}")

Expected output:

R²:           0.4649
RMSE:         59.77
Intercept:    458.7300
Coefficients: [-2.9278, 0.0, -2.9939, 0.0, 0.0]

Elastic Net zeroed out humidity, sqft, and insulation -- more aggressive sparsity than Lasso alone at this penalty level.

Coefficient Table

enet_coefs = (
    df.select(
        ps.elastic_net_summary("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
                               lambda_=1.0, alpha=0.5).alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

Predictions

enet_preds = (
    df.with_columns(
        ps.elastic_net_predict("kwh", "temp", "humidity", "occupancy", "sqft", "insulation",
                               lambda_=1.0, alpha=0.5, interval="prediction").alias("pred")
    )
    .unnest("pred")
)

Formula Syntax

enet_f = df.select(
    ps.elastic_net_formula("kwh ~ temp + humidity + occupancy + sqft + insulation",
                           lambda_=1.0, alpha=0.5).alias("model")
)

enet_f_coefs = (
    df.select(
        ps.elastic_net_formula_summary("kwh ~ temp + humidity + occupancy + sqft + insulation",
                                       lambda_=1.0, alpha=0.5).alias("coef")
    )
    .explode("coef")
    .unnest("coef")
)

enet_f_preds = (
    df.with_columns(
        ps.elastic_net_formula_predict("kwh ~ temp + humidity + occupancy + sqft + insulation",
                                       lambda_=1.0, alpha=0.5, interval="prediction").alias("pred")
    )
    .unnest("pred")
)

Comparing Regularization Methods

A side-by-side comparison of all four methods on the same two predictors (temp + sqft):

comparison = df.select(
    ps.ols("kwh", "temp", "sqft").alias("ols"),
    ps.ridge("kwh", "temp", "sqft", lambda_=1.0).alias("ridge"),
    ps.lasso("kwh", "temp", "sqft", lambda_=0.1).alias("lasso"),
    ps.elastic_net("kwh", "temp", "sqft", lambda_=1.0, alpha=0.5).alias("enet"),
)

for name in ["ols", "ridge", "lasso", "enet"]:
    m = comparison[name][0]
    print(f"{name:8s} R²={m['r_squared']:.4f}  RMSE={m['rmse']:.2f}  "
          f"coefs={m['coefficients']}")

Expected output:

ols      R²=0.9903  RMSE=8.04  coefs=[-7.5895, 0.0537]
ridge    R²=0.9903  RMSE=8.05  coefs=[-7.5597, 0.0536]
lasso    R²=0.9903  RMSE=8.06  coefs=[-7.5225, 0.0535]
enet     R²=0.9900  RMSE=8.16  coefs=[-7.3106, 0.0529]

With only two well-separated predictors, all four methods produce nearly identical fits. The differences become meaningful when predictors are correlated or when you need feature selection (Lasso/Elastic Net) or coefficient stability (Ridge).