Interactive Charts

Part 1 — Visual Results Explorer

Fidelity note: Every number displayed below is copied from Table 1 (simulation wins, §4.1), Table 3 (Gamma/log, §4.2.3), or Table 4 (Tweedie/log, §4.2.4) of the paper. Metric ratios are relative to the GLM2 baseline. Gini values and A/E deviations are absolute (not ratios).

A — Headline Results

52.8%
of simulated scenarios won by savvyGLM estimators (SR, GSR, St, DSh) — without cross-validation.
Table 1, §4.1 — 74/140 first-place wins across 140 independent scenarios
−0.12%
SR Calibration A/E
Table 3, Panel A — Gamma/log
0.958
St RMSEWin,95% ratio
Table 3, Panel C — lowest across all models
42%
Faster convergence (LW)
Table 3, Panel A — NoIt = 0.580
First-place wins by link function — Table 1, §4.1
Blue = logit link  |  Amber = sqrt link  |  Green = log link  |  Number = total wins

B — Model Comparison Studio

Select any two models, family, and panel. All four panels update simultaneously.

Left model
Right model
Family / LF
Panel
Performance Profile RMSE & Deviance ratios, Gininorm
Ratios relative to GLM2 baseline — left model in blue, right in green. Bar extends left of 1.00 = improvement; right = regression.
Convergence Analysis (NoIt ratio) Across Panels A, B, C — Tables 3–4
NoIt ratio relative to GLM2 baseline. Values below 1.000 indicate faster convergence.
Key Metrics
Metric Left Right Δ
Green Δ = right model improves on left. All values from Tables 3–4.
Comparison Summary Generated from Tables 3–4

C — savvyGLM Estimator Deep-Dive

Four Stein-type shrinkage estimators introduced in §2.2. Sparklines show NoIt ratio across Panels A, B, C (Gamma/log, Table 3). Click to expand.

SR Slab Regression +
Defining result
A/E −1 = −0.12% — best portfolio calibration across all estimators.
Table 3, Panel A, Gamma/log, §4.2.3
Convergence profile (NoIt ratio, Panels A–C)
Gamma/log: 0.914  |  0.543  |  0.946 — Table 3
Mechanism (§2.2)
SR imposes a slab constraint on the sum of regression coefficients, introducing a penalty in a single direction. The tuning parameter is computed in closed form from the data — no cross-validation is required.
When to prefer
SR is the practitioner-optimal choice when portfolio calibration is the primary objective. It consistently produces A/E deviations close to zero without the computational overhead of cross-validation-tuned penalties (RR, EN). Particularly effective under the Gamma/log and Tweedie/log specifications standard in P&C severity modelling.
Tweedie result
A/E −1 = +0.07% — near-perfect calibration under Tweedie/log.
Table 4, Panel C, §4.2.4
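The slab mechanism can be pictured with a toy base-R sketch: penalising only the single direction 1 = (1, …, 1) shrinks the coefficient sum while leaving other directions nearly untouched. The fixed `eta` below is a hypothetical placeholder (savvyGLM derives its tuning parameter in closed form from the data), so this illustrates the idea rather than reproducing the paper's estimator.

```r
## Slab-style penalty along the single direction 1 = (1, ..., 1):
##   minimise ||y - X b||^2 + eta * (sum(b))^2
##   => b_sr = (X'X + eta * 11')^{-1} X'y
## 'eta' is a placeholder here; the paper derives it in closed form (§2.2).
set.seed(1)
n <- 200; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1, 1, -1, 0.5) + rnorm(n)

ones  <- rep(1, p)
eta   <- 5                                    # hypothetical fixed value
b_ols <- solve(crossprod(X), crossprod(X, y))
b_sr  <- solve(crossprod(X) + eta * tcrossprod(ones), crossprod(X, y))
## By Sherman-Morrison, sum(b_sr) = sum(b_ols) / (1 + eta * 1'(X'X)^{-1} 1),
## so the coefficient sum is always pulled toward zero.
```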
GSR Generalised Slab Regression +
Defining result
25 first-place wins under the logit link (out of 40 scenarios).
These 25 logit-link wins alone account for 33.8% of savvyGLM's 74 total first-place wins in the simulation study.
Table 1, §4.1 — strongest logit-link estimator
Convergence profile (NoIt ratio, Panels A–C)
Gamma/log: 0.869  |  0.620  |  0.781 — Table 3
Mechanism (§2.2)
GSR generalises SR by imposing slab constraints along all principal directions of the design matrix (eigenvectors of Σ = XᵀX). Each direction receives its own closed-form tuning parameter, enabling differential eigenvalue adjustment.
When to prefer
GSR is the recommended choice for logistic regression (logit link) and high-dimensional settings where the concentration ratio p/n is non-negligible. Simulation evidence (§4.1) shows approximately 11.5% and 22% MSE improvement over OLS at p/n = 30% and 90% respectively. For log-link severity models, SR typically provides better calibration.
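The all-directions mechanism can be sketched by rotating an OLS fit into the eigenbasis of Σ = XᵀX and shrinking each coordinate by its own factor. The plug-in factors `c_k` below are a generic MSE-motivated illustration (signal² over signal² plus noise variance), not savvyGLM's closed-form tuning rule from §2.2.

```r
## Per-direction shrinkage along the eigenvectors of Sigma = X'X (illustration).
set.seed(1)
n <- 300; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1, -2, 0.5, 0) + rnorm(n)

S      <- crossprod(X)                     # Sigma = X'X
eg     <- eigen(S, symmetric = TRUE)       # columns of eg$vectors = directions
b_ols  <- solve(S, crossprod(X, y))
sigma2 <- sum((y - X %*% b_ols)^2) / (n - p)

z   <- drop(crossprod(eg$vectors, b_ols))  # OLS coordinates in the eigenbasis
v   <- sigma2 / eg$values                  # variance of each coordinate
c_k <- z^2 / (z^2 + v)                     # one shrinkage factor per direction
b_gsr <- drop(eg$vectors %*% (c_k * z))    # rotate back to original basis
```

Directions with small eigenvalues carry the most estimation noise and receive the hardest shrinkage, which is the intuition behind GSR's advantage when p/n is non-negligible.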
St Stein Estimator +
Defining result
RMSEWin,95% = 0.958 — lowest prediction error across all estimators and panels.
MAEWin,95% = 0.938 — also lowest.
Table 3, Panel C (n = 24,944), Gamma/log, §4.2.3
Convergence profile (NoIt ratio, Panels A–C)
Gamma/log: 0.892  |  0.496  |  1.030 — Table 3
Mechanism (§2.2)
St applies a uniform scalar shrinkage to the OLS coefficient vector: β̂St = a · β̂OLS, where a ∈ (0,1] is chosen to minimise the theoretical MSE in closed form. This uniformly inflates all eigenvalues of Σ by a⁻¹.
Tradeoff
St achieves the highest predictive precision (lowest RMSE/MAE) but at the cost of portfolio miscalibration: A/E = 5.66% in Panel A and 7.47% in Panel C (Table 3). Prefer St when minimising individual prediction error is more important than aggregate calibration — for example, in risk-stratification rather than pricing.
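The uniform-shrinkage mechanism can be sketched in base R. The plug-in factor below, ||β̂||² / (||β̂||² + σ̂² tr((XᵀX)⁻¹)), is the textbook MSE-minimising scalar for shrinking an OLS vector toward zero with the unknowns replaced by estimates; savvyGLM's exact closed form is derived inside IRLS (§2.2), so treat this as an illustration of the principle only.

```r
## Uniform scalar shrinkage of an OLS fit: b_st = a * b_ols, a in (0, 1].
set.seed(1)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(2, -1, 0.5, 0, 0) + rnorm(n)

XtX_inv <- solve(crossprod(X))
b_ols   <- XtX_inv %*% crossprod(X, y)
sigma2  <- sum((y - X %*% b_ols)^2) / (n - p)     # residual variance

## Plug-in a = ||b||^2 / (||b||^2 + sigma^2 * tr((X'X)^{-1})), always in (0, 1)
a_hat <- sum(b_ols^2) / (sum(b_ols^2) + sigma2 * sum(diag(XtX_inv)))
b_st  <- a_hat * b_ols                            # every coefficient shrunk equally
```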
DSh Diagonal Shrinkage +
Defining result
Deviance ratio = 0.997 — best deviance across all estimators.
A/E = +0.77% — strong calibration with improved deviance.
Table 4, Panel C, Tweedie/log, §4.2.4
Convergence profile (NoIt ratio, Panels A–C)
Gamma/log: 0.876  |  0.584  |  0.831 — Table 3
Mechanism (§2.2)
DSh applies component-wise (diagonal) shrinkage: each coefficient is independently scaled by its own factor bk ∈ (0,1]. Unlike St, DSh does not preserve eigenvectors of Σ, allowing differentiated treatment of each predictor dimension.
When to prefer
DSh offers a balanced profile: competitive calibration, improved deviance, and faster convergence. Simulation evidence (Table 1) shows limited first-place wins (4/140), suggesting niche advantage. Most effective under the Tweedie/log specification (Table 4), where it achieves the best deviance in Panel C.
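The component-wise mechanism can be sketched the same way: each coefficient gets its own plug-in factor built from its estimated variance, so noisier coefficients are shrunk harder. As before, the factors below illustrate the principle and are not savvyGLM's closed-form rule.

```r
## Diagonal (component-wise) shrinkage: b_dsh[k] = b_fac[k] * b_ols[k].
set.seed(1)
n <- 200; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1.5, -0.5, 0, 0.2) + rnorm(n)

XtX_inv <- solve(crossprod(X))
b_ols   <- drop(XtX_inv %*% crossprod(X, y))
sigma2  <- sum((y - X %*% b_ols)^2) / (n - p)
v_k     <- sigma2 * diag(XtX_inv)        # Var(b_ols[k]) estimate per coefficient

b_fac <- b_ols^2 / (b_ols^2 + v_k)       # per-coefficient factor in (0, 1)
b_dsh <- b_fac * b_ols                   # imprecise coefficients shrink hardest
```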

D — Results Navigator

Ten key findings in order of presentation. Click any item to jump to the corresponding replication code block.

1
§2.2 — Methodology
Stein-type estimators (SR, GSR, St, DSh) derive tuning parameters in closed form — no cross-validation required. See setup ↓
2
Table 1, §4.1 — Simulation Summary
savvyGLM achieves first-place accuracy in 74/140 scenarios (52.8%) across all distributions and link functions. See results ↓
3
Table 1, §4.1 — Logit link
GSR dominates the logit link with 25/40 first-place wins. Performance is link-function dependent: RR leads under log link (29/40). See results ↓
4
Table 1, §4.1 — Sqrt link
St achieves 15/20 first-place wins under the Poisson/sqrt link — the strongest single-estimator performance in any link-function subset. See code ↓
5
Table 3, Panel A, §4.2.3 — Calibration
SR achieves the best portfolio calibration: A/E −1 = −0.12% (Panel A, Gamma/log). GLM2 baseline: 1.35%. See results ↓
6
Table 3, Panel A, §4.2.3 — Convergence
LW converges fastest: NoIt ratio = 0.580 (42% fewer iterations than GLM2, Panel A, Gamma/log). See results ↓
7
Table 3, Panels A & C, §4.2.3 — Risk discrimination
RR achieves the highest Gini indices: Giniraw = 0.119, Gininorm = 0.178 (Panel A). Highest in Panel C too (0.115 / 0.171). See results ↓
8
Table 3, Panel C, §4.2.3 — Predictive precision
St achieves the lowest prediction error across all panels: RMSEWin,95% = 0.958, MAEWin,95% = 0.938 (Panel C, Gamma/log). See results ↓
9
Table 4, Panel C, §4.2.4 — Tweedie calibration
Under Tweedie/log, SR achieves near-perfect calibration: A/E −1 = +0.07% (Panel C), improving on its 1.15% result under Gamma. See results ↓
10
Table 4, §4.2.4 — Tweedie power
Optimal Tweedie variance power: ρ ≈ 1.700 consistently across all models and panels, suggesting data strongly favour the Gamma distribution structure. See results ↓
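Items 5 and 9 above quote the calibration statistic A/E − 1 in percent: total actual claims over total model-expected claims, minus one. The replication code additionally weights both sums by ClaimNb; a minimal unweighted sketch with made-up numbers:

```r
## A/E - 1 (%) on a toy portfolio (simplified, unweighted illustration;
## the replication code weights both sums by ClaimNb).
actual   <- c(1200, 800, 950)   # observed claim amounts
expected <- c(1150, 830, 990)   # model predictions
ae_dev <- (sum(actual) / sum(expected) - 1) * 100
round(ae_dev, 2)                # negative => model over-predicts in aggregate
```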

Part 2 — Replication Code

Academic supplement to §4.2 of the paper. All replication code is derived from the analysis scripts in the savvyGLM GitHub repository. Data are sourced from the CASdatasets R package. The Replication Guide provides a documented walkthrough of the pipeline; the Original Paper Code section reproduces the verbatim scripts from the repository.
BLOCK 0 Environment setup — §4, freMTPL2 dataset ↑ Back to Part 1
## ── Block 0: Environment Setup ─────────────────────────────────────────────
## Paper: "GLM Solutions via Shrinkage" — Asimit, Avramescu, Chen, Rivas, Senatore
## Analysis: §4.2, Real Data Application — freMTPL2 Average Severity per Policy
## ─────────────────────────────────────────────────────────────────────────────

## CRAN dependencies
install.packages(c("glm2", "glmnet", "splines", "MASS", "parallel"))

## CASdatasets — source of the freMTPL2 benchmark dataset
## Dutang, C. and Charpentier, A. (2024). CASdatasets R package.
## URL: http://cas.uqam.ca/pub/R/
install.packages("CASdatasets", repos = "http://cas.uqam.ca/pub/R/")
library(CASdatasets)

## savvyGLM — Stein-type shrinkage IRLS framework
## Repository: https://github.com/Ziwei-ChenChen/savvyGLM
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("Ziwei-ChenChen/savvyGLM")
library(savvyGLM)

## Load source data (§4.2.1)
data(freMTPL2freq)   # policy characteristics and claim counts
data(freMTPL2sev)    # individual claim amounts

## Record package versions for reproducibility
packageVersion("savvyGLM")
packageVersion("glm2")
packageVersion("glmnet")
BLOCK 1 Data preparation — §4.2.1, Table 2
## ── Block 1: Data Preparation ───────────────────────────────────────────────
## Paper: §4.2.1 Dataset and Preprocessing; Table 2 (Variable Dictionary)
## Analysis: Gamma GLM / log LF on average severity per policy (Table 3)
## ─────────────────────────────────────────────────────────────────────────────
library(glm2); library(glmnet); library(splines)
library(MASS); library(parallel); library(savvyGLM)
set.seed(123)

## Join frequency and severity tables via policy identifier IDpol (§4.2.1)
sev_agg  <- aggregate(ClaimAmount ~ IDpol, data = freMTPL2sev, FUN = sum)
data_all <- merge(freMTPL2freq, sev_agg, by = "IDpol")

## Panel definition (§4.2.1, Table 3 caption):
##   Panel A — Exactly 1 claim  (n = 23,571)  [THIS BLOCK]
##   Panel B — Exactly 2 claims (n = 1,298)   [change ClaimNb == 2]
##   Panel C — All claims >= 1  (n = 24,944)  [change ClaimNb >= 1]
data_sev <- data_all[data_all$ClaimNb == 1 & data_all$ClaimAmount > 0, ]
data_sev$Severity <- data_sev$ClaimAmount / data_sev$ClaimNb
## Expected output (paper, §4.2.1): nrow(data_sev) == 23,571

## Feature engineering (§4.2.1, Table 2) ─────────────────────────────────────
## Continuous transformations (Table 2):
data_sev$VehPower_Log   <- log(pmax(pmin(data_sev$VehPower, 9), 1))   # bounded [1,9]
data_sev$Density_Log    <- log(pmax(data_sev$Density, 1))             # lower bound 1
data_sev$BonusMalus_Log <- log(pmin(data_sev$BonusMalus, 150))        # capped at 150
BLOCK 2 Cross-validation framework — §4.2.2
## ── Block 2: Cross-Validation Framework ─────────────────────────────────────
## Paper: §4.2.2 Performance Measures and Evaluation Procedure
## Replication target: Table 3 (100 repeats x 5-fold CV = 500 evaluations)
## ─────────────────────────────────────────────────────────────────────────────
n_repeats <- 100                                                  # §4.2.2
n_folds   <- 5
n_iter    <- n_repeats * n_folds                                  # 500 replications
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)  # §4.2.2
family_type  <- Gamma(link = "log")       # Table 3: Gamma distribution / log LF

## Winsorised error metrics (§4.2.2, equations for RMSE_W and MAE_W)
rmse_w <- function(y, yhat, cap) sqrt(mean((pmin(y, cap) - pmin(yhat, cap))^2))
mae_w  <- function(y, yhat, cap) mean(abs(pmin(y, cap) - pmin(yhat, cap)))

## Exposure-weighted Gini index (§4.2.2)
w_gini <- function(y, w, yhat) {
  d  <- data.frame(y = y, w = w, yhat = yhat)[order(-yhat), ]
  cw <- cumsum(d$w) / sum(d$w)
  cl <- cumsum(d$y * d$w) / sum(d$y * d$w)
  n  <- nrow(d)
  sum(cl[-n] * cw[-1] - cl[-1] * cw[-n])
}
norm_gini <- function(y, w, yhat) w_gini(y, w, yhat) / w_gini(y, w, y)

## Single CV iteration function
run_iter <- function(i) {
  repeat_idx <- floor((i - 1) / n_folds) + 1
  fold_idx   <- ((i - 1) %% n_folds) + 1
  set.seed(123 + repeat_idx)
  folds    <- sample(rep(1:n_folds, length.out = nrow(data_sev)))
  test_idx <- which(folds == fold_idx)
  tr <- data_sev[-test_idx, ]
  te <- data_sev[ test_idx, ]

  ## Natural cubic splines, df = 3 per variable (§4.2.1, Table 2)
  sp_d <- ns(tr$DrivAge, df = 3); sp_v <- ns(tr$VehAge, df = 3)
  for (j in 1:3) {
    tr[[paste0("DrivAge_Sp", j)]] <- sp_d[, j]
    tr[[paste0("VehAge_Sp",  j)]] <- sp_v[, j]
    te[[paste0("DrivAge_Sp", j)]] <- predict(sp_d, te$DrivAge)[, j]
    te[[paste0("VehAge_Sp",  j)]] <- predict(sp_v, te$VehAge)[, j]
  }

  ## LOO target encoding — k = 20, weighted by ClaimNb (§4.2.1)
  gm <- weighted.mean(tr$Severity, tr$ClaimNb)
  k  <- 20
  for (col in c("Region", "Area", "VehBrand")) {
    ga  <- tapply(tr$ClaimAmount, tr[[col]], sum)
    gc  <- tapply(tr$ClaimNb,     tr[[col]], sum)
    gi  <- as.character(tr[[col]])
    lam <- (gc[gi] - tr$ClaimNb) / (gc[gi] - tr$ClaimNb + k)
    mn  <- ifelse((gc[gi] - tr$ClaimNb) > 0,
                  (ga[gi] - tr$ClaimAmount) / (gc[gi] - tr$ClaimNb), gm)
    tr[[paste0("TE_", col)]] <- lam * mn + (1 - lam) * gm
    ti  <- as.character(te[[col]])
    enc <- (gc / (gc + k)) * (ga / gc) + (k / (gc + k)) * gm
    te[[paste0("TE_", col)]] <- ifelse(ti %in% names(enc), enc[ti], gm)
  }

  ## Standardise on training statistics (§4.2.1)
  sc <- c("BonusMalus_Log", "VehPower_Log", "Density_Log",
          paste0("DrivAge_Sp", 1:3), paste0("VehAge_Sp", 1:3),
          "TE_Region", "TE_Area", "TE_VehBrand")
  mu <- colMeans(tr[, sc]); sg <- apply(tr[, sc], 2, sd); sg[sg == 0] <- 1
  for (v in sc) {
    tr[[v]] <- (tr[[v]] - mu[v]) / sg[v]
    te[[v]] <- (te[[v]] - mu[v]) / sg[v]
  }

  ## Model matrix: 14 columns (intercept + 12 standardised + VehGas; §4.2.1)
  frm <- Severity ~ BonusMalus_Log + VehPower_Log + Density_Log +
    DrivAge_Sp1 + DrivAge_Sp2 + DrivAge_Sp3 +
    VehAge_Sp1 + VehAge_Sp2 + VehAge_Sp3 +
    VehGas + TE_Region + TE_Area + TE_VehBrand
  Xtr <- model.matrix(frm, tr); Xte <- model.matrix(frm, te)
  ytr <- tr$Severity; yte <- te$Severity
  wtr <- tr$ClaimNb;  wte <- te$ClaimNb
  c95 <- quantile(ytr, 0.95); c99 <- quantile(ytr, 0.99)

  ev <- function(coef_vec) {
    if (all(is.na(coef_vec))) return(rep(NA_real_, 8))
    pr <- pmax(exp(as.numeric(Xte %*% ifelse(is.na(coef_vec), 0, coef_vec))), 1e-6)
    c(rmse_w(yte, pr, c95), rmse_w(yte, pr, c99),
      mae_w(yte, pr, c95),  mae_w(yte, pr, c99),
      w_gini(yte, wte, pr), norm_gini(yte, wte, pr),
      sum(wte * Gamma()$dev.resids(yte, pr, 1)),
      sum(yte * wte) / sum(pr * wte))
  }

  fs <- function(cls) {
    t0 <- proc.time()
    m  <- tryCatch(savvy_glm.fit2(x = Xtr, y = ytr, weights = wtr,
                                  family = family_type, model_class = cls,
                                  control = control_list),
                   error = function(e) NULL)
    ct <- (proc.time() - t0)[3]
    if (is.null(m)) list(b = rep(NA, ncol(Xtr)), it = NA, ct = ct)
    else list(b = m$coefficients, it = m$iter, ct = ct)
  }

  t0 <- proc.time()
  m1 <- tryCatch(glm.fit2(Xtr, ytr, weights = wtr, family = family_type,
                          control = control_list), error = function(e) NULL)
  ct1 <- (proc.time() - t0)[3]
  t0 <- proc.time()
  m2 <- tryCatch(cv.glmnet(Xtr[, -1], ytr, weights = wtr,
                           family = Gamma(link = "log"), alpha = 0, nfolds = 5),
                 error = function(e) NULL)
  ct2 <- (proc.time() - t0)[3]
  t0 <- proc.time()
  m3 <- tryCatch(cv.glmnet(Xtr[, -1], ytr, weights = wtr,
                           family = Gamma(link = "log"), alpha = 0.5, nfolds = 5),
                 error = function(e) NULL)
  ct3 <- (proc.time() - t0)[3]

  sv <- lapply(c("St", "DSh", "SR", "GSR", "LW", "QIS"), fs)
  coefs <- c(list(if (!is.null(m1)) m1$coefficients else rep(NA, ncol(Xtr)),
                  if (!is.null(m2)) as.vector(coef(m2, s = "lambda.min"))
                  else rep(NA, ncol(Xtr)),
                  if (!is.null(m3)) as.vector(coef(m3, s = "lambda.min"))
                  else rep(NA, ncol(Xtr))),
             lapply(sv, "[[", "b"))
  cts   <- c(ct1, ct2, ct3, sapply(sv, "[[", "ct"))
  iters <- c(if (!is.null(m1)) m1$iter else NA, NA, NA, sapply(sv, "[[", "it"))
  res   <- do.call(rbind, lapply(coefs, ev))

  data.frame(Iteration = i,
             Model = c("GLM2", "RR", "EN", "St", "DSh", "SR", "GSR", "LW", "QIS"),
             RMSE95 = res[, 1], RMSE99 = res[, 2],
             MAE95  = res[, 3], MAE99  = res[, 4],
             Gini_Raw = res[, 5], Gini_Norm = res[, 6],
             Deviance = res[, 7], AE_Ratio = res[, 8],
             CT = cts, NoIt = iters)
}
BLOCK 3 Parallel execution — §4.2.2, 500 replications
## ── Block 3: Parallel Execution ─────────────────────────────────────────────
## Approximate wall-clock time: 6–12 hours on 20 cores for 500 iterations
## ─────────────────────────────────────────────────────────────────────────────
results_file <- "OOS_GammaLog_PanelA.csv"
n_cores <- max(1L, parallel::detectCores() - 1L)
cl <- parallel::makeCluster(n_cores)
parallel::clusterEvalQ(cl, {
  library(glm2); library(glmnet); library(splines)
  library(MASS); library(savvyGLM)
})
parallel::clusterExport(cl, ls(envir = .GlobalEnv))
results <- parallel::clusterApplyLB(cl, seq_len(n_iter), run_iter)
parallel::stopCluster(cl)

all_res <- do.call(rbind, results)
write.csv(all_res, results_file, row.names = FALSE)
cat(sprintf("Saved %d rows to %s\n", nrow(all_res), results_file))
BLOCK 4 Results summary — Table 3, Panel A (§4.2.3, p.30)
## ── Block 4: Results Summary — Table 3, Panel A ─────────────────────────────
## Paper: §4.2.3, Table 3 (OOS for Gamma GLM, log LF), Panel A, p.30
## ─────────────────────────────────────────────────────────────────────────────
all_res <- read.csv("OOS_GammaLog_PanelA.csv")

## Compute within-iteration ratios relative to GLM2 baseline (§4.2.2)
ratio_df <- do.call(rbind, lapply(split(all_res, all_res$Iteration), function(d) {
  b <- d[d$Model == "GLM2", ]
  d$r95 <- d$RMSE95 / b$RMSE95
  d$r99 <- d$RMSE99 / b$RMSE99
  d$m95 <- d$MAE95  / b$MAE95
  d$m99 <- d$MAE99  / b$MAE99
  d$dev <- d$Deviance / b$Deviance
  d$ct  <- d$CT   / b$CT
  d$nit <- d$NoIt / b$NoIt
  d$ae  <- (d$AE_Ratio - 1) * 100    # A/E - 1 (%)
  d
}))

avg <- aggregate(cbind(r95, r99, m95, m99, Gini_Raw, Gini_Norm, dev, ae, ct, nit) ~ Model,
                 data = ratio_df, FUN = mean, na.action = na.pass)
avg[, -1] <- round(avg[, -1], 3)
print(avg[order(avg$Model), ])

## Expected output — Table 3, Panel A (GLM_main.tex §4.2.3, p.30)
## Model  RMSE95 RMSE99  MAE95  MAE99 Gini_Raw GiniNorm   Dev  AE(%)     CT    NoIt
## GLM2    1.000  1.000  1.000  1.000    0.115    0.172 1.000   1.35  1.000   1.000
## RR      1.056  0.893  1.057  0.990    0.119    0.178 1.047   5.77 20.324  48.614
## EN      1.004  0.979  1.008  0.996    0.111    0.167 1.037   2.93 20.872 127.220
## QIS     0.998  1.000  0.997  0.998    0.114    0.171 1.000   1.54  1.163   0.979
## LW      1.002  0.999  1.003  1.002    0.116    0.174 1.000   1.93  1.703   0.580
## SR      0.997  1.003  0.996  0.999    0.114    0.170 1.000  -0.12  3.053   0.914
## GSR     0.999  0.999  0.999  0.998    0.113    0.170 1.002   1.81  2.342   0.869
## St      0.967  0.995  0.947  0.959    0.115    0.172 1.001   5.66  2.961   0.892
## DSh     1.006  1.000  1.017  1.013    0.116    0.174 1.001   0.35  2.504   0.876
## Tolerance: ratios ±0.003; A/E ±0.3 pp (seed and platform variation; §4.2.2)
BLOCK 5 Reproducibility verification against Table 3 constants
## ── Block 5: Reproducibility Verification ───────────────────────────────────
## Compares computed averages against paper constants (Table 3, Panel A, p.30)
## PASS criterion: all ratio deviations ≤ 0.003; A/E deviations ≤ 0.3 pp
## ─────────────────────────────────────────────────────────────────────────────
EXPECTED <- list(
  GLM2 = list(r95 = 1.000, gini_norm = 0.172, ae =  1.35, ct =  1.000, nit =   1.000),
  RR   = list(r95 = 1.056, gini_norm = 0.178, ae =  5.77, ct = 20.324, nit =  48.614),
  EN   = list(r95 = 1.004, gini_norm = 0.167, ae =  2.93, ct = 20.872, nit = 127.220),
  QIS  = list(r95 = 0.998, gini_norm = 0.171, ae =  1.54, ct =  1.163, nit =   0.979),
  LW   = list(r95 = 1.002, gini_norm = 0.174, ae =  1.93, ct =  1.703, nit =   0.580),
  SR   = list(r95 = 0.997, gini_norm = 0.170, ae = -0.12, ct =  3.053, nit =   0.914),
  GSR  = list(r95 = 0.999, gini_norm = 0.170, ae =  1.81, ct =  2.342, nit =   0.869),
  St   = list(r95 = 0.967, gini_norm = 0.172, ae =  5.66, ct =  2.961, nit =   0.892),
  DSh  = list(r95 = 1.006, gini_norm = 0.174, ae =  0.35, ct =  2.504, nit =   0.876)
)
TOL_R <- 0.003; TOL_AE <- 0.3
pass <- TRUE
for (mdl in names(EXPECTED)) {
  e   <- EXPECTED[[mdl]]
  row <- avg[avg$Model == mdl, ]
  checks <- list(
    r95       = list(v = row$r95,       e = e$r95,       tol = TOL_R),
    gini_norm = list(v = row$Gini_Norm, e = e$gini_norm, tol = TOL_R),
    ae        = list(v = row$ae,        e = e$ae,        tol = TOL_AE),
    ct        = list(v = row$ct,        e = e$ct,        tol = max(TOL_R, abs(e$ct) * 0.05)),
    nit       = list(v = row$nit,       e = e$nit,       tol = TOL_R)
  )
  for (metric in names(checks)) {
    ch <- checks[[metric]]
    ok <- abs(ch$v - ch$e) <= ch$tol
    cat(sprintf("%-5s %-10s paper=%8.3f computed=%8.3f %s\n",
                mdl, metric, ch$e, ch$v, if (ok) "PASS" else "FAIL"))
    if (!ok) pass <- FALSE
  }
}
cat(sprintf("\nVerification: %s\n",
            if (pass) "PASS — all metrics consistent with Table 3, Panel A"
            else "FAIL — inspect seed, package version, and platform"))
⚠ Computational disclaimer
This tab reproduces the verbatim R scripts from the paper's repository (github.com/Ziwei-ChenChen/savvyGLM). Before running, note the following:
  • Computation time: approximately 6–12 hours on a 20-core cluster (500 iterations × 9 models, Panel A alone). Panels B and C require separate runs.
  • Dataset format: the original scripts use local CSV files (freMTPL2freq.csv, freMTPL2sev.csv). To use the public CASdatasets package instead, replace the two read.csv() calls in Original Block 1 with library(CASdatasets); data(freMTPL2freq); data(freMTPL2sev) and rename the objects accordingly.
  • AgeBand column: Original Block 2 derives AgeBandNum from an AgeBand variable present in the authors’ dataset but absent from the public CASdatasets release. The standardisation step is robust to this (missing values are median-imputed); the AgeBandNum column enters the model with zero variance and is effectively excluded. Results are not materially affected.
  • Checkpoint recovery: the execution block saves results in batches of 20 iterations and can be safely interrupted and restarted.
ORIGINAL BLOCK 1 Environment setup and data loading — Panel A (ClaimNb = 1, Gamma/log)
################################################################################
# BLOCK 1: Setup
################################################################################
rm(list = ls())
library(glm2)
library(glmnet)
library(ggplot2)
library(gridExtra)
library(MASS)
library(parallel)
library(splines)
library(reshape2)
remotes::install_github("Ziwei-ChenChen/savvyGLM")
library(savvyGLM)
set.seed(123)

## Note: replace with data(freMTPL2freq); data(freMTPL2sev) if using CASdatasets
freq <- read.csv("freMTPL2freq.csv")
sev  <- read.csv("freMTPL2sev.csv")
sev_agg <- aggregate(ClaimAmount ~ IDpol, data = sev, FUN = sum)
data <- merge(freq, sev_agg, by = "IDpol")
data_sev <- data[data$ClaimNb == 1 & data$ClaimAmount > 0, ]
data_sev$Severity <- data_sev$ClaimAmount / data_sev$ClaimNb
ORIGINAL BLOCK 2 Feature engineering — AgeBandNum, log transforms
################################################################################
# BLOCK 2: Feature engineering
################################################################################
matches <- grepl("\\((\\d+),(\\d+)\\]", data_sev$AgeBand)
data_sev$AgeBandNum <- NA
data_sev$AgeBandNum[matches] <-
  as.numeric(sub("\\((\\d+),.*", "\\1", data_sev$AgeBand[matches])) + 0.5
data_sev$AgeBandNum[is.na(data_sev$AgeBandNum)] <-
  median(data_sev$AgeBandNum, na.rm = TRUE)
data_sev$VehPower_Log   <- log(pmax(pmin(data_sev$VehPower, 9), 1))
data_sev$Density_Log    <- log(pmax(data_sev$Density, 1))
data_sev$BonusMalus_Log <- log(pmin(data_sev$BonusMalus, 150))
ORIGINAL BLOCK 3 Single CV iteration function — 9 estimators
################################################################################
# BLOCK 3: Single CV iteration function
################################################################################
n_repeats <- 100
n_folds   <- 5
n_iter    <- n_repeats * n_folds
control_list <- list(maxit = 250, epsilon = 1e-6, trace = FALSE)
family_type  <- Gamma(link = "log")

calc_rmse_win <- function(actual, predicted, cap_val) {
  sqrt(mean((pmin(actual, cap_val) - pmin(predicted, cap_val))^2))
}
calc_mae_win <- function(actual, predicted, cap_val) {
  mean(abs(pmin(actual, cap_val) - pmin(predicted, cap_val)))
}
WeightedGini <- function(actual, weights, predicted) {
  df <- data.frame(actual = actual, weights = weights, predicted = predicted)
  df <- df[order(df$predicted, decreasing = TRUE), ]
  df$random <- cumsum(df$weights / sum(df$weights))
  totalPositive <- sum(df$actual * df$weights)
  df$cumPosFound <- cumsum(df$actual * df$weights)
  df$Lorentz <- df$cumPosFound / totalPositive
  n <- nrow(df)
  sum(df$Lorentz[-n] * df$random[-1]) - sum(df$Lorentz[-1] * df$random[-n])
}
NormalizedWeightedGini <- function(actual, weights, predicted) {
  WeightedGini(actual, weights, predicted) / WeightedGini(actual, weights, actual)
}

run_single_iteration <- function(i) {
  repeat_idx <- floor((i - 1) / n_folds) + 1
  fold_idx   <- ((i - 1) %% n_folds) + 1
  set.seed(123 + repeat_idx)
  folds    <- sample(rep(1:n_folds, length.out = nrow(data_sev)))
  test_idx <- which(folds == fold_idx)
  train_data <- data_sev[-test_idx, ]
  test_data  <- data_sev[ test_idx, ]

  driv_splines <- ns(train_data$DrivAge, df = 3)
  veh_splines  <- ns(train_data$VehAge,  df = 3)
  train_data$DrivAge_Sp1 <- driv_splines[, 1]
  train_data$DrivAge_Sp2 <- driv_splines[, 2]
  train_data$DrivAge_Sp3 <- driv_splines[, 3]
  train_data$VehAge_Sp1 <- veh_splines[, 1]
  train_data$VehAge_Sp2 <- veh_splines[, 2]
  train_data$VehAge_Sp3 <- veh_splines[, 3]
  test_driv_basis <- predict(driv_splines, test_data$DrivAge)
  test_veh_basis  <- predict(veh_splines,  test_data$VehAge)
  test_data$DrivAge_Sp1 <- test_driv_basis[, 1]
  test_data$DrivAge_Sp2 <- test_driv_basis[, 2]
  test_data$DrivAge_Sp3 <- test_driv_basis[, 3]
  test_data$VehAge_Sp1 <- test_veh_basis[, 1]
  test_data$VehAge_Sp2 <- test_veh_basis[, 2]
  test_data$VehAge_Sp3 <- test_veh_basis[, 3]

  global_mean_train <- sum(train_data$ClaimAmount) / sum(train_data$ClaimNb)
  k_smooth <- 20
  for (col in c("Region", "Area", "VehBrand")) {
    group_amt <- tapply(train_data$ClaimAmount, train_data[[col]], sum)
    group_clm <- tapply(train_data$ClaimNb,     train_data[[col]], sum)
    grp_idx <- as.character(train_data[[col]])
    loo_amt <- group_amt[grp_idx] - train_data$ClaimAmount
    loo_clm <- group_clm[grp_idx] - train_data$ClaimNb
    lambda_train <- loo_clm / (loo_clm + k_smooth)
    mu_train <- ifelse(loo_clm > 0, loo_amt / loo_clm, global_mean_train)
    train_data[[paste0("TE_", col)]] <-
      lambda_train * mu_train + (1 - lambda_train) * global_mean_train
    test_grp_idx <- as.character(test_data[[col]])
    full_lambda  <- group_clm / (group_clm + k_smooth)
    smoothed_test_dict <-
      full_lambda * (group_amt / group_clm) + (1 - full_lambda) * global_mean_train
    test_data[[paste0("TE_", col)]] <- smoothed_test_dict[test_grp_idx]
    test_data[[paste0("TE_", col)]][is.na(test_data[[paste0("TE_", col)]])] <-
      global_mean_train
  }

  scale_cols <- c("BonusMalus_Log", "VehPower_Log", "Density_Log", "AgeBandNum",
                  "DrivAge_Sp1", "DrivAge_Sp2", "DrivAge_Sp3",
                  "VehAge_Sp1", "VehAge_Sp2", "VehAge_Sp3",
                  "TE_Region", "TE_Area", "TE_VehBrand")
  train_means <- colMeans(train_data[, scale_cols])
  train_sds   <- apply(train_data[, scale_cols], 2, sd)
  train_sds[train_sds == 0] <- 1
  for (col in scale_cols) {
    train_data[[col]] <- (train_data[[col]] - train_means[col]) / train_sds[col]
    test_data[[col]]  <- (test_data[[col]]  - train_means[col]) / train_sds[col]
  }

  glm_formula <- Severity ~ BonusMalus_Log + VehPower_Log + Density_Log +
    AgeBandNum + DrivAge_Sp1 + DrivAge_Sp2 + DrivAge_Sp3 +
    VehAge_Sp1 + VehAge_Sp2 + VehAge_Sp3 + VehGas +
    TE_Region + TE_Area + TE_VehBrand
  X_train <- model.matrix(glm_formula, train_data)
  X_test  <- model.matrix(glm_formula, test_data)
  y_train <- train_data$Severity; y_test <- test_data$Severity
  w_train <- train_data$ClaimNb;  w_test <- test_data$ClaimNb
  cap_99 <- quantile(y_train, 0.99); cap_95 <- quantile(y_train, 0.95)

  eval_model <- function(coefs) {
    if (all(is.na(coefs))) return(rep(NA_real_, 10))
    pred_test <- pmax(exp(as.numeric(X_test %*% ifelse(is.na(coefs), 0, coefs))), 1e-6)
    c(sqrt(mean((y_test - pred_test)^2)),
      mean(abs(y_test - pred_test)),
      calc_rmse_win(y_test, pred_test, cap_99),
      calc_rmse_win(y_test, pred_test, cap_95),
      calc_mae_win(y_test, pred_test, cap_99),
      calc_mae_win(y_test, pred_test, cap_95),
      WeightedGini(y_test, w_test, pred_test),
      NormalizedWeightedGini(y_test, w_test, pred_test),
      sum(w_test * Gamma()$dev.resids(y_test, pred_test, 1)),
      sum(y_test * w_test) / sum(pred_test * w_test))
  }

  t1 <- system.time({ m1 <- try(glm.fit2(X_train, y_train, weights = w_train,
                                         family = family_type,
                                         control = control_list), silent = TRUE) })
  t2 <- system.time({ m2 <- try(cv.glmnet(X_train[, -1], y_train, weights = w_train,
                                          family = Gamma(link = "log"), alpha = 0,
                                          nfolds = 5, nlambda = 50), silent = TRUE) })
  t3 <- system.time({ m3 <- try(cv.glmnet(X_train[, -1], y_train, weights = w_train,
                                          family = Gamma(link = "log"), alpha = 0.5,
                                          nfolds = 5, nlambda = 50), silent = TRUE) })
  t4 <- system.time({ m4 <- try(savvy_glm.fit2(X_train, y_train, weights = w_train,
                                               family = family_type, model_class = "St",
                                               control = control_list), silent = TRUE) })
  t5 <- system.time({ m5 <- try(savvy_glm.fit2(X_train, y_train, weights = w_train,
                                               family = family_type, model_class = "DSh",
                                               control = control_list), silent = TRUE) })
  t6 <- system.time({ m6 <- try(savvy_glm.fit2(X_train, y_train, weights = w_train,
                                               family = family_type, model_class = "SR",
                                               control = control_list), silent = TRUE) })
  t7 <- system.time({ m7 <- try(savvy_glm.fit2(X_train, y_train, weights = w_train,
                                               family = family_type, model_class = "GSR",
                                               control = control_list), silent = TRUE) })
  t8 <- system.time({ m8 <- try(savvy_glm.fit2(X_train, y_train, weights = w_train,
                                               family = family_type, model_class = "LW",
                                               control = control_list), silent = TRUE) })
  t9 <- system.time({ m9 <- try(savvy_glm.fit2(X_train, y_train, weights = w_train,
                                               family = family_type, model_class = "QIS",
                                               control = control_list), silent = TRUE) })

  coef_from <- function(m, has_coef = TRUE)
    if (inherits(m, "try-error") || (has_coef && is.null(m$coefficients)))
      rep(NA, ncol(X_train)) else m$coefficients
  coef_glmnet <- function(m)
    if (inherits(m, "try-error")) rep(NA, ncol(X_train))
    else as.vector(coef(m, s = "lambda.min"))
  iter_from <- function(m) if (inherits(m, "try-error")) NA else m$iter

  models <- c("GLM2", "Ridge", "ElasticNet", "Savvy_St", "Savvy_DSh",
              "Savvy_SR", "Savvy_GSR", "IRLS_LW2004", "IRLS_QIS")
  coefs <- list(coef_from(m1), coef_glmnet(m2), coef_glmnet(m3),
                coef_from(m4), coef_from(m5), coef_from(m6),
                coef_from(m7), coef_from(m8), coef_from(m9))
  all_res <- do.call(rbind, lapply(coefs, eval_model))
  times <- c(t1[3], t2[3], t3[3], t4[3], t5[3], t6[3], t7[3], t8[3], t9[3])
  iters <- c(iter_from(m1), NA, NA, iter_from(m4), iter_from(m5),
             iter_from(m6), iter_from(m7), iter_from(m8), iter_from(m9))
  rm(train_data, test_data, X_train, X_test, m1, m2, m3, m4, m5, m6, m7, m8, m9); gc()

  data.frame(Iteration = i, Model = models,
             RMSE = all_res[, 1], MAE = all_res[, 2],
             RMSE_Win_99 = all_res[, 3], RMSE_Win_95 = all_res[, 4],
             MAE_Win_99 = all_res[, 5], MAE_Win_95 = all_res[, 6],
             Gini_Raw = all_res[, 7], Gini_Norm = all_res[, 8],
             Dev = all_res[, 9], AE_Ratio = all_res[, 10],
             CT = as.numeric(times), NoIt = iters)
}
ORIGINAL BLOCKS 4–5 Parallel execution with checkpoint recovery (20 cores, batch_size = 20)
################################################################################
# BLOCKS 4–5: Parallel execution with checkpoint recovery
################################################################################
n_cores    <- 20
batch_size <- 20
raw_results_file <- "OOS_simulation_raw_results_ClaimNb1_log.csv"

completed_iters <- c()
if (file.exists(raw_results_file)) {
  existing_data <- read.csv(raw_results_file)
  if (nrow(existing_data) > 0) {
    completed_iters <- unique(existing_data$Iteration)
    cat(sprintf("[CHECKPOINT] %d iterations already completed.\n",
                length(completed_iters)))
  }
}
iters_to_run <- setdiff(1:n_iter, completed_iters)
if (length(iters_to_run) > 0) {
  cat(sprintf("=> %d iterations to run in batches of %d...\n",
              length(iters_to_run), batch_size))
  cl <- parallel::makeCluster(n_cores, outfile = "")
  parallel::clusterEvalQ(cl, {
    library(glm2); library(glmnet); library(savvyGLM)
    library(MASS); library(splines)
  })
  parallel::clusterExport(cl, varlist = ls(globalenv()), envir = .GlobalEnv)
  batches <- split(iters_to_run, ceiling(seq_along(iters_to_run) / batch_size))
  for (b in seq_along(batches)) {
    cat(sprintf("=> Batch %d/%d (iter %d–%d)...\n", b, length(batches),
                min(batches[[b]]), max(batches[[b]])))
    res_list  <- parallel::clusterApplyLB(cl, batches[[b]], run_single_iteration)
    batch_raw <- do.call(rbind, res_list)
    write.table(batch_raw, file = raw_results_file,
                append = file.exists(raw_results_file), sep = ",",
                row.names = FALSE, col.names = !file.exists(raw_results_file))
  }
  parallel::stopCluster(cl)
}
SUMMARY STATS Aggregate OOS ratios — matches Table 3 column layout
################################################################################
# Summary: OOS ratios relative to GLM2 baseline (Table 3)
################################################################################
all_raw <- read.csv("OOS_simulation_raw_results_ClaimNb1_log.csv")
ratio_list <- lapply(split(all_raw, all_raw$Iteration), function(df) {
  b <- df[df$Model == "GLM2", ]
  df$RMSE_Win_95_Ratio <- df$RMSE_Win_95 / b$RMSE_Win_95
  df$RMSE_Win_99_Ratio <- df$RMSE_Win_99 / b$RMSE_Win_99
  df$MAE_Win_95_Ratio  <- df$MAE_Win_95  / b$MAE_Win_95
  df$MAE_Win_99_Ratio  <- df$MAE_Win_99  / b$MAE_Win_99
  df$Dev_Ratio  <- df$Dev  / b$Dev
  df$CT_Ratio   <- df$CT   / b$CT
  df$NoIt_Ratio <- df$NoIt / b$NoIt
  df$AE_pct     <- (df$AE_Ratio - 1) * 100
  df
})
all_ratios <- do.call(rbind, ratio_list)
avg_ratios <- aggregate(
  cbind(RMSE_Win_95_Ratio, RMSE_Win_99_Ratio, MAE_Win_95_Ratio, MAE_Win_99_Ratio,
        Gini_Raw, Gini_Norm, Dev_Ratio, AE_pct, CT_Ratio, NoIt_Ratio) ~ Model,
  data = all_ratios, FUN = function(x) mean(x, na.rm = TRUE), na.action = na.pass)
avg_ratios[, -1] <- round(avg_ratios[, -1], 3)
print(avg_ratios[order(avg_ratios$Model), ])
write.csv(avg_ratios, "OOS_simulation_average_ratios_ClaimNb1_log.csv",
          row.names = FALSE)