Copula models for insurance pricing — D-vine temporal dependence, two-part occurrence/severity.
Merged from: insurance-vine-longitudinal (D-vine copula for panel data).
A policyholder who claimed last year is more likely to claim again next year. This is not just adverse selection; it is genuine claim persistence. Standard GLM pricing captures risk factors (age, vehicle type, region) but ignores temporal dependence in the residuals. NCD scales encode a binary rule: claimed or didn't. Neither approach gives you a principled conditional distribution for next year's claims.
This library implements the Yang & Czado (2022) two-part D-vine copula for longitudinal insurance claims. You observe a policyholder over T years. The model learns the full joint distribution of claim occurrence and severity across those years, then conditions on observed history to give the next-year claim distribution.
- Fits a logistic GLM for claim occurrence and a gamma/log-normal GLM for severity. These strip out systematic risk factors.
- Applies the probability integral transform (PIT) through the fitted GLM marginals, putting the residuals (what the GLM cannot explain) on a uniform scale.
- Fits a stationary D-vine copula on the occurrence PIT residuals. The vine structure is temporal: tree level k captures lag-k dependence.
- Does the same for severity PIT residuals.
- Uses h-function recursion to compute the conditional distribution of next year's claim given observed history.
- Returns: conditional claim probability, conditional severity quantiles, experience-rated premium, relativity factors.
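For the severity part, the steps above reduce to a few lines when every ingredient is simple. This is a minimal sketch, not the library's implementation. It assumes a log-normal severity marginal with a common sigma, a single Gaussian pair copula at lag 1 (p=1), and hypothetical fitted values. All function names are hypothetical, and the sketch ignores the fact that severity is only observed in claim years. Last year's severity is mapped to the uniform scale (PIT), and the inverse h-function then gives a conditional quantile of next year's severity.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def lognormal_pit(y: float, mu: float, sigma: float) -> float:
    """PIT under a log-normal marginal: F(y) = Phi((ln y - mu) / sigma)."""
    return STD.cdf((log(y) - mu) / sigma)


def lognormal_quantile(u: float, mu: float, sigma: float) -> float:
    """Inverse of lognormal_pit: map a uniform back to the severity scale."""
    return exp(mu + sigma * STD.inv_cdf(u))


def h_gauss(u: float, v: float, rho: float) -> float:
    """Gaussian pair-copula h-function: P(U <= u | V = v)."""
    return STD.cdf((STD.inv_cdf(u) - rho * STD.inv_cdf(v)) / sqrt(1.0 - rho**2))


def h_gauss_inv(q: float, v: float, rho: float) -> float:
    """Inverse h-function in its first argument: the u with h_gauss(u, v, rho) == q."""
    return STD.cdf(rho * STD.inv_cdf(v) + sqrt(1.0 - rho**2) * STD.inv_cdf(q))


def conditional_severity_quantile(q, last_severity, mu_last, mu_next, sigma, rho):
    """q-quantile of next year's severity given last year's, for a p=1 Gaussian D-vine."""
    v = lognormal_pit(last_severity, mu_last, sigma)  # 1. PIT of the observed history
    u = h_gauss_inv(q, v, rho)                        # 2. invert the conditional copula CDF
    return lognormal_quantile(u, mu_next, sigma)      # 3. back to the severity scale


# Hypothetical fitted values: GLM log-scale means, common sigma, lag-1 copula rho
after_large = conditional_severity_quantile(0.5, 8000.0, mu_last=7.5, mu_next=7.5, sigma=1.0, rho=0.3)
after_small = conditional_severity_quantile(0.5, 500.0, mu_last=7.5, mu_next=7.5, sigma=1.0, rho=0.3)
print(after_large > after_small)  # True: positive lag-1 dependence shifts the median up
```

With rho=0, the conditional quantile collapses to the a priori log-normal quantile (exp(mu_next) for the median). In that case the copula adds nothing beyond the GLM.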
This is not a neural or sequence model, and it does not replace your GLM. It operates on the GLM residuals and quantifies how much temporal persistence remains after controlling for risk factors. The statistical structure is transparent and auditable, which matters for Consumer Duty documentation.
```bash
pip install insurance-copula
```

```python
import pandas as pd
from insurance_copula.vine import PanelDataset, TwoPartDVine

# Your panel data: one row per (policyholder, year)
df = pd.read_parquet("motor_panel.parquet")

# Build the panel object (validates, handles unbalanced panels)
panel = PanelDataset.from_dataframe(
    df,
    id_col="policy_id",
    year_col="year",
    claim_col="has_claim",
    severity_col="claim_amount",
    covariate_cols=["age", "vehicle_group", "region"],
)

# Fit the two-part D-vine
model = TwoPartDVine(severity_family="gamma", max_truncation=2)
model.fit(panel)
print(model)
# TwoPartDVine(fitted, t_dim=4, occurrence_p=1, severity_p=2)

# history_df is not defined in this snippet: it holds the observed claim history
# of the policies being priced (assumed to use the same columns as df)

# Predict next-year claim probability given history
proba = model.predict_proba(history_df)
# policy_id
# POL00001    0.142
# POL00002    0.089
# POL00003    0.247
# Name: claim_proba, dtype: float64

# Conditional severity quantiles
quantiles = model.predict_severity_quantile(history_df, quantiles=[0.5, 0.95])

# Experience-rated premium
premium = model.predict_premium(history_df, loading=0.15)

# Experience relativity = copula premium / a priori GLM premium
relativity = model.experience_relativity(history_df)
```

Top-level imports also work:

```python
from insurance_copula import PanelDataset, TwoPartDVine, extract_relativity_curve
```

Relativity curves are the output pricing teams actually use: how far does claim history shift the predicted premium away from the a priori estimate?
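Mechanically, an experience relativity is a ratio of two premiums: the copula premium over the a priori GLM premium, as in the quick-start comments. This is a minimal sketch, assuming each premium is the two-part expected cost (claim probability times mean severity given a claim) multiplied by (1 + loading). Under that assumption the loading cancels from the ratio. Whether `predict_premium` applies its loading this way is an assumption, and the helper names and numbers are hypothetical.

```python
def expected_premium(claim_prob: float, mean_severity: float, loading: float) -> float:
    """Two-part premium: P(claim) * E[severity | claim], times a multiplicative (1 + loading)."""
    return claim_prob * mean_severity * (1.0 + loading)


def relativity(post_prob: float, post_sev: float, prior_prob: float, prior_sev: float,
               loading: float = 0.15) -> float:
    """History-conditioned premium divided by the a priori (GLM-only) premium."""
    return (expected_premium(post_prob, post_sev, loading)
            / expected_premium(prior_prob, prior_sev, loading))


# Hypothetical: 10% a priori claim probability rises to 13.5% after a claim; severity unchanged
r = relativity(post_prob=0.135, post_sev=2000.0, prior_prob=0.10, prior_sev=2000.0)
print(round(r, 2))  # 1.35
```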
```python
from insurance_copula import extract_relativity_curve, compare_to_ncd

curve = extract_relativity_curve(
    model,
    claim_counts=[0, 1, 2, 3],
    n_years_list=[1, 2, 3, 4, 5],
)
print(curve.pivot(index="claim_count", columns="n_years", values="relativity").round(2))
#              1yr   2yr   3yr   4yr   5yr
# 0 claims    1.00  1.00  1.00  1.00  1.00
# 1 claim     1.35  1.28  1.22  1.18  1.14
# 2 claims     NaN  1.71  1.58  1.48  1.40
# 3 claims     NaN   NaN  2.01  1.87  1.74

# Compare against NCD scale
comparison = compare_to_ncd(curve)
print(comparison[comparison["claim_count"] == 0].to_string())
```

The D-vine is truncated at order p, with p selected by BIC. At p=1 the model is a first-order Markov process: after conditioning on covariates, only the most recent year matters. At p=2 the last two years matter. For UK motor data, p=1 or p=2 is typical.
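The Markov reading of p=1 can be checked numerically. This is a minimal sketch under stated assumptions: Gaussian pair copulas (the library may select other families), three years of continuous PIT residuals, and hypothetical parameters rho1 (lag 1) and rho2 (lag-2 partial correlation). The binary occurrence part needs discrete-margin handling that this sketch omits. The conditional CDF of year 3 given years 1 and 2 comes from h-function recursion through both trees. Truncating at p=1 sets the tree-2 copula to independence, and year 1 then drops out.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def h_gauss(u: float, v: float, rho: float) -> float:
    """Gaussian pair-copula h-function: P(U <= u | V = v)."""
    return STD.cdf((STD.inv_cdf(u) - rho * STD.inv_cdf(v)) / sqrt(1.0 - rho**2))


def cond_cdf_lag2(u3: float, u1: float, u2: float, rho1: float, rho2: float) -> float:
    """F(u3 | u1, u2) for a stationary 3-year Gaussian D-vine with order 1-2-3.

    Tree 1: lag-1 pairs (1,2) and (2,3), both with parameter rho1.
    Tree 2: pair (1,3 | 2) with partial correlation rho2 (lag-2 dependence).
    """
    a = h_gauss(u3, u2, rho1)  # F(u3 | u2) from tree 1
    b = h_gauss(u1, u2, rho1)  # F(u1 | u2) from tree 1
    return h_gauss(a, b, rho2)  # also condition on year 1 through tree 2


# Truncation at p=1 makes the tree-2 copula the independence copula (rho2 = 0)
with_year1 = cond_cdf_lag2(0.7, 0.9, 0.4, rho1=0.3, rho2=0.0)
last_year_only = h_gauss(0.7, 0.4, rho=0.3)
print(abs(with_year1 - last_year_only) < 1e-9)  # True: year 1 drops out

# Choosing p by BIC: lowest value wins (example values from the snippet below)
bic_by_level = {1: 4821.3, 2: 4832.1}
p = min(bic_by_level, key=bic_by_level.get)  # 1
```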
```python
print(model.occurrence_vine.truncation_level)  # e.g., 1
print(model.occurrence_vine.fit_result_.bic_by_level)
# {1: 4821.3, 2: 4832.1} → p=1 selected (lowest BIC)
```

Since the FCA's PS21/5 pricing rules took effect in 2022, renewal pricing must be fair. A D-vine model gives an auditable conditional distribution that separates genuine claim persistence (a legitimate risk signal) from price optimisation (what the FCA is policing). The relativity table above is directly documentable.
The model is benchmarked against a flat NCD adjustment (Poisson GLM plus a fixed step function: 0 claims = 0.55×, 1 claim = 0.75×, 2+ claims = 1.30×). The benchmark uses a synthetic panel of 5,000 policyholders over 3 years, generated from a known latent-frailty data-generating process (DGP). Oracle predictions (the exact Gamma-Poisson posterior) serve as an upper bound on achievable performance. Full notebook: `notebooks/benchmark.py`.
| Metric | NCD Baseline | D-vine Copula | Oracle |
|---|---|---|---|
| Out-of-sample log-likelihood | lower | higher | highest |
| Brier score | higher | lower | lowest |
| MAE (predicted probability vs outcome) | higher | lower | lowest |
| Recency sensitivity (year-1 vs year-2 claim) | none | captures it | captures it |
The benchmark targets the core weakness of NCD: a policyholder who claimed only in year 1 receives the same multiplier as one who claimed only in year 2, even though the DGP makes recency matter. The D-vine conditions on the full claim sequence and assigns a higher probability after a recent claim. The notebook also reports calibration (A/E by NCD band) and the fraction of the oracle's improvement over NCD that the vine captures.
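The recency blindness follows directly from the baseline's definition: its multiplier depends only on the claim count. This is a minimal sketch of the benchmark NCD step (0.55×, 0.75×, 1.30×). It assumes the step applies to the total count over the history window and multiplies the Poisson GLM rate. The function names and the 0.10 a priori rate are hypothetical.

```python
def ncd_multiplier(claims_by_year: list[int]) -> float:
    """Benchmark NCD step: 0 claims -> 0.55x, 1 claim -> 0.75x, 2+ claims -> 1.30x."""
    n_claims = sum(claims_by_year)  # only the total count enters; timing is discarded
    if n_claims == 0:
        return 0.55
    if n_claims == 1:
        return 0.75
    return 1.30


def ncd_claim_rate(glm_rate: float, claims_by_year: list[int]) -> float:
    """A priori Poisson GLM rate scaled by the NCD step (assumed multiplicative)."""
    return glm_rate * ncd_multiplier(claims_by_year)


early = ncd_claim_rate(0.10, [1, 0])  # claimed in year 1 only
late = ncd_claim_rate(0.10, [0, 1])   # claimed in year 2 only (more recent)
print(early == late)  # True: the step function cannot see recency
```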
When to use: You have a panel of 3+ years of policyholder history and want experience-rated renewal pricing that goes beyond NCD steps — particularly where claim recency, not just count, matters.
When NOT to use: You have only one year of history per policyholder, or your book turns over too rapidly to build meaningful multi-year panels. The vine needs at least 2 prior years to condition on; with one year it reduces to a standard credibility adjustment.
Yang, L. & Czado, C. (2022). Two-part D-vine copula models for longitudinal insurance claim data. Scandinavian Journal of Statistics, 49(4), 1534–1561.
Shi, P. & Zhao, Z. (2024). Enhanced pricing and management of bundled insurance risks with dependence-aware prediction using pair copula construction. Journal of Econometrics, 240(1), 105676.
License: MIT