insurance-copula

Copula models for insurance pricing — D-vine temporal dependence, two-part occurrence/severity.

Merged from: insurance-vine-longitudinal (D-vine copula for panel data).

The problem

A policyholder who claimed last year is more likely to claim again next year. This is not just adverse selection: it is genuine claim persistence. Standard GLM pricing captures risk factors (age, vehicle type, region) but ignores temporal dependence in the residuals. No-claims discount (NCD) scales encode a binary rule: claimed or did not. Neither approach gives you a principled conditional distribution of next year's claims given the observed history.

This library implements the Yang & Czado (2022) two-part D-vine copula for longitudinal insurance claims. You observe a policyholder over T years. The model learns the full joint distribution of claim occurrence and severity across those years, then conditions on observed history to give the next-year claim distribution.

What it does

1. Fits a logistic GLM for claim occurrence and a gamma/log-normal GLM for severity. These strip out systematic risk factors.
2. Applies the probability integral transform (PIT) to the residuals, i.e. what the GLM cannot explain.
3. Fits a stationary D-vine copula on the occurrence PIT residuals. The vine structure is temporal: tree level k captures lag-k dependence.
4. Does the same for severity PIT residuals.
5. Uses h-function recursion to compute the conditional distribution of next year's claim given observed history.
6. Returns: conditional claim probability, conditional severity quantiles, experience-rated premium, relativity factors.
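
The PIT and h-function steps above can be sketched for the severity part using only the standard library. This is a minimal illustration under stated assumptions, not the library's implementation: it assumes a log-normal severity model, a single Gaussian pair copula (a vine truncated at lag 1), and hypothetical fitted values for mu, sigma and rho. All function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inv_cdf


def lognormal_pit(y, mu, sigma):
    """PIT: map an observed severity to (0, 1) through its fitted log-normal CDF."""
    return STD_NORMAL.cdf((math.log(y) - mu) / sigma)


def gaussian_h(u, v, rho):
    """h-function of a Gaussian pair copula: P(U <= u | V = v)."""
    z_u, z_v = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf((z_u - rho * z_v) / math.sqrt(1.0 - rho**2))


def gaussian_h_inverse(alpha, v, rho):
    """Inverse h-function: the u that satisfies P(U <= u | V = v) = alpha."""
    z_a, z_v = STD_NORMAL.inv_cdf(alpha), STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf(z_a * math.sqrt(1.0 - rho**2) + rho * z_v)


def conditional_severity_quantile(alpha, y_last, mu_last, mu_next, sigma, rho):
    """alpha-quantile of next year's severity given last year's observed severity."""
    v = lognormal_pit(y_last, mu_last, sigma)   # last year's PIT residual
    u = gaussian_h_inverse(alpha, v, rho)       # conditional quantile on the copula scale
    return math.exp(mu_next + sigma * STD_NORMAL.inv_cdf(u))  # back to the severity scale


# Hypothetical fitted values: same GLM mean in both years, moderate persistence.
median_after_large = conditional_severity_quantile(0.5, 8000.0, 7.5, 7.5, 1.0, 0.4)
median_after_small = conditional_severity_quantile(0.5, 500.0, 7.5, 7.5, 1.0, 0.4)
```

With positive rho, a large claim last year pulls the conditional median above the unconditional median exp(mu_next), and a small claim pulls it below. With more than one lag, the same h-function is applied recursively, tree by tree.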

What it is not

This is not a neural/sequence model. It does not replace your GLM. It operates on the GLM residuals and quantifies how much temporal persistence remains after controlling for risk factors. The statistical structure is transparent and auditable — relevant for Consumer Duty documentation.

Installation

pip install insurance-copula

Quick start

import pandas as pd
from insurance_copula.vine import PanelDataset, TwoPartDVine

# Your panel data: one row per (policyholder, year)
df = pd.read_parquet("motor_panel.parquet")

# Build the panel object (validates, handles unbalanced panels)
panel = PanelDataset.from_dataframe(
    df,
    id_col="policy_id",
    year_col="year",
    claim_col="has_claim",
    severity_col="claim_amount",
    covariate_cols=["age", "vehicle_group", "region"],
)

# Fit the two-part D-vine
model = TwoPartDVine(severity_family="gamma", max_truncation=2)
model.fit(panel)

print(model)
# TwoPartDVine(fitted, t_dim=4, occurrence_p=1, severity_p=2)

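# Predict next-year claim probability given history.
# history_df is not defined above: it holds the observed (policyholder, year) rows
# for the policies being scored, assumed here to use the same columns as df.
proba = model.predict_proba(history_df)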
# policy_id
# POL00001    0.142
# POL00002    0.089
# POL00003    0.247
# Name: claim_proba, dtype: float64

# Conditional severity quantiles
quantiles = model.predict_severity_quantile(history_df, quantiles=[0.5, 0.95])

# Experience-rated premium
premium = model.predict_premium(history_df, loading=0.15)

# Experience relativity = copula premium / a priori GLM premium
relativity = model.experience_relativity(history_df)

Top-level imports also work:

from insurance_copula import PanelDataset, TwoPartDVine, extract_relativity_curve

Relativity table

The output pricing teams actually use: how does claim history shift the predicted premium relative to the a priori estimate?

from insurance_copula import extract_relativity_curve, compare_to_ncd

curve = extract_relativity_curve(
    model,
    claim_counts=[0, 1, 2, 3],
    n_years_list=[1, 2, 3, 4, 5],
)
print(curve.pivot(index="claim_count", columns="n_years", values="relativity").round(3))

#              1yr   2yr   3yr   4yr   5yr
# 0 claims    1.00  1.00  1.00  1.00  1.00
# 1 claim     1.35  1.28  1.22  1.18  1.14
# 2 claims    NaN   1.71  1.58  1.48  1.40
# 3 claims    NaN   NaN   2.01  1.87  1.74

# Compare against NCD scale
comparison = compare_to_ncd(curve)
print(comparison[comparison["claim_count"] == 0].to_string())
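
To make the relativity definition concrete, here is a small standalone sketch of the ratio itself. It does not call the library. The `premium` helper and its treatment of `loading` as a proportional uplift on expected claim cost are assumptions made for illustration, and the probabilities are made up.

```python
def premium(claim_prob, mean_severity, loading=0.0):
    """Expected claim cost with a proportional loading (assumed convention)."""
    return claim_prob * mean_severity * (1.0 + loading)


def relativity_from_premiums(experience_premium, prior_premium):
    """Experience relativity: history-conditioned premium over a priori premium."""
    if prior_premium <= 0:
        raise ValueError("a priori premium must be positive")
    return experience_premium / prior_premium


# Hypothetical policy: the GLM alone gives a 10% claim probability, and
# conditioning on one claim in the single observed year lifts it to 13.5%.
prior = premium(claim_prob=0.10, mean_severity=2000.0, loading=0.15)
experienced = premium(claim_prob=0.135, mean_severity=2000.0, loading=0.15)
relativity = relativity_from_premiums(experienced, prior)
print(round(relativity, 3))  # 1.35
```

Under this convention the same loading applies to both premiums, so it cancels in the ratio and the relativity reflects only the shift in the conditional claim distribution.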

Truncation and Markov order

The D-vine is truncated at order p, selected by BIC. At p=1, the model is a first-order Markov chain: only the most recent year matters after conditioning on covariates. At p=2, the last two years matter. For UK motor data, p=1 or p=2 is typical.

print(model.occurrence_vine.truncation_level)   # e.g., 1
print(model.occurrence_vine.fit_result_.bic_by_level)
# {1: 4821.3, 2: 4832.1}  → p=1 selected
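
The selection rule can be sketched in a few lines. This uses the textbook BIC definition (parameter count times log sample size, minus twice the log-likelihood) and picks the level with the smallest value. Whether the library computes BIC in exactly this form is an assumption, and the function names are illustrative.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: lower is better, extra parameters are penalised by log(n_obs)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_truncation(bic_by_level):
    """Return the truncation level p with the smallest BIC."""
    return min(bic_by_level, key=bic_by_level.get)


# BIC values taken from the example output above.
selected = select_truncation({1: 4821.3, 2: 4832.1})
print(selected)  # 1
```

Adding a lag-2 tree always raises the log-likelihood a little, so the level is only worth keeping if the gain outweighs the log(n_obs) penalty per extra parameter.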

FCA Consumer Duty context

Since PS21/5 took effect in 2022, renewal pricing must be fair. A D-vine model gives an auditable conditional distribution, separating genuine claim persistence (a legitimate risk signal) from premium optimisation targeting (what the FCA is policing). The relativity table above is directly documentable.

Performance

Benchmarked against NCD flat adjustment (Poisson GLM + fixed step function: 0 claims = 0.55×, 1 claim = 0.75×, 2+ claims = 1.30×) on a synthetic panel of 5,000 policyholders over 3 years with a known latent frailty DGP. Oracle predictions (exact Gamma-Poisson posterior) serve as an upper bound. Full notebook: notebooks/benchmark.py.

Metric	NCD Baseline	D-vine Copula	Oracle
Out-of-sample log-likelihood	lower	higher	highest
Brier score	higher	lower	lowest
MAE (predicted probability vs outcome)	higher	lower	lowest
Recency sensitivity (year-1 vs year-2 claim)	none	captures it	captures it
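
For readers unfamiliar with the scoring metrics in the table, the sketch below shows how each one is computed from predicted claim probabilities and 0/1 outcomes. The predictions are toy values chosen for illustration, not benchmark results, and the function names are not part of the library.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def mean_log_likelihood(probs, outcomes, eps=1e-12):
    """Average Bernoulli log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def mean_absolute_error(probs, outcomes):
    """Mean absolute difference between predicted probability and 0/1 outcome."""
    return sum(abs(p - y) for p, y in zip(probs, outcomes)) / len(probs)


# Toy data: a flat prediction versus one that separates claimants from non-claimants.
outcomes = [1, 0, 0, 1, 0]
flat = [0.2, 0.2, 0.2, 0.2, 0.2]
sharper = [0.6, 0.1, 0.1, 0.5, 0.1]

flat_scores = (brier_score(flat, outcomes), mean_log_likelihood(flat, outcomes))
sharper_scores = (brier_score(sharper, outcomes), mean_log_likelihood(sharper, outcomes))
```

A better model has a lower Brier score and MAE and a higher log-likelihood, which is the direction of every row in the table.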

The benchmark tests the core weakness of NCD: a policyholder who claimed in year 1 only receives the same multiplier as one who claimed in year 2 only, even though the DGP makes recency matter. The D-vine conditions on the full sequence and assigns higher probability to a recent claim. The notebook also reports calibration (A/E by NCD band) and the fraction of oracle improvement that the vine captures over NCD.
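
The recency point can be illustrated with a toy comparison. The step multipliers are the ones quoted for the NCD baseline. Everything else is a simplification assumed for this sketch: a claim is represented as a latent uniform falling below the a priori probability q, consecutive years are linked by a Clayton copula (chosen only because its CDF has a closed form), and the vine is truncated at lag 1. The values of q and theta are hypothetical.

```python
NCD_STEPS = {0: 0.55, 1: 0.75, 2: 1.30}  # multipliers from the benchmark baseline


def ncd_multiplier(history):
    """Count-based step function; history is a tuple of 0/1 flags, oldest first."""
    return NCD_STEPS[min(sum(history), 2)]


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def markov_claim_prob(history, q, theta):
    """P(claim next year | most recent year), with a claim defined as latent U <= q."""
    joint = clayton_cdf(q, q, theta)   # P(claim last year and claim next year)
    if history[-1] == 1:
        return joint / q               # condition on a claim last year
    return (q - joint) / (1.0 - q)     # condition on no claim last year


q, theta = 0.10, 0.5                   # hypothetical a priori probability and dependence
old_claim, recent_claim = (1, 0), (0, 1)

same_ncd = ncd_multiplier(old_claim) == ncd_multiplier(recent_claim)
p_old = markov_claim_prob(old_claim, q, theta)
p_recent = markov_claim_prob(recent_claim, q, theta)
```

Both histories contain one claim, so the step function assigns them the same 0.75 multiplier, while the sequence-aware probability is above q for the recent claim and below q for the older one.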

When to use: You have a panel of 3+ years of policyholder history and want experience-rated renewal pricing that goes beyond NCD steps — particularly where claim recency, not just count, matters.

When NOT to use: You have only one year of history per policyholder, or your book turns over too rapidly to build meaningful multi-year panels. The vine needs at least 2 prior years to condition on; with one year it reduces to a standard credibility adjustment.

References

Yang, L. & Czado, C. (2022). Two-part D-vine copula models for longitudinal insurance claim data. Scandinavian Journal of Statistics, 49(4), 1534–1561.

Shi, P. & Zhao, Z. (2024). Enhanced pricing and management of bundled insurance risks with dependence-aware prediction using pair copula construction. Journal of Econometrics, 240(1), 105676.

Licence

MIT
