# Introduction to Buhlmann credibility

In this post, we continue our discussion of credibility theory. Suppose that for a particular insured (either an individual entity or a group of insureds), we have observed data $X_1,X_2, \cdots, X_n$ (claim counts or loss amounts). We are interested in setting a rate to cover the claim experience $X_{n+1}$ in the next period. In two previous posts (Examples of Bayesian prediction in insurance, Examples of Bayesian prediction in insurance-continued), we discussed this estimation problem from a Bayesian perspective and presented two examples. In this post, we discuss the Buhlmann credibility model and work the same two examples using the Buhlmann method.

First, let’s further describe the setting of the problem. For a particular insured, the experience data corresponding to various exposure periods are assumed to be independent. Statistically speaking, conditional on a risk parameter $\Theta$, the claim numbers or loss amounts $X_1, \cdots, X_n,X_{n+1}$ are independent and identically distributed. Furthermore, the distribution of the risk characteristics in the population of insureds and potential insureds is represented by $\pi_{\Theta}(\theta)$. The experience (either claim numbers or loss amounts) of a particular insured with risk parameter $\Theta=\theta$ is modeled by the conditional distribution $f_{X \lvert \Theta}(x \lvert \theta)$ given $\Theta=\theta$.

The Buhlmann Credibility Estimator
Given the observations $X_1, \cdots, X_n$ in the prior exposure periods, the Buhlmann credibility estimate $C$ of the claim experience $X_{n+1}$ is

$\displaystyle C=Z \overline{X}+(1-Z)\mu$

where $Z$ is the credibility factor assigned to the observed experience data, $\overline{X}$ is the sample mean of the observations $X_1, \cdots, X_n$, and $\mu$ is the unconditional mean $E[X]$ (the mean taken over all values of the risk parameter $\Theta$). The credibility factor $Z$ is of the form $\displaystyle Z=\frac{n}{n+K}$ where $n$ is a measure of the exposure size (it is the number of observation periods in our examples) and $\displaystyle K=\frac{E[Var[X \lvert \Theta]]}{Var[E[X \lvert \Theta]]}$. The parameter $K$ will be further explained below.

The Buhlmann credibility estimator $C$ is a linear function of the past data. Note that it is of the form:

$\displaystyle C=Z \overline{X}+(1-Z)\mu=w_0+\sum \limits_{i=1}^{n} w_i X_i$

where $w_0=(1-Z)\mu$ and $\displaystyle w_i=\frac{Z}{n}$ for $i=1, \cdots, n$.
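The two ways of writing the estimator can be checked numerically. The following is a minimal Python sketch, not part of the original post; the function name `buhlmann_estimate` and the inputs `mu = 1.0` and `k = 3.0` are hypothetical values chosen only for illustration.

```python
def buhlmann_estimate(observations, mu, k):
    """Return (Z, C) for the given observations, overall mean mu and parameter K."""
    n = len(observations)
    z = n / (n + k)                      # credibility factor Z = n / (n + K)
    x_bar = sum(observations) / n        # sample mean of the insured's own experience
    return z, z * x_bar + (1 - z) * mu   # C = Z * x_bar + (1 - Z) * mu


# Hypothetical inputs, chosen only for illustration.
observations = [1, 2]
mu, k = 1.0, 3.0

z, c = buhlmann_estimate(observations, mu, k)

# The same estimate written in the linear form w_0 + sum(w_i * X_i).
n = len(observations)
w0 = (1 - z) * mu
w = [z / n] * n
c_linear = w0 + sum(w_i * x_i for w_i, x_i in zip(w, observations))

print(z, c, c_linear)
```

Both forms give the same number (up to floating-point rounding), which is all the linear representation claims.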

Not only is the Buhlmann credibility estimator linear in the data, it is also the best linear approximation to both the Bayesian predictive mean $E[X_{n+1} \lvert X_1, \cdots, X_n]$ and the hypothetical mean $E[X_{n+1} \lvert \Theta]$ in the sense of minimizing expected squared error. In other words, the coefficients $w_0, w_1, \cdots, w_n$ are chosen so that the following expectations (loss functions) are minimized, where the expectations are taken over all observations and/or $\Theta$ (see [1]):

$\displaystyle L_1=E\biggl( \biggl[E[X_{n+1} \lvert \Theta]-w_0-\sum \limits_{i=1}^{n} w_i X_i \biggr]^2 \biggr)$

$\displaystyle L_2=E\biggl( \biggl[E[X_{n+1} \lvert X_1, \cdots, X_n]-w_0-\sum \limits_{i=1}^{n} w_i X_i \biggr]^2 \biggr)$

The Buhlmann Method
As discussed above, the Buhlmann credibility factor $Z=\frac{n}{n+K}$ is chosen such that $C=Z \overline{X}+(1-Z) \mu$ is the best linear approximation to the Bayesian estimate of the next period’s claim experience. Now we focus on the calculation of the parameter $K$.

Conditional on the risk parameter $\Theta$, $E[X \lvert \Theta]$ is called the hypothetical mean and $Var[X \lvert \Theta]$ is called the process variance. Then $\mu=E[X]=E[E[X \lvert \Theta]]$ is the expected value of hypothetical means (the unconditional mean). The total variance of this random process is:

$\displaystyle Var[X]=E[Var[X \lvert \Theta]]+Var[E[X \lvert \Theta]]$

The first part of the total variance $E[Var[X \lvert \Theta]]$ is called the expected value of process variance (EPV) and the second part $Var[E[X \lvert \Theta]]$ is called the variance of the hypothetical means (VHM). The parameter $K$ in the Buhlmann method is simply the ratio $K=\frac{EPV}{VHM}$.
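When the risk parameter takes finitely many values, these quantities are weighted sums over the prior distribution. The sketch below is one possible way to compute them in Python; the helper name `buhlmann_parameters` and the two-class inputs are hypothetical and are not taken from the examples in this post.

```python
def buhlmann_parameters(prior, hypothetical_means, process_variances):
    """Return (mu, EPV, VHM, K) for a discrete risk parameter."""
    # mu = E[E[X | Theta]], the expected value of the hypothetical means
    mu = sum(p * m for p, m in zip(prior, hypothetical_means))
    # EPV = E[Var[X | Theta]], the expected value of the process variance
    epv = sum(p * v for p, v in zip(prior, process_variances))
    # VHM = Var[E[X | Theta]] = E[(E[X | Theta])^2] - mu^2
    vhm = sum(p * m ** 2 for p, m in zip(prior, hypothetical_means)) - mu ** 2
    return mu, epv, vhm, epv / vhm


# Hypothetical two-class population, for illustration only.
prior = [0.5, 0.5]
hypothetical_means = [1.0, 3.0]
process_variances = [1.0, 3.0]

mu, epv, vhm, k = buhlmann_parameters(prior, hypothetical_means, process_variances)
total_variance = epv + vhm   # Var[X] = EPV + VHM

print(mu, epv, vhm, k, total_variance)
```

With these inputs, $\mu=2$, $EPV=2$, $VHM=1$, so $K=2$ and the total variance is 3.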

We can get an intuitive feel of this formula by considering the variability of the hypothetical means $E[X \lvert \Theta]$ across many values of the risk parameter $\Theta$. If the entire population of insureds (and potential insureds) is fairly homogeneous with respect to the risk parameter $\Theta$, then the hypothetical means do not vary a great deal, so $VHM=Var[E[X \lvert \Theta]]$ is relatively small in relation to $EPV=E[Var[X \lvert \Theta]]$. As a result, $K$ is large and $Z$ is closer to 0. This agrees with the notion that in a homogeneous population, the unconditional mean (the overall mean) is of more value as a predictor of the next period’s claim experience. On the other hand, if the population of insureds is heterogeneous with respect to the risk parameter $\Theta$, then the overall mean is of less value as a predictor of future experience and we should rely more on the experience of the particular insured. Again, the Buhlmann formula agrees with this notion. If $VHM=Var[E[X \lvert \Theta]]$ is large relative to $EPV=E[Var[X \lvert \Theta]]$, then $K$ is small and $Z$ is closer to 1.

Another attractive feature of the Buhlmann formula is that as more experience data accumulate (as $n \rightarrow \infty$), the credibility factor $Z$ approaches 1 (the experience data become more and more credible).
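Both effects, the size of $K$ and the number of periods $n$, can be seen by tabulating $Z=\frac{n}{n+K}$. This is a small illustrative sketch; the values of $K$ and $n$ in the loops are arbitrary choices, not values from the examples in this post.

```python
def credibility_factor(n, k):
    """Buhlmann credibility factor Z = n / (n + K)."""
    return n / (n + k)


# Small K ~ heterogeneous population, large K ~ homogeneous population.
for k in (0.5, 4.0, 50.0):
    row = [round(credibility_factor(n, k), 3) for n in (1, 5, 25, 125)]
    print(f"K = {k:>4}: {row}")
```

Reading across a row shows $Z$ rising toward 1 as $n$ grows; reading down a column shows $Z$ falling as $K$ grows.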

Example 1
In this random experiment, there is a large bowl (called B) and two boxes (Box 1 and Box 2). Bowl B contains a large quantity of balls, 80% of which are white and 20% of which are red. In Box 1, 60% of the balls are labeled 0, 30% are labeled 1 and 10% are labeled 2. In Box 2, 15% of the balls are labeled 0, 35% are labeled 1 and 50% are labeled 2. In the experiment, a ball is selected at random from bowl B. The color of the selected ball from bowl B determines which box to use (if the ball is white, use Box 1; if red, use Box 2). Then balls are drawn at random from the selected box (Box $i$) repeatedly with replacement and the values of the series of selected balls are recorded. The value of the first selected ball is $X_1$, the value of the second selected ball is $X_2$ and so on.

Suppose that your friend performs this random experiment (you do not know whether he uses Box 1 or Box 2) and that his first selected ball is a 1 ($X_1=1$) and his second selected ball is a 2 ($X_2=2$). What is the predicted value $X_3$ of the third selected ball?

This example was solved in (Examples of Bayesian prediction in insurance) using the Bayesian approach. We now work this example in the Buhlmann approach.

The following restates the prior distribution of $\Theta$ and the conditional distribution of $X \lvert \Theta$. We denote “white ball from bowl B” by $\Theta=1$ and “red ball from bowl B” by $\Theta=2$.

$\pi_{\Theta}(1)=0.8$
$\pi_{\Theta}(2)=0.2$

$\displaystyle f_{X \lvert \Theta}(0 \lvert \Theta=1)=0.60$
$\displaystyle f_{X \lvert \Theta}(1 \lvert \Theta=1)=0.30$
$\displaystyle f_{X \lvert \Theta}(2 \lvert \Theta=1)=0.10$

$\displaystyle f_{X \lvert \Theta}(0 \lvert \Theta=2)=0.15$
$\displaystyle f_{X \lvert \Theta}(1 \lvert \Theta=2)=0.35$
$\displaystyle f_{X \lvert \Theta}(2 \lvert \Theta=2)=0.50$

The following computes the conditional means (hypothetical means) and conditional variances (process variances) and the other parameters of the Buhlmann method.

Hypothetical Means
$\displaystyle E[X \lvert \Theta=1]=0.60(0)+0.30(1)+0.10(2)=0.50$
$\displaystyle E[X \lvert \Theta=2]=0.15(0)+0.35(1)+0.50(2)=1.35$

$\displaystyle E[X^2 \lvert \Theta=1]=0.60(0)+0.30(1)+0.10(4)=0.70$
$\displaystyle E[X^2 \lvert \Theta=2]=0.15(0)+0.35(1)+0.50(4)=2.35$

Process Variances
$\displaystyle Var[X \lvert \Theta=1]=0.70-0.50^2=0.45$
$\displaystyle Var[X \lvert \Theta=2]=2.35-1.35^2=0.5275$

Expected Value of the Hypothetical Means
$\displaystyle \mu=E[X]=E[E[X \lvert \Theta]]=0.80(0.50)+0.20(1.35)=0.67$

Expected Value of the Process Variance
$\displaystyle EPV=E[Var[X \lvert \Theta]]=0.8(0.45)+0.20(0.5275)=0.4655$

Variance of the Hypothetical Means
$\displaystyle VHM=Var[E[X \lvert \Theta]]=0.80(0.50)^2+0.20(1.35)^2-0.67^2=0.1156$

Buhlmann Credibility Factor
$\displaystyle K=\frac{EPV}{VHM}=\frac{0.4655}{0.1156}=\frac{4655}{1156} \approx 4.0268$

$\displaystyle Z=\frac{2}{2+\frac{4655}{1156}}=\frac{2312}{6967}=0.33185$

Buhlmann Credibility Estimate
With $\overline{X}=\frac{1+2}{2}=\frac{3}{2}$ and $1-Z=\frac{4655}{6967}$, we have:

$\displaystyle C=\frac{2312}{6967} \biggl(\frac{3}{2}\biggr)+\frac{4655}{6967} (0.67)=\frac{6586.85}{6967}=0.9454356$

Note that the Bayesian estimate obtained in Examples of Bayesian prediction in insurance is 1.004237288. Under the Buhlmann model, the past claim experience of the insured in this example is assigned 33% weight in projecting the claim frequency in the next period.
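The arithmetic in Example 1 can be reproduced with exact fractions. The following Python sketch is not from the original post; the variable names are illustrative. It recomputes the Buhlmann quantities from the stated prior and box distributions, and also recomputes the Bayesian predictive mean for comparison.

```python
from fractions import Fraction as F

# Prior on Theta (1 = white ball / Box 1, 2 = red ball / Box 2).
prior = {1: F(8, 10), 2: F(2, 10)}
# Conditional distribution of X given Theta.
pmf = {
    1: {0: F(60, 100), 1: F(30, 100), 2: F(10, 100)},
    2: {0: F(15, 100), 1: F(35, 100), 2: F(50, 100)},
}
observations = [1, 2]

# Hypothetical means and process variances for each box.
mean = {t: sum(x * p for x, p in pmf[t].items()) for t in prior}
var = {t: sum(x * x * p for x, p in pmf[t].items()) - mean[t] ** 2 for t in prior}

mu = sum(prior[t] * mean[t] for t in prior)
epv = sum(prior[t] * var[t] for t in prior)
vhm = sum(prior[t] * mean[t] ** 2 for t in prior) - mu ** 2

n = len(observations)
k = epv / vhm
z = n / (n + k)
x_bar = F(sum(observations), n)
buhlmann = z * x_bar + (1 - z) * mu

# Bayesian predictive mean: posterior-weighted average of the hypothetical means.
posterior = {}
for t in prior:
    likelihood = F(1)
    for x in observations:
        likelihood *= pmf[t][x]
    posterior[t] = prior[t] * likelihood
total = sum(posterior.values())
bayes = sum(posterior[t] / total * mean[t] for t in prior)

print(float(z), float(buhlmann), float(bayes))
```

Working in `Fraction` avoids rounding, so `z` comes out as exactly $\frac{2312}{6967}$ and the Bayesian value as exactly $\frac{237}{236}$, which match the decimals quoted above.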

Example 2
The number of claims $X$ generated by an insured in a portfolio of independent insurance policies has a Poisson distribution with parameter $\Theta$. In the portfolio of policies, the parameter $\Theta$ varies according to a gamma distribution with parameters $\alpha$ and $\beta$. We have the following conditional distribution of $X$ and distribution of the risk parameter $\Theta$.

$\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\frac{\theta^x e^{-\theta}}{x!}$ where $x=0,1,2, \cdots$

$\displaystyle \pi_{\Theta}(\theta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta}$ where $\Gamma(\cdot)$ is the gamma function.

Suppose that a particular insured in this portfolio has generated 0 and 3 claims in the first 2 policy periods. What is the Buhlmann estimate of the number of claims for this insured in period 3?

Since the conditional distribution of $X$ is Poisson, we have $E[X \lvert \Theta]=\Theta$ and $Var[X \lvert \Theta]=\Theta$. As a result, the $EPV$, $VHM$ and $K$ are:

$\displaystyle EPV=E[\Theta]=\frac{\alpha}{\beta}$

$\displaystyle VHM=Var[\Theta]=\frac{\alpha}{\beta^2}$

$\displaystyle K=\frac{EPV}{VHM}=\beta$

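As a result, the credibility factor for two periods of experience is $Z=\frac{2}{2+\beta}$. With observed claims 0 and 3, the sample mean is $\overline{X}=\frac{0+3}{2}=\frac{3}{2}$ and the unconditional mean is $\mu=E[\Theta]=\frac{\alpha}{\beta}$, so the Buhlmann estimate of the claim frequency in the next period is: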

$\displaystyle C=\frac{2}{2+\beta} \thinspace \biggl(\frac{3}{2}\biggr)+\frac{\beta}{2+\beta} \thinspace \biggl(\frac{\alpha}{\beta}\biggr)$

To generalize the above results, suppose that we have observed $X_1=x_1, \cdots, X_n=x_n$ for this insured in the prior periods. Then the Buhlmann estimate for the claim frequency in the next period is:

$\displaystyle C=\frac{n}{n+\beta} \thinspace \biggl(\frac{\sum \limits_{i=1}^{n}x_i}{n}\biggr)+\frac{\beta}{n+\beta} \thinspace \biggl(\frac{\alpha}{\beta}\biggr)$

In this example, the Buhlmann estimate is exactly the same as the Bayesian estimate (Examples of Bayesian prediction in insurance-continued).
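The agreement can be checked numerically. The general Buhlmann estimate above simplifies to $\frac{\alpha+\sum x_i}{\beta+n}$, which is the mean of a gamma distribution with parameters $\alpha+\sum x_i$ and $\beta+n$ (the posterior of $\Theta$ under the Poisson-gamma model). The sketch below compares the two expressions; the function names and the values $\alpha=2$, $\beta=2$ are hypothetical, since the post leaves $\alpha$ and $\beta$ general.

```python
from fractions import Fraction


def buhlmann_poisson_gamma(claims, alpha, beta):
    """Buhlmann estimate C = Z * x_bar + (1 - Z) * alpha / beta with K = beta."""
    n = len(claims)
    z = Fraction(n) / (n + beta)
    x_bar = Fraction(sum(claims), n)
    return z * x_bar + (1 - z) * Fraction(alpha) / beta


def bayes_poisson_gamma(claims, alpha, beta):
    """Posterior mean of Theta: (alpha + sum of claims) / (beta + n)."""
    return Fraction(alpha + sum(claims)) / (beta + len(claims))


# Hypothetical prior parameters, chosen only for illustration.
alpha, beta = 2, 2
claims = [0, 3]   # the two observed periods in Example 2

c_buhlmann = buhlmann_poisson_gamma(claims, alpha, beta)
c_bayes = bayes_poisson_gamma(claims, alpha, beta)
print(c_buhlmann, c_bayes)
```

With these assumed parameters both functions return $\frac{5}{4}$; trying other values of `alpha`, `beta` and `claims` gives matching results as well.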

Reference

1. Klugman, S. A., Panjer, H. H., Willmot, G. E., Loss Models: From Data to Decisions, Second Edition, John Wiley & Sons, Inc., 2004.