# Defining Hazard Rate at a Point Mass

The hazard rate function $h_T(t)$, also known as the force of mortality or the failure rate, is defined as the ratio of the density function and the survival function. That is, $\displaystyle h_T(t)=\frac{f_T(t)}{S_T(t)}$, where $T$ is the survival model of a life or a system being studied. In this definition, $T$ is usually taken as a continuous random variable with nonnegative real values as support. In this post we attempt to define the hazard rate at the places that are point masses (probability masses). This definition will cover discrete survival models as well as mixed survival models (i.e. models that are continuous in some interval and also have point masses). This post is in response to a comment posted by a reader on the post The hazard rate function, an introduction.

If the survival model $T$ is an exponential distribution, the hazard rate is constant. When the exponential survival model is censored on the right at some value of maximum lifetime, what is the hazard rate at the maximum? This is essentially the question posted by one reader of this blog. The following is the graph of the cdf $F_T(t)=1-e^{-0.25 t}$ censored at $t_{max}=5$.

We attempt to define the hazard rate at a probability mass such as the one in Figure 1. The same definition would apply for any discrete probability model.

As indicated at the beginning of the post, the hazard rate function is defined as the following ratio:

$\displaystyle (1) \ \ \ \ \ h_T(t)=\frac{f_T(t)}{1-F_T(t)}=\frac{f_T(t)}{S_T(t)}$

where $f_T$, $F_T$ and $S_T$ are the density function, cumulative distribution function (cdf) and the survival function of a given survival model $T$. This definition is usually made at the points $T=t$ where it makes sense to take derivative of $F_T(t)$. The hazard rate thus defined can be interpreted as the failure rate at time $t$ given that the life in question has survived to time $t$. It is the rate of failure at the next instant given that the life has survived up to time $t$.

Suppose that $T=t$ is a point mass (such as $T=5$ in Figure 1). The hazard rate at such points is defined by the same idea. We define the hazard rate at a point mass as the probability of failing at time $t$ given that the life has survived up to that time.

$\displaystyle (2) \ \ \ \ \ h_T(t)=\frac{P(T=t)}{P(T \ge t)}$

Note that both $(1)$ and $(2)$ are of the same general form (the ratio of density to survival function) and have the same interpretation. However, $(2)$ is actually a conditional probability, while $(1)$ can only be a rate of failure. The hazard rate as in $(1)$ technically cannot be a probability since it can be greater than 1.

The hazard rate at $T=5$ in Figure 1 is 1.0. We can derive this using $(2)$, or we can think about the meaning of $(2)$. Note that the point mass in Figure 1 is the maximum lifetime. Any life that reaches that point is considered a termination (perhaps the person drops out of the study). So given that the life reaches this maximum point, it is certain that the life fails at this point (hence the conditional probability as defined by $(2)$ is 1.0).

So if the point mass is at the last point of the time scale in the survival model, the hazard rate is 1.0, representing that 100% of the surviving lives die off. However, the hazard rate at a point mass at $T=t$ prior to the maximum point is less than 1.0 and is the size of the jump in the cdf at $T=t$ as a fraction of the probability of survival up to that point.
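
As a quick numerical check of $(2)$, the following Python sketch evaluates the hazard rate at the point mass of the censored exponential model described above (rate 0.25, censored at $t_{max}=5$). The function names are illustrative only and are not part of the original discussion.

```python
import math

RATE = 0.25   # exponential rate from the cdf F(t) = 1 - exp(-0.25 t)
T_MAX = 5.0   # censoring point (maximum lifetime)

def prob_at(t):
    """P(T = t): zero everywhere except the point mass at T_MAX."""
    return math.exp(-RATE * T_MAX) if t == T_MAX else 0.0

def prob_at_least(t):
    """P(T >= t) for 0 <= t <= T_MAX; zero beyond the maximum lifetime."""
    return math.exp(-RATE * t) if t <= T_MAX else 0.0

def hazard_at_point_mass(t):
    """Formula (2): P(T = t) / P(T >= t)."""
    return prob_at(t) / prob_at_least(t)

print(hazard_at_point_mass(T_MAX))
```

Both the numerator and the denominator equal $e^{-1.25}$ at $t=5$, so the ratio is 1 regardless of the rate chosen.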

We close with a simple example illustrating the calculation of the hazard rate for a discrete survival model. Our example is the uniform model $T$ at $t=1,2,3,4,5$. The following is the graph of its cdf.

The following table defines the hazard rates.

$\displaystyle (3) \ \ \ \ \ \begin{bmatrix} \text{t}&\text{ }&P(T=t) &\text{ }&P(T \ge t) &\text{ }&h_T(t) \\\text{ }&\text{ }&\text{ } \\ 1&\text{ }&\displaystyle \frac{1}{5}&\text{ }& \displaystyle \frac{5}{5}&\text{ }& \displaystyle \frac{1}{5} \\\text{ }&\text{ }&\text{ } \\ 2&\text{ }& \displaystyle \frac{1}{5}&\text{ }& \displaystyle \frac{4}{5}&\text{ }& \displaystyle \frac{1}{4} \\\text{ }&\text{ }&\text{ } \\ 3&\text{ }& \displaystyle \frac{1}{5}&\text{ }& \displaystyle \frac{3}{5}&\text{ }& \displaystyle \frac{1}{3} \\\text{ }&\text{ }&\text{ } \\ 4&\text{ }& \displaystyle \frac{1}{5}&\text{ }& \displaystyle \frac{2}{5}&\text{ }& \displaystyle \frac{1}{2} \\\text{ }&\text{ }&\text{ } \\ 5&\text{ }& \displaystyle \frac{1}{5}&\text{ }& \displaystyle \frac{1}{5}&\text{ }&1 \end{bmatrix}$

The hazard rates in the above table are calculated using $(2)$. We would like to point out that the calculated hazard rates conform to the mortality pattern that is expected in a uniform model. Note that at the first point mass, one fifth of the lives die off. At the second point mass, one fourth of the survived die off and so on. Then at the last point mass, 100% of the survived die off.
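
The table can be reproduced with a few lines of Python. This is a minimal sketch that applies $(2)$ to any discrete probability function supplied as a dictionary; the helper name `discrete_hazards` is hypothetical.

```python
from fractions import Fraction

def discrete_hazards(pmf):
    """Return {t: P(T = t) / P(T >= t)} for a pmf given as {t: probability}."""
    hazards = {}
    survival_to_t = Fraction(1)  # P(T >= t) at the smallest support point
    for t in sorted(pmf):
        hazards[t] = pmf[t] / survival_to_t
        survival_to_t -= pmf[t]  # remove the mass at t to get P(T >= next t)
    return hazards

# Uniform model on t = 1, 2, 3, 4, 5
uniform_pmf = {t: Fraction(1, 5) for t in range(1, 6)}
for t, h in discrete_hazards(uniform_pmf).items():
    print(t, h)
```

Exact fractions are used so that the output can be compared directly with the last column of the table.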

# The hazard rate function, an introduction

The goal of this post is to introduce the concept of hazard rate function by modifying one of the postulates of the approximate Poisson process. The rate of changes in the modified process is the hazard rate function. When a “change” in the modified Poisson process means a termination of a system (be it manufactured or biological), the notion of the hazard rate function leads to the concept of survival models. We then discuss several important examples of survival probability models that are defined by the hazard rate function. These examples include the Weibull distribution, the Gompertz distribution and the model based on the Makeham’s law.

We consider an experiment in which the occurrences of a certain type of events are counted during a given time interval or on a given physical object. Suppose that we count the occurrences of events on the interval $(0,t)$. We call the occurrence of the type of events in question a change. We assume the following three conditions:

1. The numbers of changes occurring in nonoverlapping intervals are independent.
2. The probability of two or more changes taking place in a sufficiently small interval is essentially zero.
3. The probability of exactly one change in the short interval $(t,t+\delta)$ is approximately $\lambda(t) \delta$ where $\delta$ is sufficiently small and $\lambda(t)$ is a nonnegative function of $t$.

For the lack of a better name, throughout this post, we call the above process the counting process (*). The approximate Poisson process is defined by conditions 1 and 2 and the condition that the $\lambda(t)$ in condition 3 is a constant function. Thus the process we describe here is a more general process than the Poisson process.

Though the counting process indicated here can model the number of changes that occur in a physical object or a physical interval, we focus on the time aspect by considering the counting process as a model for the number of changes that occur in a time interval, where a change means “termination” or “failure” of a system under consideration. In many applications (e.g. in actuarial science and reliability engineering), the interest is in the time until termination or failure. Thus, the distribution for the time until failure is called a survival model. The rate of change function $\lambda(t)$ indicated in condition 3 is called the hazard rate function. It is also called the failure rate function in reliability engineering. In actuarial science, the hazard rate function is known as the force of mortality.

Two random variables naturally arise from the counting process (*). One is the discrete variable $N_t$, defined as the number of changes in the time interval $(0,t)$. The other is the continuous random variable $T$, defined as the time until the occurrence of the first (or next) change.

Claim 1. Let $\displaystyle \Lambda(t)=\int_{0}^{t} \lambda(y) dy$. Then $e^{-\Lambda(t)}$ is the probability that there is no change in the interval $(0,t)$. That is, $\displaystyle P[N_t=0]=e^{-\Lambda(t)}$.

We are interested in finding the probability of zero changes in the interval $(0,y+\delta)$. By condition 1, the numbers of changes in the nonoverlapping intervals $(0,y)$ and $(y,y+\delta)$ are independent. Thus we have:

$\displaystyle P[N_{y+\delta}=0] \approx P[N_y=0] \times [1-\lambda(y) \delta] \ \ \ \ \ \ \ \ (a)$

Note that by condition 3, the probability of exactly one change in the small interval $(y,y+\delta)$ is $\lambda(y) \delta$. Thus $[1-\lambda(y) \delta]$ is the probability of no change in the interval $(y,y+\delta)$. Continuing with equation $(a)$, we have the following derivation:

$\displaystyle \frac{P[N_{y+\delta}=0] - P[N_y=0]}{\delta} \approx -\lambda(y) P[N_y=0]$

$\displaystyle \frac{d}{dy} P[N_y=0]=-\lambda(y) P[N_y=0]$

$\displaystyle \frac{\frac{d}{dy} P[N_y=0]}{P[N_y=0]}=-\lambda(y)$

$\displaystyle \int_0^{t} \frac{\frac{d}{dy} P[N_y=0]}{P[N_y=0]} dy=-\int_0^{t} \lambda(y)dy$

Integrating the left hand side and using the boundary condition of $P[N_0=0]=1$, we have:

$\displaystyle \ln P[N_t=0]=-\int_0^{t} \lambda(y)dy$

$\displaystyle P[N_t=0]=e^{-\int_0^{t} \lambda(y)dy}$

Claim 2
As discussed above, let $T$ be the length of the interval that is required to observe the first change in the counting process (*). Then the following are the distribution function, survival function and pdf of $T$:

• $\displaystyle F_T(t)=\displaystyle 1-e^{-\int_0^t \lambda(y) dy}$
• $\displaystyle S_T(t)=\displaystyle e^{-\int_0^t \lambda(y) dy}$
• $\displaystyle f_T(t)=\displaystyle \lambda(t) e^{-\int_0^t \lambda(y) dy}$

In Claim 1, we derive the probability $P[N_y=0]$ for the discrete variable $N_y$ derived from the counting process (*). We now consider the continuous random variable $T$. Note that $P[T > t]$ is the probability that the first change occurs after time $t$. This means there is no change within the interval $(0,t)$. Thus $S_T(t)=P[T > t]=P[N_t=0]=e^{-\int_0^t \lambda(y) dy}$. The distribution function and density function can be derived accordingly.

Claim 3
The hazard rate function $\lambda(t)$ is equivalent to each of the following:

• $\displaystyle \lambda(t)=\frac{f_T(t)}{1-F_T(t)}$
• $\displaystyle \lambda(t)=\frac{-S_T^{'}(t)}{S_T(t)}$
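
The relationships in Claims 2 and 3 can be checked numerically. The sketch below assumes a hypothetical hazard rate $\lambda(t)=0.1+0.02t$ (not taken from the post), builds $S_T$ and $f_T$ from it by numerical integration, and then recovers $\lambda(t)$ from both expressions in Claim 3.

```python
import math

def hazard(t):
    """Hypothetical increasing hazard rate, chosen only for illustration."""
    return 0.1 + 0.02 * t

def cumulative_hazard(t, steps=10_000):
    """Trapezoidal approximation of the integral of hazard(y) over (0, t)."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * width) for i in range(1, steps))
    return total * width

def survival(t):
    """S_T(t) = exp(-Lambda(t)), as in Claim 2."""
    return math.exp(-cumulative_hazard(t))

def density(t):
    """f_T(t) = lambda(t) * S_T(t), as in Claim 2."""
    return hazard(t) * survival(t)

t = 3.0
cdf = 1.0 - survival(t)
ratio = density(t) / (1.0 - cdf)                             # f / (1 - F)
eps = 1e-5
slope = (survival(t + eps) - survival(t - eps)) / (2 * eps)  # approximates S'(t)
print(hazard(t), ratio, -slope / survival(t))
```

The three printed values should agree up to the error of the finite-difference approximation of $S_T^{'}(t)$.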

Remark
Based on the condition 3 in the counting process (*), the $\lambda(t)$ is the rate of change in the counting process. Note that $\lambda(t) \delta$ is the probability of a change (e.g. a failure or a termination) in a small time interval of length $\delta$. Thus the hazard rate function can be interpreted as the failure rate at time $t$ given that the life in question has survived to time $t$. Claim 3 shows that the hazard rate function is the ratio of the density function and the survival function of the time until failure variable $T$. Thus the hazard rate function $\lambda(t)$ is the conditional density of failure at time $t$. It is the rate of failure at the next instant given that the life or system being studied has survived up to time $t$.

It is interesting to note that the function $\Lambda(t)=\int_0^t \lambda(y) dy$ defined in claim 1 is called the cumulative hazard rate function. Thus the cumulative hazard rate function is an alternative way of representing the hazard rate function (see the discussion on Weibull distribution below).

Examples of Survival Models

Exponential Distribution
In many applications, especially those for biological organisms and mechanical systems that wear out over time, the hazard rate $\lambda(t)$ is an increasing function of $t$. In other words, the older the life in question (the larger the $t$), the higher the chance of failure at the next instant. For humans, the probability of an 85-year-old dying in the next year is clearly higher than for a 20-year-old. In a Poisson process, the rate of change $\lambda(t)=\lambda$ indicated in condition 3 is a constant. As a result, the time $T$ until the first change derived in claim 2 has an exponential distribution with parameter $\lambda$. In terms of mortality study or reliability study of machines that wear out over time, this is not a realistic model. However, if the mortality or failure is caused by random external events, this could be an appropriate model.

Weibull Distribution
This distribution is an excellent model choice for describing the life of manufactured objects. It is defined by the following cumulative hazard rate function:

$\displaystyle \Lambda(t)=\biggl(\frac{t}{\beta}\biggr)^{\alpha}$ where $\alpha > 0$ and $\beta>0$

As a result, the hazard rate function, the density function and the survival function for the lifetime distribution are:

$\displaystyle \lambda(t)=\frac{\alpha}{\beta} \biggl(\frac{t}{\beta}\biggr)^{\alpha-1}$

$\displaystyle f_T(t)=\frac{\alpha}{\beta} \biggl(\frac{t}{\beta}\biggr)^{\alpha-1} \displaystyle e^{\displaystyle -\biggl[\frac{t}{\beta}\biggr]^{\alpha}}$

$\displaystyle S_T(t)=\displaystyle e^{\displaystyle -\biggl[\frac{t}{\beta}\biggr]^{\alpha}}$

The parameter $\alpha$ is the shape parameter and $\beta$ is the scale parameter. When $\alpha=1$, the hazard rate becomes a constant and the Weibull distribution becomes an exponential distribution.

When the parameter $\alpha<1$, the failure rate decreases over time. One interpretation is that most of the defective items fail early on in the life cycle. Once they are removed from the population, the failure rate decreases over time.

When the parameter $1<\alpha$, the failure rate increases with time. This is a good candidate for a model to describe the lifetime of machines or systems that wear out over time.
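
The three regimes of the shape parameter can be seen by evaluating the Weibull hazard rate at a few time points. The following Python sketch uses the formulas above with a hypothetical scale parameter $\beta=2$; the function names are illustrative.

```python
import math

def weibull_hazard(t, alpha, beta):
    """lambda(t) = (alpha / beta) * (t / beta) ** (alpha - 1), for t > 0."""
    return (alpha / beta) * (t / beta) ** (alpha - 1)

def weibull_survival(t, alpha, beta):
    """S(t) = exp(-(t / beta) ** alpha)."""
    return math.exp(-((t / beta) ** alpha))

beta = 2.0  # hypothetical scale parameter
for alpha in (0.5, 1.0, 2.0):  # decreasing, constant, increasing hazard
    rates = [weibull_hazard(t, alpha, beta) for t in (1.0, 2.0, 4.0)]
    print(alpha, [round(r, 4) for r in rates])
```

With $\alpha=1$ every printed rate equals $1/\beta$, which is the constant hazard rate of the exponential distribution.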

The Gompertz Distribution
The Gompertz law states that the force of mortality or failure rate increases exponentially over time. It describes human mortality quite accurately. The following is the hazard rate function:

$\displaystyle \lambda(t)=\alpha e^{\beta t}$ where $\alpha>0$ and $\beta>0$.

The following are the cumulative hazard rate function as well as the survival function, distribution function and the pdf of the lifetime distribution $T$.

$\displaystyle \Lambda(t)=\int_0^t \alpha e^{\beta y} dy=\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}$

$\displaystyle S_T(t)=\displaystyle e^{\displaystyle -\biggl[\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr]}$

$\displaystyle F_T(t)=\displaystyle 1-e^{\displaystyle -\biggl[\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr]}$

$\displaystyle f_T(t)=\displaystyle \alpha e^{\beta t} \thinspace e^{\displaystyle -\biggl[\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr]}$

Makeham’s Law
Makeham’s Law states that the force of mortality is the Gompertz failure rate plus an age-independent component that accounts for external causes of mortality. The following is the hazard rate function:

$\displaystyle \lambda(t)=\alpha e^{\beta t}+\mu$ where $\alpha>0$, $\beta>0$ and $\mu>0$.

The following are the cumulative hazard rate function as well as the survival function, distribution function and the pdf of the lifetime distribution $T$.

$\displaystyle \Lambda(t)=\int_0^t (\alpha e^{\beta y}+\mu) dy=\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t$

$\displaystyle S_T(t)=\displaystyle e^{\displaystyle -\biggl[\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr]}$

$\displaystyle F_T(t)=\displaystyle 1-e^{\displaystyle -\biggl[\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr]}$

$\displaystyle f_T(t)=\biggl( \alpha e^{\beta t}+\mu \biggr) \thinspace e^{\displaystyle -\biggl[\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr]}$
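
As a sketch of how these two laws behave, the Python code below evaluates the Makeham model using $S_T(t)=e^{-\Lambda(t)}$ and $f_T(t)=\lambda(t) S_T(t)$ from Claim 2; setting $\mu=0$ gives the Gompertz model. The parameter values are hypothetical and chosen only for illustration, not fitted to any mortality data.

```python
import math

def makeham_hazard(t, alpha, beta, mu=0.0):
    """lambda(t) = alpha * exp(beta * t) + mu; mu = 0 gives Gompertz."""
    return alpha * math.exp(beta * t) + mu

def makeham_cumulative_hazard(t, alpha, beta, mu=0.0):
    """Lambda(t) = (alpha / beta) * (exp(beta * t) - 1) + mu * t."""
    return (alpha / beta) * (math.exp(beta * t) - 1.0) + mu * t

def makeham_survival(t, alpha, beta, mu=0.0):
    """S(t) = exp(-Lambda(t)), as in Claim 2."""
    return math.exp(-makeham_cumulative_hazard(t, alpha, beta, mu))

def makeham_density(t, alpha, beta, mu=0.0):
    """f(t) = lambda(t) * S(t)."""
    return makeham_hazard(t, alpha, beta, mu) * makeham_survival(t, alpha, beta, mu)

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, mu = 0.0001, 0.09, 0.0005
for age in (20, 50, 80):
    print(age,
          round(makeham_survival(age, alpha, beta), 4),      # Gompertz
          round(makeham_survival(age, alpha, beta, mu), 4))  # Makeham
```

Because the extra term $\mu$ adds to the hazard rate at every age, the Makeham survival probability is never larger than the Gompertz one for the same $\alpha$ and $\beta$.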

# Introduction to Buhlmann credibility

In this post, we continue our discussion in credibility theory. Suppose that for a particular insured (either an individual entity or a group of insureds), we have observed data $X_1,X_2, \cdots, X_n$ (the numbers of claims or loss amounts). We are interested in setting a rate to cover the claim experience $X_{n+1}$ from the next period. In two previous posts (Examples of Bayesian prediction in insurance, Examples of Bayesian prediction in insurance-continued), we discussed this estimation problem from a Bayesian perspective and presented two examples. In this post, we discuss the Buhlmann credibility model and work the same two examples using the Buhlmann method.

First, let’s further describe the setting of the problem. For a particular insured, the experience data corresponding to various exposure periods are assumed to be independent. Statistically speaking, conditional on a risk parameter $\Theta$, the claim numbers or loss amounts $X_1, \cdots, X_n,X_{n+1}$ are independent and identically distributed. Furthermore, the distribution of the risk characteristics in the population of insureds and potential insureds is represented by $\pi_{\Theta}(\theta)$. The experience (either claim numbers or loss amounts) of a particular insured with risk parameter $\Theta=\theta$ is modeled by the conditional distribution $f_{X \lvert \Theta}(x \lvert \theta)$ given $\Theta=\theta$.

The Buhlmann Credibility Estimator
Given the observations $X_1, \cdots, X_n$ in the prior exposure periods, the Buhlmann credibility estimate $C$ of the claim experience $X_{n+1}$ is

$\displaystyle C=Z \overline{X}+(1-Z)\mu$

where $Z$ is the credibility factor assigned to the observed experience data and $\mu$ is the unconditional mean $E[X]$ (the mean taken over all members of the risk parameter $\Theta$). The credibility factor $Z$ is of the form $\displaystyle Z=\frac{n}{n+K}$ where $n$ is a measure of the exposure size (it is the number of observation periods in our examples) and $\displaystyle K=\frac{E[Var[X \lvert \Theta]]}{Var[E[X \lvert \Theta]]}$. The parameter $K$ will be further explained below.

The Buhlmann credibility estimator $C$ is a linear function of the past data. Note that it is of the form:

$\displaystyle C=Z \overline{X}+(1-Z)\mu=w_0+\sum \limits_{i=1}^{n} w_i X_i$

where $w_0=(1-Z)\mu$ and $\displaystyle w_i=\frac{Z}{n}$ for $i=1, \cdots, n$.

Not only is the Buhlmann credibility estimator a linear estimator, it is the best linear estimator to the Bayesian predictive mean $E[X_{n+1} \lvert X_1, \cdots, X_n]$ and the hypothetical mean $E[X_{n+1} \lvert \Theta]$ in terms of minimizing squared error loss. In other words, the coefficients $w_i$ are obtained in such a way that the following expectations (loss functions) are minimized where the expectations are taken over all observations and/or $\Theta$ (see [1]):

$\displaystyle L_1=E\biggl( \biggl[E[X_{n+1} \lvert \Theta]-w_0-\sum \limits_{i=1}^{n} w_i X_i \biggr]^2 \biggr)$

$\displaystyle L_2=E\biggl( \biggl[E[X_{n+1} \lvert X_1, \cdots, X_n]-w_0-\sum \limits_{i=1}^{n} w_i X_i \biggr]^2 \biggr)$

The Buhlmann Method
As discussed above, the Buhlmann credibility factor $Z=\frac{n}{n+K}$ is chosen such that $C=Z \overline{X}+(1-Z) \mu$ is the best linear approximation to the Bayesian estimate of the next period’s claim experience. Now we focus on the calculation of the parameter $K$.

Conditional on the risk parameter $\Theta$, $E[X \lvert \Theta]$ is called the hypothetical mean and $Var[X \lvert \Theta]$ is called the process variance. Then $\mu=E[X]=E[E[X \lvert \Theta]]$ is the expected value of hypothetical means (the unconditional mean). The total variance of this random process is:

$\displaystyle Var[X]=E[Var[X \lvert \Theta]]+Var[E[X \lvert \Theta]]$

The first part of the total variance $E[Var[X \lvert \Theta]]$ is called the expected value of process variance (EPV) and the second part $Var[E[X \lvert \Theta]]$ is called the variance of the hypothetical means (VHM). The parameter $K$ in the Buhlmann method is simply the ratio $K=\frac{EPV}{VHM}$.

We can get an intuitive feel of this formula by considering the variability of the hypothetical means $E[X \lvert \Theta]$ across many values of the risk parameter $\Theta$. If the entire population of insureds (and potential insureds) is fairly homogeneous with respect to the risk parameter $\Theta$, then the hypothetical mean does not vary a great deal, so $VHM=Var[E[X \lvert \Theta]]$ is relatively small in relation to $EPV=E[Var[X \lvert \Theta]]$. As a result, $K$ is large and $Z$ is closer to 0. This agrees with the notion that in a homogeneous population, the unconditional mean (the overall mean) is of more value as a predictor of the next period’s claim experience. On the other hand, if the population of insureds is heterogeneous with respect to the risk parameter $\Theta$, then the overall mean is of less value as a predictor of future experience and we should rely more on the experience of the particular insured. Again, the Buhlmann formula agrees with this notion. If $VHM=Var[E[X \lvert \Theta]]$ is large relative to $EPV=E[Var[X \lvert \Theta]]$, then $K$ is small and $Z$ is closer to 1.

Another attractive feature of the Buhlmann formula is that as more experience data accumulate (as $n \rightarrow \infty$), the credibility factor $Z$ approaches 1 (the experience data become more and more credible).
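
The behavior of $Z=\frac{n}{n+K}$ described above can be tabulated directly. The Python sketch below uses hypothetical values of $EPV$ and $VHM$ to show how the credibility factor responds to the homogeneity of the population and to the number of observation periods.

```python
def credibility_factor(n, epv, vhm):
    """Z = n / (n + K) with K = EPV / VHM."""
    k = epv / vhm
    return n / (n + k)

# Hypothetical EPV and VHM values, chosen only for illustration:
# a homogeneous population (small VHM) through a heterogeneous one (large VHM).
for epv, vhm in ((1.0, 0.01), (1.0, 1.0), (1.0, 100.0)):
    zs = [round(credibility_factor(n, epv, vhm), 4) for n in (1, 10, 1000)]
    print(f"K = {epv / vhm:g}:", zs)
```

Each row increases toward 1 as $n$ grows, and rows with smaller $K$ get there sooner.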

Example 1
In this random experiment, there are a big bowl (called B) and two boxes (Box 1 and Box 2). Bowl B consists of a large quantity of balls, 80% of which are white and 20% of which are red. In Box 1, 60% of the balls are labeled 0, 30% are labeled 1 and 10% are labeled 2. In Box 2, 15% of the balls are labeled 0, 35% are labeled 1 and 50% are labeled 2. In the experiment, a ball is selected at random from bowl B. The color of the selected ball from bowl B determines which box to use (if the ball is white, then use Box 1, if red, use Box 2). Then balls are drawn at random from the selected box (Box $i$) repeatedly with replacement and the values of the series of selected balls are recorded. The value of first selected ball is $X_1$, the value of the second selected ball is $X_2$ and so on.

Suppose that your friend performs this random experiment (you do not know whether he uses Box 1 or Box 2) and that his first selected ball is a 1 ($X_1=1$) and his second selected ball is a 2 ($X_2=2$). What is the predicted value $X_3$ of the third selected ball?

This example was solved in (Examples of Bayesian prediction in insurance) using the Bayesian approach. We now work this example in the Buhlmann approach.

The following restates the prior distribution of $\Theta$ and the conditional distribution of $X \lvert \Theta$. We denote “white ball from bowl B” by $\Theta=1$ and “red ball from bowl B” by $\Theta=2$.

$\pi_{\Theta}(1)=0.8$
$\pi_{\Theta}(2)=0.2$

$\displaystyle f_{X \lvert \Theta}(0 \lvert \Theta=1)=0.60$
$\displaystyle f_{X \lvert \Theta}(1 \lvert \Theta=1)=0.30$
$\displaystyle f_{X \lvert \Theta}(2 \lvert \Theta=1)=0.10$

$\displaystyle f_{X \lvert \Theta}(0 \lvert \Theta=2)=0.15$
$\displaystyle f_{X \lvert \Theta}(1 \lvert \Theta=2)=0.35$
$\displaystyle f_{X \lvert \Theta}(2 \lvert \Theta=2)=0.50$

The following computes the conditional means (hypothetical means) and conditional variances (process variances) and the other parameters of the Buhlmann method.

Hypothetical Means
$\displaystyle E[X \lvert \Theta=1]=0.60(0)+0.30(1)+0.10(2)=0.50$
$\displaystyle E[X \lvert \Theta=2]=0.15(0)+0.35(1)+0.50(2)=1.35$

$\displaystyle E[X^2 \lvert \Theta=1]=0.60(0)+0.30(1)+0.10(4)=0.70$
$\displaystyle E[X^2 \lvert \Theta=2]=0.15(0)+0.35(1)+0.50(4)=2.35$

Process Variances
$\displaystyle Var[X \lvert \Theta=1]=0.70-0.50^2=0.45$
$\displaystyle Var[X \lvert \Theta=2]=2.35-1.35^2=0.5275$

Expected Value of the Hypothetical Means
$\displaystyle \mu=E[X]=E[E[X \lvert \Theta]]=0.80(0.50)+0.20(1.35)=0.67$

Expected Value of the Process Variance
$\displaystyle EPV=E[Var[X \lvert \Theta]]=0.8(0.45)+0.20(0.5275)=0.4655$

Variance of the Hypothetical Means
$\displaystyle VHM=Var[E[X \lvert \Theta]]=0.80(0.50)^2+0.20(1.35)^2-0.67^2=0.1156$

Buhlmann Credibility Factor
$\displaystyle K=\frac{4655}{1156}$

$\displaystyle Z=\frac{2}{2+\frac{4655}{1156}}=\frac{2312}{6967}=0.33185$

Buhlmann Credibility Estimate
$\displaystyle C=\frac{2312}{6967} \frac{3}{2}+\frac{4655}{6967} (0.67)=\frac{6586.85}{6967}=0.9454356$

Note that the Bayesian estimate obtained in Examples of Bayesian prediction in insurance is 1.004237288. Under the Buhlmann model, the past claim experience of the insured in this example is assigned 33% weight in projecting the claim frequency in the next period.
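
The arithmetic of Example 1 can be reproduced with exact fractions. The following Python sketch recomputes $\mu$, $EPV$, $VHM$, $K$, $Z$ and $C$ from the prior and conditional distributions stated above; the variable names are illustrative.

```python
from fractions import Fraction as F

prior = {1: F(80, 100), 2: F(20, 100)}
conditional = {
    1: {0: F(60, 100), 1: F(30, 100), 2: F(10, 100)},  # Box 1
    2: {0: F(15, 100), 1: F(35, 100), 2: F(50, 100)},  # Box 2
}
observations = [1, 2]

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    return sum(x * x * p for x, p in dist.items()) - mean(dist) ** 2

hyp_means = {th: mean(d) for th, d in conditional.items()}      # hypothetical means
proc_vars = {th: variance(d) for th, d in conditional.items()}  # process variances

mu = sum(prior[th] * hyp_means[th] for th in prior)
epv = sum(prior[th] * proc_vars[th] for th in prior)
vhm = sum(prior[th] * hyp_means[th] ** 2 for th in prior) - mu ** 2

n = len(observations)
k = epv / vhm
z = n / (n + k)
x_bar = F(sum(observations), n)
estimate = z * x_bar + (1 - z) * mu

print(z, float(z))
print(float(estimate))
```

Working in fractions avoids rounding until the final conversion to a decimal, so the result can be compared digit by digit with the values above.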

Example 2
The number of claims $X$ generated by an insured in a portfolio of independent insurance policies has a Poisson distribution with parameter $\Theta$. In the portfolio of policies, the parameter $\Theta$ varies according to a gamma distribution with parameters $\alpha$ and $\beta$. We have the following conditional distribution of $X$ and distribution of the risk parameter $\Theta$.

$\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\frac{\theta^x e^{-\theta}}{x!}$ where $x=0,1,2, \cdots$

$\displaystyle \pi_{\Theta}(\theta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta}$ where $\Gamma(\cdot)$ is the gamma function.

Suppose that a particular insured in this portfolio has generated 0 and 3 claims in the first 2 policy periods. What is the Buhlmann estimate of the number of claims for this insured in period 3?

Since the conditional distribution of $X$ is Poisson, we have $E[X \lvert \Theta]=\Theta$ and $Var[X \lvert \Theta]=\Theta$. As a result, the $EPV$, $VHM$ and $K$ are:

$\displaystyle EPV=E[\Theta]=\frac{\alpha}{\beta}$

$\displaystyle VHM=Var[\Theta]=\frac{\alpha}{\beta^2}$

$\displaystyle K=\frac{EPV}{VHM}=\beta$

As a result, the credibility factor for a 2-period experience period is $Z=\frac{2}{2+\beta}$ and the Buhlmann estimate of the claim frequency in the next period is:

$\displaystyle C=\frac{2}{2+\beta} \thinspace \biggl(\frac{3}{2}\biggr)+\frac{\beta}{2+\beta} \thinspace \biggl(\frac{\alpha}{\beta}\biggr)$

To generalize the above results, suppose that we have observed $X_1=x_1, \cdots, X_n=x_n$ for this insured in the prior periods. Then the Buhlmann estimate for the claim frequency in the next period is:

$\displaystyle C=\frac{n}{n+\beta} \thinspace \biggl(\frac{\sum \limits_{i=1}^{n}x_i}{n}\biggr)+\frac{\beta}{n+\beta} \thinspace \biggl(\frac{\alpha}{\beta}\biggr)$

In this example, the Buhlmann estimate is exactly the same as the Bayesian estimate (Examples of Bayesian prediction in insurance-continued).
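
A small Python sketch can confirm this agreement for specific numbers. It assumes hypothetical integer gamma parameters $\alpha=2$ and $\beta=1$ (the example leaves them unspecified) and compares the Buhlmann estimate with the closed form $\frac{\alpha+\sum x_i}{\beta+n}$, which is the Bayesian predictive mean for this model.

```python
from fractions import Fraction as F

def buhlmann_poisson_gamma(claims, alpha, beta):
    """Buhlmann estimate when X | Theta ~ Poisson(Theta), Theta ~ Gamma(alpha, beta)."""
    n = len(claims)
    epv = F(alpha) / beta        # E[Theta]
    vhm = F(alpha) / beta ** 2   # Var[Theta]
    k = epv / vhm                # equals beta
    z = F(n) / (n + k)
    x_bar = F(sum(claims), n)
    mu = F(alpha) / beta         # unconditional mean
    return z * x_bar + (1 - z) * mu

alpha, beta = 2, 1   # hypothetical gamma parameters (integers for exact arithmetic)
claims = [0, 3]
estimate = buhlmann_poisson_gamma(claims, alpha, beta)
closed_form = F(alpha + sum(claims), beta + len(claims))
print(estimate, closed_form)
```

The two printed fractions coincide for any choice of positive $\alpha$ and $\beta$, since $K=\beta$ in this model.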

Reference

1. Klugman S. A., Panjer H. H., Willmot G. E., Loss Models, From Data To Decisions, Second Edition, 2004, John Wiley & Sons, Inc.

# Examples of Bayesian prediction in insurance-continued

This post is a continuation of the previous post Examples of Bayesian prediction in insurance. We present another example as an illustration of the methodology of Bayesian estimation. The example in this post, along with the example in the previous post, serves to motivate the concept of Bayesian credibility and Buhlmann credibility theory. So these two posts are part of an introduction to credibility theory.

Suppose $X_1, \cdots, X_n,X_{n+1}$ are independent and identically distributed conditional on $\Theta=\theta$. We denote the density function of the common distribution of $X_j$ by $f_{X \lvert \Theta}(x \lvert \theta)$. We denote the prior distribution of the risk parameter $\Theta$ by $\pi_{\Theta}(\theta)$. The following shows the steps of the Bayesian estimate of the next observation $X_{n+1}$ given $X_1, \cdots, X_n$.

The Marginal Distribution
$\displaystyle f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)=\int \limits_{\theta} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta) d \theta$

The Posterior Distribution
$\displaystyle \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)$

The Predictive Distribution
$\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} \biggl(f_{X \lvert \Theta}(x \lvert \theta)\biggr) \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) d \theta$

The Bayesian Predictive Mean of the Next Period
$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{x} x \thinspace f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n) dx$

$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} E[X \lvert \theta] \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) d \theta$
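
When the risk parameter takes only finitely many values, the integrals above become sums and the four steps can be carried out mechanically. The Python sketch below does this for the bowl-and-boxes data of Example 1 in the Buhlmann post (prior 0.8 and 0.2, observations $X_1=1$ and $X_2=2$); the variable names are illustrative.

```python
from fractions import Fraction as F

prior = {1: F(8, 10), 2: F(2, 10)}
conditional = {
    1: {0: F(60, 100), 1: F(30, 100), 2: F(10, 100)},
    2: {0: F(15, 100), 1: F(35, 100), 2: F(50, 100)},
}
observed = [1, 2]

def likelihood(theta):
    """Product of f(x_i | theta) over the observed values."""
    result = F(1)
    for x in observed:
        result *= conditional[theta][x]
    return result

# Marginal probability of the observed data
marginal = sum(likelihood(th) * prior[th] for th in prior)
# Posterior distribution of the risk parameter
posterior = {th: likelihood(th) * prior[th] / marginal for th in prior}
# Predictive distribution of the next observation
predictive = {x: sum(conditional[th][x] * posterior[th] for th in prior)
              for x in (0, 1, 2)}
# Bayesian predictive mean
predictive_mean = sum(x * p for x, p in predictive.items())

print(posterior)
print(float(predictive_mean))
```

The resulting predictive mean agrees with the Bayesian estimate of 1.004237288 quoted in the Buhlmann post.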

Example 2
The number of claims $X$ generated by an insured in a portfolio of independent insurance policies has a Poisson distribution with parameter $\Theta$. In the portfolio of policies, the parameter $\Theta$ varies according to a gamma distribution with parameters $\alpha$ and $\beta$. We have the following conditional distribution of $X$ and prior distribution of $\Theta$.

$\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\frac{\theta^x e^{-\theta}}{x!}$ where $x=0,1,2, \cdots$

$\displaystyle \pi_{\Theta}(\theta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta}$ where $\Gamma(\cdot)$ is the gamma function.

Suppose that a particular insured in this portfolio has generated 0 and 3 claims in the first 2 policy periods. What is the Bayesian estimate of the number of claims for this insured in period 3?

Note that the conditional mean $E[X \lvert \Theta]=\Theta$. Thus the unconditional mean $E[X]=E[\Theta]=\frac{\alpha}{\beta}$.

Comment
Note that the unconditional distribution of $X$ is a negative binomial distribution. In a previous post (Compound negative binomial distribution), it was shown that if $N \sim Poisson(\Lambda)$ and $\Lambda \sim Gamma(\alpha,\beta)$, then the unconditional distribution of $N$ has the following probability function. We make use of this result in the Bayesian estimation problem in this post.

$\displaystyle P[N=n]=\frac{\Gamma(\alpha+n)}{\Gamma(\alpha) \thinspace n!} \biggl[\frac{\beta}{\beta+1}\biggr]^{\alpha} \biggl[\frac{1}{\beta+1}\biggr]^n$ where $n=0,1,2, \cdots$

The Marginal Distribution
$\displaystyle f_{X_1,X_2}(0,3)=\int_{0}^{\infty} e^{-\theta} \frac{\theta^3 e^{-\theta}}{3!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta} d \theta$

$\displaystyle =\int_{0}^{\infty} \frac{\beta^{\alpha}}{3! \Gamma(\alpha)} \theta^{\alpha+3-1} e^{-(\beta+2) \theta} d \theta=\frac{\Gamma(\alpha+3)}{6 \Gamma(\alpha)} \frac{\beta^{\alpha}}{(\beta+2)^{\alpha+3}}$

The Posterior Distribution
$\displaystyle \pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3)=\frac{1}{f_{X_1,X_2}(0,3)} e^{-\theta} \frac{\theta^3 e^{-\theta}}{3!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta}$

$\displaystyle =K \thinspace \theta^{\alpha+3-1} e^{-(\beta+2) \theta}$

In the above expression $K$ is a constant making $\pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3)$ a density function. Note that it has the form of a gamma distribution. Thus the posterior distribution must be:

$\displaystyle \pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3)=\frac{(\beta+2)^{\alpha+3}}{\Gamma(\alpha+3)} \thinspace \theta^{\alpha+3-1} e^{-(\beta+2) \theta}$

Thus the posterior distribution of $\Theta$ is a gamma distribution with parameters $\alpha+3$ and $\beta+2$.

The Predictive Distribution
Note that the predictive distribution is simply the mixture of $Poisson(\Theta)$ with $Gamma(\alpha+3,\beta+2)$ as mixing weights. By the comment above, the predictive distribution is a negative binomial distribution with the following probability function:

$\displaystyle f_{X_3 \lvert X_1,X_2}(x \lvert 0,3)=\frac{\Gamma(\alpha+3+x)}{\Gamma(\alpha+3) \thinspace x!} \biggl[\frac{\beta+2}{\beta+3}\biggr]^{\alpha+3} \biggl[\frac{1}{\beta+3}\biggr]^{x}$ where $x=0,1,2, \cdots$

The Bayesian Predictive Mean
$\displaystyle E[X_3 \lvert 0,3]=\frac{\alpha+3}{\beta+2}=\frac{2}{\beta+2} \biggl(\frac{3}{2}\biggr)+\frac{\beta}{\beta+2} \biggl(\frac{\alpha}{\beta}\biggr) \ \ \ \ \ \ \ \ \ \ (1)$

Note that $E[X \lvert \Theta]=\Theta$. Thus the Bayesian predictive mean in this example is simply the mean of the posterior distribution of $\Theta$, which is $E[\Theta \vert 0,3]=\frac{\alpha+3}{\beta+2}$.
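The posterior update and the predictive mean can be checked numerically. The following is a minimal sketch, assuming the hypothetical prior parameters $\alpha=2$ and $\beta=1$ (these values are not part of the example) and using the negative binomial form $\frac{\Gamma(r+x)}{\Gamma(r) \thinspace x!} \bigl[\frac{b}{b+1}\bigr]^{r} \bigl[\frac{1}{b+1}\bigr]^{x}$ with $r=\alpha+3$ and $b=\beta+2$. The function name `predictive_pmf` is ours.

```python
from math import exp, lgamma, log

def predictive_pmf(x, shape, rate):
    """Negative binomial pmf obtained by mixing Poisson(theta) over Gamma(shape, rate)."""
    log_p = (lgamma(shape + x) - lgamma(shape) - lgamma(x + 1)
             + shape * log(rate / (rate + 1.0))
             - x * log(rate + 1.0))
    return exp(log_p)

# Hypothetical prior parameters, chosen only for illustration.
alpha, beta = 2.0, 1.0
claims = [0, 3]  # observed X_1 = 0 and X_2 = 3

# Gamma posterior: shape alpha + sum of claims, rate beta + number of periods.
post_shape = alpha + sum(claims)
post_rate = beta + len(claims)

# Truncate the infinite support; the tail beyond 200 is negligible here.
pmf = [predictive_pmf(x, post_shape, post_rate) for x in range(201)]
total_probability = sum(pmf)
predictive_mean = sum(x * p for x, p in enumerate(pmf))

print(total_probability)                        # should be very close to 1
print(predictive_mean, post_shape / post_rate)  # the two values should agree
```

With these assumed values the posterior is $Gamma(5,3)$, so the mean of the predictive distribution should agree with $\frac{\alpha+3}{\beta+2}=\frac{5}{3}$.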

Comment
Generalizing the example, suppose that in the first $n$ periods, the claim counts for the insured are $X_1=x_1, \cdots, X_n=x_n$. Then the posterior distribution of the parameter $\Theta$ is a gamma distribution.

$\biggl[\Theta \lvert X_1=x_1, \cdots, X_n=x_n\biggr] \sim Gamma(\alpha+\sum_{i=1}^{n} x_i,\beta+n)$

Then the predictive distribution of $X_{n+1}$ given the observations has a negative binomial distribution. More importantly, the Bayesian predictive mean is:

$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]=\frac{\alpha+\sum_{i=1}^{n} x_i}{\beta+n}$

$\displaystyle =\frac{n}{\beta+n} \biggl(\frac{\sum \limits_{i=1}^{n} x_i}{n}\biggr)+\frac{\beta}{\beta+n} \biggl(\frac{\alpha}{\beta}\biggr)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

It is interesting that the Bayesian predictive mean of the $(n+1)^{th}$ period is a weighted average of the mean of the observed data ($\overline{X}$) and the unconditional mean $E[X]=\frac{\alpha}{\beta}$. Consequently, the above Bayesian estimate is a credibility estimate. The weight given to the observed data, $Z=\frac{n}{\beta+n}$, is called the credibility factor. The estimate and the factor are called the Bayesian credibility estimate and the Bayesian credibility factor, respectively.

In general, the credibility estimate is an estimator of the following form:

$\displaystyle E=Z \thinspace \overline{X}+ (1-Z) \thinspace \mu_0$

where $\overline{X}$ is the mean of the observed data and $\mu_0$ is the mean based on other information. In our example here, $\mu_0$ is the unconditional mean. In practice, $\mu_0$ could be the mean based on the entire book of business, or a mean based on a different block of similar insurance policies. Another interpretation is that $\overline{X}$ is the mean of the recent experience data and $\mu_0$ is the mean of prior periods.

One more comment about the credibility factor $Z=\frac{n}{\beta+n}$ derived in this example. As $n \rightarrow \infty$, $Z \rightarrow 1$. This makes intuitive sense since this gives more weight to $\overline{X}$ as more and more data are accumulated.
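The weighted-average form can be sketched in a few lines. The function name `credibility_estimate` and the values $\alpha=2$, $\beta=1$ are assumptions made for illustration only; the claim counts $0$ and $3$ are the ones used in the example.

```python
def credibility_estimate(claims, alpha, beta):
    """Return (Z, Z * sample mean + (1 - Z) * prior mean) for a gamma-Poisson model.

    Requires at least one observed period.
    """
    n = len(claims)
    z = n / (beta + n)               # credibility factor
    sample_mean = sum(claims) / n    # mean of the observed data
    prior_mean = alpha / beta        # unconditional mean E[X]
    return z, z * sample_mean + (1.0 - z) * prior_mean

# Hypothetical prior parameters, chosen only for illustration.
alpha, beta = 2.0, 1.0
claims = [0, 3]

z, estimate = credibility_estimate(claims, alpha, beta)
posterior_mean = (alpha + sum(claims)) / (beta + len(claims))
print(z, estimate, posterior_mean)

# The credibility factor grows toward 1 as more periods are observed.
factors = [n / (beta + n) for n in (1, 10, 100, 1000)]
print(factors)
```

The credibility estimate and the posterior mean $\frac{\alpha+\sum x_i}{\beta+n}$ should agree, and the list of factors illustrates $Z \rightarrow 1$ as $n$ increases.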

# Examples of Bayesian prediction in insurance

We present two examples to illustrate the notion of Bayesian predictive distributions. The general insurance problem we aim to illustrate is that of using past claim experience data from an individual insured or a group of insureds to predict the future claim experience. Suppose we have $X_1,X_2, \cdots, X_n$ with each $X_i$ being the number of claims or an aggregate amount of claims in a prior period of observation. Given such results, what will be the number of claims during the next period, or what will be the aggregate claim amount in the next period? These two examples will motivate the notion of credibility, both Bayesian credibility theory and Buhlmann credibility theory. We present Example 1 in this post. Example 2 is presented in the next post (Examples of Bayesian prediction in insurance-continued).

Example 1
In this random experiment, there is a big bowl (called B) and two boxes (Box 1 and Box 2). Bowl B consists of a large quantity of balls, 80% of which are white and 20% of which are red. In Box 1, 60% of the balls are labeled 0, 30% are labeled 1 and 10% are labeled 2. In Box 2, 15% of the balls are labeled 0, 35% are labeled 1 and 50% are labeled 2. In the experiment, a ball is selected at random from bowl B. The color of the selected ball from bowl B determines which box to use (if the ball is white, then use Box 1, if red, use Box 2). Then balls are drawn at random from the selected box (Box $i$) repeatedly with replacement and the values of the series of selected balls are recorded. The value of the first selected ball is $X_1$, the value of the second selected ball is $X_2$ and so on.

Suppose that your friend performs this random experiment (you do not know whether he uses Box 1 or Box 2) and that his first ball is a 1 ($X_1=1$) and his second ball is a 2 ($X_2=2$). What is the predicted value $X_3$ of the third selected ball?

Though it is straightforward to apply Bayes’ theorem to this problem (the solution can be seen easily using a tree diagram) to obtain a numerical answer, we use this example to draw out the principle of Bayesian prediction. So it may appear that we are making a simple problem overly complicated. We are merely using this example to motivate the method of Bayesian estimation.

For convenience, we denote “draw a white ball from bowl B” by $\theta=1$ and “draw a red ball from bowl B” by $\theta=2$. Box 1 and Box 2 are conditional distributions. The Bowl B is a distribution for the parameter $\theta$. The distribution given in Bowl B is a probability distribution over the space of all parameter values (called a prior distribution). The prior distribution of $\theta$ and the conditional distributions of $X$ given $\theta$ are restated as follows:

$\pi_{\theta}(1)=0.8$
$\pi_{\theta}(2)=0.2$

$\displaystyle f_{X \lvert \Theta}(0 \lvert \theta=1)=0.60$
$\displaystyle f_{X \lvert \Theta}(1 \lvert \theta=1)=0.30$
$\displaystyle f_{X \lvert \Theta}(2 \lvert \theta=1)=0.10$

$\displaystyle f_{X \lvert \Theta}(0 \lvert \theta=2)=0.15$
$\displaystyle f_{X \lvert \Theta}(1 \lvert \theta=2)=0.35$
$\displaystyle f_{X \lvert \Theta}(2 \lvert \theta=2)=0.50$

The following shows the conditional means $E[X \lvert \theta]$ and the unconditional mean $E[X]$.

$\displaystyle E[X \lvert \theta=1]=0.6(0)+0.3(1)+0.1(2)=0.50$
$\displaystyle E[X \lvert \theta=2]=0.15(0)+0.35(1)+0.5(2)=1.35$
$\displaystyle E[X]=0.8(0.50)+0.2(1.35)=0.67$

If you know which particular box your friend is using ($\theta=1$ or $\theta=2$), then the estimate of the next ball should be $E[X \lvert \theta]$. But the value of $\theta$ is unknown to you. Another alternative for a predicted value is the unconditional mean $E[X]=0.67$. While the estimate $E[X]=0.67$ is easy to calculate, this estimate does not take the observed data ($X_1=1$ and $X_2=2$) into account and it certainly does not take the parameter $\theta$ into account. A third alternative is to incorporate the observed data into the estimate of the next ball. We now continue with the calculation of the Bayesian estimate.

Unconditional Distribution
$\displaystyle f_X(0)=0.6(0.8)+0.15(0.2)=0.51$
$\displaystyle f_X(1)=0.3(0.8)+0.35(0.2)=0.31$
$\displaystyle f_X(2)=0.1(0.8)+0.50(0.2)=0.18$

Marginal Probability
$\displaystyle f_{X_1,X_2}(1,2)=0.1(0.3)(0.8)+0.5(0.35)(0.2)=0.059$

Posterior Distribution of $\theta$
$\displaystyle \pi_{\Theta \lvert X_1,X_2}(1 \lvert 1,2)=\frac{0.1(0.3)(0.8)}{0.059}=\frac{24}{59}$

$\displaystyle \pi_{\Theta \lvert X_1,X_2}(2 \lvert 1,2)=\frac{0.5(0.35)(0.2)}{0.059}=\frac{35}{59}$

Predictive Distribution of $X$
$\displaystyle f_{X_3 \lvert X_1,X_2}(0 \lvert 1,2)=0.6 \frac{24}{59} + 0.15 \frac{35}{59}=\frac{19.65}{59}$

$\displaystyle f_{X_3 \lvert X_1,X_2}(1 \lvert 1,2)=0.3 \frac{24}{59} + 0.35 \frac{35}{59}=\frac{19.45}{59}$

$\displaystyle f_{X_3 \lvert X_1,X_2}(2 \lvert 1,2)=0.1 \frac{24}{59} + 0.50 \frac{35}{59}=\frac{19.90}{59}$

Here is another formulation of the predictive distribution of $X_3$. See the general methodology section below.
$\displaystyle f_{X_3 \lvert X_1,X_2}(0 \lvert 1,2)=\frac{0.6(0.1)(0.3)(0.8)+0.15(0.5)(0.35)(0.2)}{0.059}=\frac{19.65}{59}$

$\displaystyle f_{X_3 \lvert X_1,X_2}(1 \lvert 1,2)=\frac{0.3(0.1)(0.3)(0.8)+0.35(0.5)(0.35)(0.2)}{0.059}=\frac{19.45}{59}$

$\displaystyle f_{X_3 \lvert X_1,X_2}(2 \lvert 1,2)=\frac{0.1(0.1)(0.3)(0.8)+0.5(0.5)(0.35)(0.2)}{0.059}=\frac{19.90}{59}$

The posterior distribution $\pi_{\theta}(\cdot \lvert 1,2)$ is the conditional probability distribution of the parameter $\theta$ given the observed data $X_1=1$ and $X_2=2$. This is a result of applying Bayes’ theorem. The predictive distribution $f_{X_3 \lvert X_1,X_2}(\cdot \lvert 1,2)$ is the conditional probability distribution of a new observation given the past observed data of $X_1=1$ and $X_2=2$. Since both of these distributions incorporate the past observations, the Bayesian estimate of the next observation is the mean of the predictive distribution.

$\displaystyle E[X_3 \lvert X_1=1,X_2=2]$

$\displaystyle =0 \thinspace f_{X_3 \lvert X_1,X_2}(0 \lvert 1,2)+1 \thinspace f_{X_3 \lvert X_1,X_2}(1 \lvert 1,2)+2 \thinspace f_{X_3 \lvert X_1,X_2}(2 \lvert 1,2)$

$\displaystyle =0 \frac{19.65}{59}+1 \frac{19.45}{59}+ 2 \frac{19.90}{59}$

$\displaystyle =\frac{59.25}{59}=1.0042372$

$\displaystyle E[X_3 \lvert X_1=1,X_2=2]$

$\displaystyle =E[X \lvert \theta=1] \medspace \pi_{\Theta \lvert X_1,X_2}(1 \lvert 1,2)+E[X \lvert \theta=2] \medspace \pi_{\Theta \lvert X_1,X_2}(2 \lvert 1,2)$

$\displaystyle =0.5 \frac{24}{59}+1.35 \frac{35}{59}=\frac{59.25}{59}$

Note that we compute the Bayesian estimate $E[X_3 \vert X_1,X_2]$ in two ways, one using the predictive distribution and the other using the posterior distribution of the parameter $\theta$. The Bayesian estimate is the mean of the hypothetical means $E[X \lvert \theta]$ with expectation taken over the entire posterior distribution $\pi_{\theta}(\cdot \lvert 1,2)$.
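The calculation in Example 1 can be reproduced with exact fractions. The following is a minimal sketch; the variable names (`prior`, `box`, `joint` and so on) are ours and the probabilities are the ones stated in the example.

```python
from fractions import Fraction

# Prior over theta (bowl B) and the conditional distributions (Box 1 and Box 2).
prior = {1: Fraction(8, 10), 2: Fraction(2, 10)}
box = {
    1: {0: Fraction(60, 100), 1: Fraction(30, 100), 2: Fraction(10, 100)},
    2: {0: Fraction(15, 100), 1: Fraction(35, 100), 2: Fraction(50, 100)},
}
observed = [1, 2]  # X_1 = 1 and X_2 = 2

# Joint weight of each theta together with the observed draws.
joint = {}
for theta, weight in prior.items():
    for x in observed:
        weight *= box[theta][x]
    joint[theta] = weight

marginal = sum(joint.values())
posterior = {theta: w / marginal for theta, w in joint.items()}

# Predictive distribution of X_3 and its mean.
predictive = {x: sum(box[t][x] * posterior[t] for t in prior) for x in (0, 1, 2)}
mean_from_predictive = sum(x * p for x, p in predictive.items())

# The same mean via the hypothetical means E[X | theta].
hypothetical_means = {t: sum(x * p for x, p in box[t].items()) for t in prior}
mean_from_posterior = sum(hypothetical_means[t] * posterior[t] for t in prior)

print(marginal)              # 59/1000
print(posterior)             # weights 24/59 and 35/59
print(mean_from_predictive)  # 237/236, which equals 59.25/59
```

Both routes give the same value, $\frac{59.25}{59}=\frac{237}{236}$, in agreement with the two calculations above.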

Discussion of General Methodology
We now use Example 1 to draw out general methodology. We first describe the discrete case and have the continuous case as a generalization.

Suppose we have a family of conditional density functions $f_{X \lvert \Theta}(x \lvert \theta)$. In Example 1, the bowl B is the distribution of the parameter $\theta$. Box 1 and Box 2 are the conditional distributions with density $f_{X \lvert \Theta}(x \lvert \theta)$. In an insurance application, the $\theta$ is a risk parameter and the conditional distribution $f_{X \lvert \Theta}(x \lvert \theta)$ is the claim experience in a given fixed period (conditional on $\Theta=\theta$).

Suppose that $X_1,X_2, \cdots, X_n,X_{n+1}$ (conditional on $\Theta=\theta$) are independent and identically distributed where the common density function is $f_{X \lvert \Theta}(x \lvert \theta)$. In our Example 1, once a box is selected (e.g. Box 1), the repeated draws from that box are independent and identically distributed. In an insurance application, the $X_k$ are the claim experience of an insured (or a group of insureds) where the insured belongs to the risk class with parameter $\theta$.

We are interested in the conditional distribution of $X_{n+1}$ given $\Theta=\theta$ to predict $X_{n+1}$. In our example, $X_{n+1}$ is the value of the ball in the $(n+1)^{st}$ draw. In an insurance application, $X_{n+1}$ may be the claim experience of an insured (or a group of insureds) in the next policy period. We can use the unconditional mean $E[X]=E[E(X \lvert \Theta)]$ (the mean of the hypothetical means). This approach does not take the risk parameter of the insured into the equation. On the other hand, if we know the value of $\theta$, then we can use $f_{X \lvert \Theta}(x \lvert \theta)$. But the risk parameter is usually unknown. The natural alternative is to condition on the observed experience in the $n$ prior periods $X_1, \cdots, X_n$ rather than conditioning on the risk parameter $\theta$. Thus we derive the predictive distribution of $X_{n+1}$ given the observation $X_1, \cdots, X_n$. Given the observed experience data $X_1=x_1,X_2=x_2, \cdots, X_n=x_n$, the following is the derivation of the Bayesian predictive distribution. Note that the prior distribution of the parameter $\theta$ is $\pi_{\Theta}(\theta)$.

The Unconditional Distribution
$\displaystyle f_X(x)=\sum \limits_{\theta} f_{X \lvert \Theta}(x \lvert \theta) \ \pi_{\Theta}(\theta)$

The Marginal Distribution
$\displaystyle f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)=\sum \limits_{\theta} \biggl[\prod \limits_{i=1}^{n} f_{X_i \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)$

The Posterior Distribution
$\displaystyle \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)$

$\displaystyle = \ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \biggl[\prod \limits_{i=1}^{n} f_{X_i \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)$

The Predictive Distribution
$\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \sum \limits_{\theta} f_{X \lvert \Theta}(x \lvert \theta) \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)$

Another formulation is:
$\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \sum \limits_{\theta} f_{X_{n+1} \lvert \Theta}(x \lvert \theta) \biggl[ \prod \limits_{j=1}^{n}f_{X_j \lvert \Theta}(x_j \lvert \theta)\biggr] \thinspace \pi_{\Theta}(\theta)$

The Bayesian Predictive Mean of the Next Period
$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \sum \limits_{x} x \thinspace f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)$

$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \sum \limits_{\theta} E[X \lvert \theta] \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)$

We state the same results for the case that the claim experience $X$ is continuous.

The Unconditional Distribution
$\displaystyle f_{X}(x) = \int_{\theta} f_{X \lvert \Theta} (x \lvert \theta) \ \pi_{\Theta}(\theta) \ d \theta$

The Marginal Distribution
$\displaystyle f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)=\int \limits_{\theta} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta) d \theta$

The Posterior Distribution
$\displaystyle \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)$

The Predictive Distribution
$\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} f_{X \lvert \Theta}(x \lvert \theta) \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) \ d \theta$

Another formulation is:
$\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \int \limits_{\theta} f_{X_{n+1} \lvert \Theta}(x \lvert \theta) \biggl[ \prod \limits_{j=1}^{n}f_{X_j \lvert \Theta}(x_j \lvert \theta)\biggr] \thinspace \pi_{\Theta}(\theta) \ d \theta$

The Bayesian Predictive Mean of the Next Period
$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{x} x \thinspace f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n) dx$

$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} E[X \lvert \theta] \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) d \theta$
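When the parameter $\theta$ is continuous, the integrals over $\theta$ can be approximated on a grid. The following is a rough sketch, assuming Poisson claim counts with a gamma prior on $\theta$, the hypothetical values $\alpha=2$ and $\beta=1$, and a simple midpoint rule on a truncated range. The function names and the grid settings are our own choices, not part of the methodology itself.

```python
from math import exp, factorial

def poisson_pmf(x, theta):
    """Conditional probability of x claims given the risk parameter theta."""
    return exp(-theta) * theta ** x / factorial(x)

def gamma_prior(theta, alpha, beta):
    """Unnormalized gamma density; the constant cancels in the posterior."""
    return theta ** (alpha - 1) * exp(-beta * theta)

def posterior_mean_on_grid(claims, alpha, beta, upper=40.0, steps=40_000):
    """Approximate the integral of E[X | theta] against the posterior of theta."""
    h = upper / steps
    numerator = 0.0
    denominator = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h            # midpoint of the k-th cell
        weight = gamma_prior(theta, alpha, beta)
        for x in claims:
            weight *= poisson_pmf(x, theta)
        numerator += theta * weight      # E[X | theta] = theta for a Poisson model
        denominator += weight
    return numerator / denominator       # the step size h cancels in the ratio

# Hypothetical prior parameters, chosen only for illustration.
alpha, beta = 2.0, 1.0
claims = [0, 3]

grid_mean = posterior_mean_on_grid(claims, alpha, beta)
closed_form = (alpha + sum(claims)) / (beta + len(claims))
print(grid_mean, closed_form)
```

Under these assumptions the grid value should be very close to the closed form $\frac{\alpha+\sum x_i}{\beta+n}$ of a gamma posterior. The same loop applies to other choices of prior and conditional distribution for which no closed form is available.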

See the next post (Examples of Bayesian prediction in insurance-continued) for Example 2.

# Compound mixed Poisson distribution

Let the random sum $Y=X_1+X_2+ \cdots +X_N$ be the aggregate claims generated in a fixed period by an independent group of insureds. When the number of claims $N$ follows a Poisson distribution, the sum $Y$ is said to have a compound Poisson distribution. When the number of claims $N$ has a mixed Poisson distribution, the sum $Y$ is said to have a compound mixed Poisson distribution. A mixed Poisson distribution is the distribution of a Poisson random variable $N$ whose Poisson parameter $\Lambda$ is uncertain. In other words, $N$ is a mixture of a family of Poisson distributions $N(\Lambda)$ and the random variable $\Lambda$ specifies the mixing weights. In this post, we present several basic properties of compound mixed Poisson distributions. In a previous post (Compound negative binomial distribution), we showed that the compound negative binomial distribution is an example of a compound mixed Poisson distribution (with gamma mixing weights).

In terms of notation, we have:

• $Y=X_1+X_2+ \cdots +X_N$,
• $N \sim$ Poisson$(\Lambda)$,
• $\Lambda \sim$ some unspecified distribution.

The following presents basic properties of the compound mixed Poisson $Y$ in terms of the mixing weights $\Lambda$ and the claim amount random variable $X$.

Mean and Variance

$\displaystyle E[Y]=E[\Lambda] E[X]$

$\displaystyle Var[Y]=E[\Lambda] E[X^2]+Var[\Lambda] E[X]^2$

Moment Generating Function

$\displaystyle M_Y(t)=M_{\Lambda}[M_X(t)-1]$

Cumulant Generating Function

$\displaystyle \Psi_Y(t)=ln M_{\Lambda}[M_X(t)-1]=\Psi_{\Lambda}[M_X(t)-1]$

Measure of Skewness
$\displaystyle E[(Y-\mu_Y)^3]=\Psi_Y^{(3)}(0)$

$\displaystyle =\Psi_{\Lambda}^{(3)}(0) E[X]^3 + 3 \Psi_{\Lambda}^{(2)}(0) E[X] E[X^2]+\Psi_{\Lambda}^{(1)}(0) E[X^3]$

$\displaystyle =\gamma_{\Lambda} Var[\Lambda]^{\frac{3}{2}} E[X]^3 + 3 Var[\Lambda] E[X] E[X^2]+E[\Lambda] E[X^3]$

Measure of skewness: $\displaystyle \gamma_Y=\frac{E[(Y-\mu_Y)^3]}{(Var[Y])^{\frac{3}{2}}}$
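The moment formulas above translate directly into a short calculation. The following is a minimal sketch in which the function name and all numerical inputs are assumptions for illustration: the mixing variable $\Lambda$ is taken to be gamma with $\alpha=3$ and $\beta=2$, and the claim amount $X$ is taken to be exponential with mean 1. Note that $\gamma_{\Lambda} Var[\Lambda]^{\frac{3}{2}}$ is the third central moment of $\Lambda$, which is the quantity passed in as `third_l`.

```python
def compound_mixed_poisson_moments(mean_l, var_l, third_l, ex1, ex2, ex3):
    """Mean, variance, third central moment and skewness of Y.

    mean_l, var_l, third_l: mean, variance and third central moment of Lambda.
    ex1, ex2, ex3: raw moments E[X], E[X^2], E[X^3] of the claim amount.
    """
    mean_y = mean_l * ex1
    var_y = mean_l * ex2 + var_l * ex1 ** 2
    third_y = third_l * ex1 ** 3 + 3 * var_l * ex1 * ex2 + mean_l * ex3
    skew_y = third_y / var_y ** 1.5
    return mean_y, var_y, third_y, skew_y

# Hypothetical inputs: Lambda ~ Gamma(alpha=3, beta=2), X ~ exponential with mean 1.
alpha, beta = 3.0, 2.0
mean_l = alpha / beta               # E[Lambda]
var_l = alpha / beta ** 2           # Var[Lambda]
third_l = 2 * alpha / beta ** 3     # third central moment of a gamma distribution
ex1, ex2, ex3 = 1.0, 2.0, 6.0       # E[X^k] = k! for an exponential with mean 1

mean_y, var_y, third_y, skew_y = compound_mixed_poisson_moments(
    mean_l, var_l, third_l, ex1, ex2, ex3
)
print(mean_y, var_y, third_y, skew_y)
```

Setting `var_l` and `third_l` to zero (a fixed Poisson parameter) reduces the formulas to the compound Poisson case, where $Var[Y]=\lambda E[X^2]$.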

# Compound negative binomial distribution

In this post, we discuss the compound negative binomial distribution and its relationship with the compound Poisson distribution.

A compound distribution is a model for a random sum $Y=X_1+X_2+ \cdots +X_N$ where the number of terms $N$ is uncertain. To make the compound distribution more tractable, we assume that the variables $X_i$ are independent and identically distributed and that each $X_i$ is independent of $N$. The random sum $Y$ can be interpreted as the sum of all the measurements that are associated with certain events that occur during a fixed period of time. For example, we may be interested in the total amount of rainfall in a 24-hour period, during which the occurrences of a number of events are observed and each of the events provides a measurement of an amount of rainfall. Another interpretation of a compound distribution is the random variable of the aggregate claims generated by an insurance policy or a group of insurance policies during a fixed policy period. In this setting, $N$ is the number of claims generated by the portfolio of insurance policies and $X_1$ is the amount of the first claim and $X_2$ is the amount of the second claim and so on. When $N$ follows the Poisson distribution, the random sum $Y$ is said to have a compound Poisson distribution. Even though the compound Poisson distribution has many attractive properties, it is not a good model when the variance of the number of claims is greater than the mean of the number of claims. In such situations, the compound negative binomial distribution may be a better fit. See this post (Compound Poisson distribution) for a basic discussion. See the links at the end of this post for more articles on compound distributions that I posted on this blog.

Compound Negative Binomial Distribution
The random variable $N$ is said to have a negative binomial distribution if its probability function is given by the following:

$\displaystyle P[N=n]=\binom{\alpha + n-1}{\alpha-1} \thinspace \biggl(\frac{\beta}{\beta+1}\biggr)^{\alpha}\biggl(\frac{1}{\beta+1}\biggr)^{n} \ \ \ \ \ \ \ \ \ \ \ \ (1)$

where $n=0,1,2,3, \cdots$, $\beta >0$ and $\alpha$ is a positive integer.
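As a quick sanity check of the probability function (1), the probabilities can be summed numerically. The sketch below assumes the hypothetical values $\alpha=3$ and $\beta=2$ and truncates the support at 200, where the remaining tail is negligible. The function name is ours.

```python
from math import comb

def negative_binomial_pmf(n, alpha, beta):
    """Probability function (1); alpha must be a positive integer."""
    p = beta / (beta + 1.0)
    return comb(alpha + n - 1, alpha - 1) * p ** alpha * (1.0 - p) ** n

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 3, 2.0

pmf = [negative_binomial_pmf(n, alpha, beta) for n in range(200)]
total = sum(pmf)
mean = sum(n * q for n, q in enumerate(pmf))
variance = sum((n - mean) ** 2 * q for n, q in enumerate(pmf))

print(total)                                  # should be very close to 1
print(mean, alpha / beta)                     # mean of N
print(variance, alpha * (beta + 1) / beta ** 2)  # variance of N
```

The computed mean and variance should agree with $E[N]=\frac{\alpha}{\beta}$ and $Var[N]=\frac{\alpha (\beta+1)}{\beta^2}$. The variance exceeds the mean, which is the situation where the negative binomial model is preferred over the Poisson model.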

Our formulation of negative binomial distribution is the number of failures that occur before the $\alpha^{th}$ success in a sequence of independent Bernoulli trials. But this interpretation is not important to our task at hand. Let $Y=X_1+X_2+ \cdots +X_N$ be the random sum as described in the above introductory paragraph such that $N$ follows a negative binomial distribution. We present the basic properties discussed in the post An introduction to compound distributions by plugging the negative binomial distribution into $N$.

Distribution Function
$\displaystyle F_Y(y)=\sum \limits_{n=0}^{\infty} F^{*n}(y) \thinspace P[N=n]$

where $F$ is the common distribution function for $X_i$ and $F^{*n}$ is the $n^{th}$ convolution of $F$. Of course, $P[N=n]$ is the negative binomial probability function indicated above.

Mean and Variance
$\displaystyle E[Y]=E[N] \thinspace E[X]=\frac{\alpha}{\beta} E[X]$

$\displaystyle Var[Y]=E[N] \thinspace Var[X]+Var[N] \thinspace E[X]^2$

$\displaystyle =\frac{\alpha}{\beta} Var[X]+\frac{\alpha (\beta+1)}{\beta^2} E[X]^2$

Moment Generating Function
$\displaystyle M_Y(t)=M_N[ln M_X(t)]=\biggl(\frac{p}{1-(1-p) M_X(t)}\biggr)^{\alpha}$

$\displaystyle M_Y(t)=\biggl(\frac{\beta}{\beta+1- M_X(t)}\biggr)^{\alpha}$

where $\displaystyle p=\frac{\beta}{\beta+1}$, $\displaystyle M_N(t)=\biggl(\frac{p}{1-(1-p) e^{t}}\biggr)^{\alpha}$

Cumulant Generating Function
$\displaystyle \Psi_Y(t)=\alpha \thinspace ln \biggl(\frac{\beta}{\beta+1- M_X(t)}\biggr)$

Skewness
$\displaystyle E[(Y-\mu_Y)^3]=\Psi_Y^{(3)}(0)$

$\displaystyle =\frac{2}{\alpha^2} E[N]^3 E[X]^3 +\frac{3}{\alpha} E[N]^2 E[X] E[X^2]+E[N] E[X^3]$

Measure of skewness: $\displaystyle \gamma_Y=\frac{E[(Y-\mu_Y)^3]}{(Var[Y])^{\frac{3}{2}}}$

Compound Mixed Poisson Distribution
In a previous post (Basic properties of mixtures), we showed that the negative binomial distribution is a mixture of a family of Poisson distributions with gamma mixing weights. Specifically, if $N \sim \text{Poisson}(\Lambda)$ and $\Lambda \sim \text{Gamma}(\alpha,\beta)$, then the unconditional distribution of $N$ is a negative binomial distribution and the probability function is of the form (1) given above.
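The mixture statement can be checked numerically by integrating the Poisson probabilities against the gamma density. The following is a rough sketch, assuming the hypothetical values $\alpha=3$ and $\beta=2$ and a midpoint rule on a truncated range. The function names and grid settings are our own.

```python
from math import comb, exp, factorial, gamma

def mixed_poisson_pmf(n, alpha, beta, upper=60.0, steps=60_000):
    """Integrate Poisson(lam) probabilities against a Gamma(alpha, beta) density."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h  # midpoint of the k-th cell
        poisson = exp(-lam) * lam ** n / factorial(n)
        density = beta ** alpha / gamma(alpha) * lam ** (alpha - 1) * exp(-beta * lam)
        total += poisson * density * h
    return total

def negative_binomial_pmf(n, alpha, beta):
    """Probability function (1); alpha must be a positive integer."""
    p = beta / (beta + 1.0)
    return comb(alpha + n - 1, alpha - 1) * p ** alpha * (1.0 - p) ** n

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 3, 2.0

differences = [
    abs(mixed_poisson_pmf(n, alpha, beta) - negative_binomial_pmf(n, alpha, beta))
    for n in range(6)
]
print(max(differences))  # should be close to zero
```

For each $n$ tested, the numerically mixed probability should match the negative binomial probability from (1) up to the integration error.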

Thus the compound negative binomial distribution is a special example of a compound mixed Poisson distribution. When an aggregate claims variable $Y=X_1+X_2+ \cdots +X_N$ has a compound mixed Poisson distribution, the number of claims $N$ follows a Poisson distribution, but the Poisson parameter $\Lambda$ is uncertain. The uncertainty could be due to heterogeneity of risks across the insureds in the insurance portfolio (or across various rating classes). If the information of the risk parameter $\Lambda$ can be captured in a gamma distribution, then the unconditional number of claims in a given fixed period has a negative binomial distribution.

Previous Posts on Compound Distributions
An introduction to compound distributions
Some examples of compound distributions
Compound Poisson distribution
Compound Poisson distribution-discrete example