# Examples of Bayesian prediction in insurance-continued

This post is a continuation of the previous post Examples of Bayesian prediction in insurance. We present another example as an illustration of the methodology of Bayesian estimation. The example in this post, along with the example in the previous post, serves to motivate the concept of Bayesian credibility and Buhlmann credibility theory. So these two posts are part of an introduction to credibility theory.

Suppose $X_1, \cdots, X_n,X_{n+1}$ are independent and identically distributed conditional on $\Theta=\theta$. We denote the density function of the common distribution of $X_j$ by $f_{X \lvert \Theta}(x \lvert \theta)$. We denote the prior distribution of the risk parameter $\Theta$ by $\pi_{\Theta}(\theta)$. The following shows the steps of the Bayesian estimate of the next observation $X_{n+1}$ given $X_1, \cdots, X_n$.

The Marginal Distribution
$\displaystyle f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)=\int \limits_{\theta} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta) d \theta$

The Posterior Distribution
$\displaystyle \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)$

The Predictive Distribution
$\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} \biggl(f_{X \lvert \Theta}(x \lvert \theta)\biggr) \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) d \theta$

The Bayesian Predictive Mean of the Next Period
$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{x} x \thinspace f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n) dx$

$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]$

$\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} E[X \lvert \theta] \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) d \theta$
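These steps can be carried out numerically. The sketch below discretizes the parameter space and applies the marginal, posterior, and predictive-mean formulas directly, using the Poisson-gamma setup of Example 2 below; the prior parameters are illustrative assumptions.

```python
import math
import numpy as np

# Illustrative assumptions: Poisson claim counts with a Gamma(alpha, beta)
# prior on the risk parameter theta.
alpha, beta = 2.0, 1.0
xs = [0, 3]                               # observed values x_1, ..., x_n

theta = np.linspace(1e-6, 40, 400_001)    # discretized parameter space
d = theta[1] - theta[0]

prior = beta**alpha / math.gamma(alpha) * theta**(alpha - 1) * np.exp(-beta * theta)

# Likelihood: product of the conditional Poisson densities at the observations
likelihood = np.ones_like(theta)
for x in xs:
    likelihood *= theta**x * np.exp(-theta) / math.factorial(x)

marginal = (likelihood * prior).sum() * d       # f(x_1, ..., x_n)
posterior = likelihood * prior / marginal       # pi(theta | x_1, ..., x_n)

# Predictive mean: integrate E[X | theta] = theta against the posterior
predictive_mean = (theta * posterior).sum() * d
```

With these observations the conjugate analysis of Example 2 gives a $Gamma(\alpha+3, \beta+2)$ posterior, so `predictive_mean` should be very close to $(2+3)/(1+2)=5/3$.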

Example 2
The number of claims $X$ generated by an insured in a portfolio of independent insurance policies has a Poisson distribution with parameter $\Theta$. In the portfolio of policies, the parameter $\Theta$ varies according to a gamma distribution with parameters $\alpha$ and $\beta$. We have the following conditional distribution of $X$ and prior distribution of $\Theta$.

$\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\frac{\theta^x e^{-\theta}}{x!}$ where $x=0,1,2, \cdots$

$\displaystyle \pi_{\Theta}(\theta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta}$ where $\Gamma(\cdot)$ is the gamma function.

Suppose that a particular insured in this portfolio has generated 0 and 3 claims in the first 2 policy periods. What is the Bayesian estimate of the number of claims for this insured in period 3?

Note that the conditional mean $E[X \lvert \Theta]=\Theta$. Thus the unconditional mean $E[X]=E[\Theta]=\frac{\alpha}{\beta}$.

Comment
Note that the unconditional distribution of $X$ is a negative binomial distribution. In a previous post (Compound negative binomial distribution), it was shown that if $N \sim Poisson(\Lambda)$ and $\Lambda \sim Gamma(\alpha,\beta)$, then the unconditional distribution of $N$ has the following probability function. We make use of this result in the Bayesian estimation problem in this post.

$\displaystyle P[N=n]=\frac{\Gamma(\alpha+n)}{\Gamma(\alpha) \Gamma(n+1)} \biggl[\frac{\beta}{\beta+1}\biggr]^{\alpha} \biggl[\frac{1}{\beta+1}\biggr]^n$ where $n=0,1,2,\cdots$

The Marginal Distribution
$\displaystyle f_{X_1,X_2}(0,3)=\int_{0}^{\infty} e^{-\theta} \frac{\theta^3 e^{-\theta}}{3!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta} d \theta$

$\displaystyle =\int_{0}^{\infty} \frac{\beta^{\alpha}}{3! \Gamma(\alpha)} \theta^{\alpha+3-1} e^{-(\beta+2) \theta} d \theta=\frac{\Gamma(\alpha+3)}{6 \Gamma(\alpha)} \frac{\beta^{\alpha}}{(\beta+2)^{\alpha+3}}$

The Posterior Distribution
$\displaystyle \pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3)=\frac{1}{f_{X_1,X_2}(0,3)} e^{-\theta} \frac{\theta^3 e^{-\theta}}{3!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta}$

$\displaystyle =K \thinspace \theta^{\alpha+3-1} e^{-(\beta+2) \theta}$

In the above expression $K$ is a constant making $\pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3)$ a density function. Note that it has the form of a gamma distribution. Thus the posterior distribution must be:

$\displaystyle \pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3)=\frac{(\beta+2)^{\alpha+3}}{\Gamma(\alpha+3)} \thinspace \theta^{\alpha+3-1} e^{-(\beta+2) \theta}$

Thus the posterior distribution of $\Theta$ is a gamma distribution with parameters $\alpha+3$ and $\beta+2$.

The Predictive Distribution
Note that the predictive distribution is simply the mixture of $Poisson(\Theta)$ with $Gamma(\alpha+3,\beta+2)$ as mixing weights. By the comment above, the predictive distribution is a negative binomial distribution with the following probability function:

$\displaystyle f_{X_3 \lvert X_1,X_2}(x \lvert 0,3)=\frac{\Gamma(\alpha+3+x)}{\Gamma(\alpha+3) \Gamma(x+1)} \biggl[\frac{\beta+2}{\beta+3}\biggr]^{\alpha+3} \biggl[\frac{1}{\beta+3}\biggr]^{x}$ where $x=0,1,2,\cdots$

The Bayesian Predictive Mean
$\displaystyle E[X_3 \lvert 0,3]=\frac{\alpha+3}{\beta+2}=\frac{2}{\beta+2} \biggl(\frac{3}{2}\biggr)+\frac{\beta}{\beta+2} \biggl(\frac{\alpha}{\beta}\biggr) \ \ \ \ \ \ \ \ \ \ (1)$

Note that $E[X \lvert \Theta]=\Theta$. Thus the Bayesian predictive mean in this example is simply the mean of the posterior distribution of $\Theta$, which is $E[\Theta \vert 0,3]=\frac{\alpha+3}{\beta+2}$.
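The algebraic decomposition in (1) can be verified directly; here is a quick numerical check over a few illustrative $(\alpha, \beta)$ pairs:

```python
# Verify (1): (alpha+3)/(beta+2) = [2/(beta+2)](3/2) + [beta/(beta+2)](alpha/beta)
for alpha, beta in [(0.5, 0.25), (2.0, 1.0), (3.7, 5.2)]:
    lhs = (alpha + 3) / (beta + 2)
    rhs = 2 / (beta + 2) * (3 / 2) + beta / (beta + 2) * (alpha / beta)
    assert abs(lhs - rhs) < 1e-12
```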

Comment
Generalizing the example, suppose that in the first $n$ periods, the claim counts for the insured are $X_1=x_1, \cdots, X_n=x_n$. Then the posterior distribution of the parameter $\Theta$ is a gamma distribution.

$\biggl[\Theta \lvert X_1=x_1, \cdots, X_n=x_n\biggr] \sim Gamma(\alpha+\sum_{i=1}^{n} x_i,\beta+n)$

Then the predictive distribution of $X_{n+1}$ given the observations has a negative binomial distribution. More importantly, the Bayesian predictive mean is:

$\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]=\frac{\alpha+\sum_{i=1}^{n} x_i}{\beta+n}$

$\displaystyle =\frac{n}{\beta+n} \biggl(\frac{\sum \limits_{i=1}^{n} x_i}{n}\biggr)+\frac{\beta}{\beta+n} \biggl(\frac{\alpha}{\beta}\biggr)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

It is interesting that the Bayesian predictive mean of the $(n+1)^{th}$ period is a weighted average of the mean of the observed data ($\overline{X}$) and the unconditional mean $E[X]=\frac{\alpha}{\beta}$. Consequently, the above Bayesian estimate is a credibility estimate. The weight given to the observed data $Z=\frac{n}{\beta+n}$ is called the credibility factor. The estimate and the factor are called Bayesian credibility estimate and Bayesian credibility factor, respectively.

In general, the credibility estimate is an estimator of the following form:

$\displaystyle E=Z \thinspace \overline{X}+ (1-Z) \thinspace \mu_0$

where $\overline{X}$ is the mean of the observed data and $\mu_0$ is the mean based on other information. In our example here, $\mu_0$ is the unconditional mean. In practice, $\mu_0$ could be the mean based on the entire book of business, or a mean based on a different block of similar insurance policies. Another interpretation is that $\overline{X}$ is the mean of the recent experience data and $\mu_0$ is the mean of prior periods.

One more comment about the credibility factor $Z=\frac{n}{\beta+n}$ derived in this example: as $n \rightarrow \infty$, $Z \rightarrow 1$. This makes intuitive sense, since more and more weight is given to $\overline{X}$ as data are accumulated.
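The credibility form (2) can be packaged as a small function. This is a minimal sketch; `bayesian_credibility_estimate` is a hypothetical name and the prior parameters in the usage lines are illustrative assumptions.

```python
def bayesian_credibility_estimate(xs, alpha, beta):
    """Predictive mean (2): credibility-weighted average of the sample mean
    and the prior (unconditional) mean alpha/beta."""
    n = len(xs)
    z = n / (beta + n)                    # credibility factor Z
    xbar = sum(xs) / n
    return z * xbar + (1 - z) * (alpha / beta)

# With the observations 0 and 3 and an illustrative Gamma(2, 1) prior,
# the estimate equals (alpha + 0 + 3)/(beta + 2) = 5/3.
est = bayesian_credibility_estimate([0, 3], alpha=2, beta=1)

# As n grows, Z -> 1 and the estimate approaches the sample mean.
big = bayesian_credibility_estimate([1] * 10_000, alpha=2, beta=1)
```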

# Compound mixed Poisson distribution

Let the random sum $Y=X_1+X_2+ \cdots +X_N$ be the aggregate claims generated in a fixed period by an independent group of insureds. When the number of claims $N$ follows a Poisson distribution, the sum $Y$ is said to have a compound Poisson distribution. When the number of claims $N$ has a mixed Poisson distribution, the sum $Y$ is said to have a compound mixed Poisson distribution. A mixed Poisson distribution is a Poisson random variable $N$ whose Poisson parameter $\Lambda$ is uncertain. In other words, $N$ is a mixture of a family of Poisson distributions $N(\Lambda)$ and the random variable $\Lambda$ specifies the mixing weights. In this post, we present several basic properties of compound mixed Poisson distributions. In a previous post (Compound negative binomial distribution), we showed that the compound negative binomial distribution is an example of a compound mixed Poisson distribution (with gamma mixing weights).

In terms of notation, we have:

• $Y=X_1+X_2+ \cdots +X_N$,
• $N \sim$ Poisson$(\Lambda)$,
• $\Lambda \sim$ some unspecified distribution.

The following presents basic properties of the compound mixed Poisson $Y$ in terms of the mixing weights $\Lambda$ and the claim amount random variable $X$.

Mean and Variance

$\displaystyle E[Y]=E[\Lambda] E[X]$

$\displaystyle Var[Y]=E[\Lambda] E[X^2]+Var[\Lambda] E[X]^2$
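These two moment formulas can be checked by simulation. The sketch below uses illustrative choices throughout: gamma mixing weights and exponential claim amounts, with parameter values picked only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 3.0, 2.0            # Lambda ~ Gamma(alpha, rate beta) -- assumed
mean_x = 10.0                     # X ~ Exponential with mean 10 -- assumed
trials = 200_000

lam = rng.gamma(alpha, 1.0 / beta, trials)
counts = rng.poisson(lam)                            # mixed Poisson claim counts N
y = np.array([rng.exponential(mean_x, k).sum() for k in counts])

e_lam, var_lam = alpha / beta, alpha / beta**2       # gamma moments
e_x, e_x2 = mean_x, 2 * mean_x**2                    # exponential moments

# Formulas above: E[Y] = E[Lambda] E[X] = 15
#                 Var[Y] = E[Lambda] E[X^2] + Var[Lambda] E[X]^2 = 375
sim_mean, sim_var = y.mean(), y.var()
```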

Moment Generating Function

$\displaystyle M_Y(t)=M_{\Lambda}[M_X(t)-1]$

Cumulant Generating Function

$\displaystyle \Psi_Y(t)=ln M_{\Lambda}[M_X(t)-1]=\Psi_{\Lambda}[M_X(t)-1]$

Measure of Skewness
$\displaystyle E[(Y-\mu_Y)^3]=\Psi_Y^{(3)}(0)$

$\displaystyle =\Psi_{\Lambda}^{(3)}(0) E[X]^3 + 3 \Psi_{\Lambda}^{(2)}(0) E[X] E[X^2]+\Psi_{\Lambda}^{(1)}(0) E[X^3]$

$\displaystyle =\gamma_{\Lambda} Var[\Lambda]^{\frac{3}{2}} E[X]^3 + 3 Var[\Lambda] E[X] E[X^2]+E[\Lambda] E[X^3]$

Measure of skewness: $\displaystyle \gamma_Y=\frac{E[(Y-\mu_Y)^3]}{(Var[Y])^{\frac{3}{2}}}$

# Compound negative binomial distribution

In this post, we discuss the compound negative binomial distribution and its relationship with the compound Poisson distribution.

A compound distribution is a model for a random sum $Y=X_1+X_2+ \cdots +X_N$ where the number of terms $N$ is uncertain. To make the compound distribution more tractable, we assume that the variables $X_i$ are independent and identically distributed and that each $X_i$ is independent of $N$. The random sum $Y$ can be interpreted as the sum of all the measurements that are associated with certain events that occur during a fixed period of time. For example, we may be interested in the total amount of rainfall in a 24-hour period, during which the occurrences of a number of events are observed and each of the events provides a measurement of an amount of rainfall. Another interpretation of a compound distribution is the random variable of the aggregate claims generated by an insurance policy or a group of insurance policies during a fixed policy period. In this setting, $N$ is the number of claims generated by the portfolio of insurance policies, $X_1$ is the amount of the first claim, $X_2$ is the amount of the second claim and so on. When $N$ follows the Poisson distribution, the random sum $Y$ is said to have a compound Poisson distribution. Even though the compound Poisson distribution has many attractive properties, it is not a good model when the variance of the number of claims is greater than the mean of the number of claims. In such situations, the compound negative binomial distribution may be a better fit. See this post (Compound Poisson distribution) for a basic discussion. See the links at the end of this post for more articles on compound distributions that I posted on this blog.

Compound Negative Binomial Distribution
The random variable $N$ is said to have a negative binomial distribution if its probability function is given by the following:

$\displaystyle P[N=n]=\binom{\alpha + n-1}{\alpha-1} \thinspace \biggl(\frac{\beta}{\beta+1}\biggr)^{\alpha}\biggl(\frac{1}{\beta+1}\biggr)^{n} \ \ \ \ \ \ \ \ \ \ \ \ (1)$

where $n=0,1,2,3, \cdots$, $\beta >0$ and $\alpha$ is a positive integer.
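A quick numerical check, with illustrative parameter values, that (1) is a genuine probability function with mean $\alpha/\beta$:

```python
from math import comb

alpha, beta = 4, 2.0                   # illustrative: alpha a positive integer
p = beta / (beta + 1)

# Probability function (1), truncated far into the negligible tail
pf = [comb(alpha + n - 1, alpha - 1) * p**alpha * (1 - p)**n for n in range(200)]

total = sum(pf)                                # should be ~1
mean = sum(n * q for n, q in enumerate(pf))    # should be ~alpha/beta = 2
```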

Our formulation of the negative binomial distribution counts the number of failures that occur before the $\alpha^{th}$ success in a sequence of independent Bernoulli trials. But this interpretation is not important to our task at hand. Let $Y=X_1+X_2+ \cdots +X_N$ be the random sum as described in the above introductory paragraph such that $N$ follows a negative binomial distribution. We present the basic properties discussed in the post An introduction to compound distributions by plugging the negative binomial distribution into $N$.

Distribution Function
$\displaystyle F_Y(y)=\sum \limits_{n=0}^{\infty} F^{*n}(y) \thinspace P[N=n]$

where $F$ is the common distribution function for $X_i$ and $F^{*n}$ is the $n^{th}$ convolution of $F$. Of course, $P[N=n]$ is the negative binomial probability function indicated above.

Mean and Variance
$\displaystyle E[Y]=E[N] \thinspace E[X]=\frac{\alpha}{\beta} E[X]$

$\displaystyle Var[Y]=E[N] \thinspace Var[X]+Var[N] \thinspace E[X]^2$

$\displaystyle =\frac{\alpha}{\beta} Var[X]+\frac{\alpha (\beta+1)}{\beta^2} E[X]^2$

Moment Generating Function
$\displaystyle M_Y(t)=M_N[ln M_X(t)]=\biggl(\frac{p}{1-(1-p) M_X(t)}\biggr)^{\alpha}$

$\displaystyle M_Y(t)=\biggl(\frac{\beta}{\beta+1- M_X(t)}\biggr)^{\alpha}$

where $\displaystyle p=\frac{\beta}{\beta+1}$, $\displaystyle M_N(t)=\biggl(\frac{p}{1-(1-p) e^{t}}\biggr)^{\alpha}$

Cumulant Generating Function
$\displaystyle \Psi_Y(t)=\alpha \thinspace ln \biggl(\frac{\beta}{\beta+1- M_X(t)}\biggr)$

Skewness
$\displaystyle E[(Y-\mu_Y)^3]=\Psi_Y^{(3)}(0)$

$\displaystyle =\frac{2}{\alpha^2} E[N]^3 E[X]^3 +\frac{3}{\alpha} E[N]^2 E[X] E[X^2]+E[N] E[X^3]$

Measure of skewness: $\displaystyle \gamma_Y=\frac{E[(Y-\mu_Y)^3]}{(Var[Y])^{\frac{3}{2}}}$

Compound Mixed Poisson Distribution
In a previous post (Basic properties of mixtures), we showed that the negative binomial distribution is a mixture of a family of Poisson distributions with gamma mixing weights. Specifically, if $N \sim \text{Poisson}(\Lambda)$ and $\Lambda \sim \text{Gamma}(\alpha,\beta)$, then the unconditional distribution of $N$ is a negative binomial distribution and the probability function is of the form (1) given above.

Thus the compound negative binomial distribution is a special example of a compound mixed Poisson distribution. When an aggregate claims variable $Y=X_1+X_2+ \cdots +X_N$ has a compound mixed Poisson distribution, the number of claims $N$ follows a Poisson distribution, but the Poisson parameter $\Lambda$ is uncertain. The uncertainty could be due to heterogeneity of risks across the insureds in the insurance portfolio (or across various rating classes). If the information about the risk parameter $\Lambda$ can be captured in a gamma distribution, then the unconditional number of claims in a given fixed period has a negative binomial distribution.

Previous Posts on Compound Distributions
An introduction to compound distributions
Some examples of compound distributions
Compound Poisson distribution
Compound Poisson distribution-discrete example

# Basic properties of mixtures

In this post, we discuss some basic properties of mixtures (see these two previous posts for examples of mixtures – The law of total probability and Examples of mixtures). We also present the example of the negative binomial distribution. We show that the negative binomial distribution is a mixture of Poisson distributions with gamma mixing weights.

A random variable $X$ is a mixture if its distribution is a weighted sum (or integral) of a family of distribution functions $F_{X \lvert Y}$ where $Y$ is the mixing random variable. More specifically, $X$ is a mixture if its distribution function $F_X$ is of one of the following two forms:

$\displaystyle F_X(x)=\sum \limits_{y} F_{X \lvert Y=y}(x) P(Y=y)$

$\displaystyle F_X(x)=\int_{-\infty}^{+\infty} F_{X \lvert Y=y}(x) \thinspace f_Y(y) \thinspace dy$

In the first case, $X$ is a discrete mixture (i.e. the discrete random variable $Y$ provides the weights). In the second case, $X$ is a continuous mixture (i.e. the continuous random variable $Y$ provides the weights). In either case, $Y$ is said to be the mixing variable (its distribution is the mixing distribution). In some probability and statistics texts, the notion of mixtures is called compounding.

Mixtures arise in many settings. The notion of mixtures is important in insurance applications (e.g. when the risk class of a policyholder is uncertain). The distribution for modeling the random loss for an insured risk (or a group of insured risks) is often a mixture distribution. Discrete mixture arises when the risk classification is discrete. Continuous mixture is important for the situations where the random loss distribution has an uncertain risk parameter and the risk parameter follows a continuous distribution. See these two previous posts for examples of mixtures – The law of total probability and Examples of mixtures.

Unconditional Expectation
Let $X$ be a mixture and $Y$ be the mixing variable. Let $h:\mathbb{R} \rightarrow \mathbb{R}$ be a continuous function. We show the following fact about unconditional expectation. This formula is used below for establishing basic facts of mixtures.

$E[h(X)]=E_Y[\thinspace E(h(X) \lvert Y)] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (*)$

The following is the derivation:

$\displaystyle E_Y[\thinspace E(h(X) \lvert Y)]$

$\displaystyle=\int_{-\infty}^{+\infty}E[h(X) \lvert Y=y] \thinspace f_Y(y) \thinspace dy$

$\displaystyle=\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} h(x) \thinspace f_{X \lvert Y}(x \lvert y) \thinspace dx \thinspace f_Y(y) \thinspace dy$

$\displaystyle=\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} h(x) \thinspace f_{X \lvert Y}(x \lvert y) \thinspace f_Y(y) \thinspace dx \thinspace dy$

$\displaystyle=\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} h(x) \thinspace f_{X,Y}(x,y) \thinspace dx \thinspace dy=E[h(X)]$

Basic Properties of a Mixture
Let $X$ be a mixture with mixing variable $Y$. The unconditional mean, variance and moment generating function of $X$ are:

(1) $E[X]=E_Y[E(X \lvert Y)]$

(2) $Var[X]=E_Y[Var(X \lvert Y)]+Var_Y[E(X \lvert Y)]$

(3) $M_X(t)=E_Y[M_{X \lvert Y}(t)]$

Discussion of (1)
The statement (1) is called the law of total expectation and follows from the unconditional expectation formula (*) above. Suppose $X$ is the random loss amount for an insured whose risk class is uncertain. The formula (1) states that the average loss is the average of averages. The idea is to find the average loss for each risk class and then take the weighted average of these averages according to the distribution of the mixing variable (e.g. the distribution of the policyholders by risk class).

Discussion of (2)
The statement (2) is called the total variance. Sticking with the example of insured risks in a block of insurance policies, the total variance of the random loss for an insured comes from two sources – the average of the variation in each risk class plus the variation of the average loss in each risk class. As we will see in the example below and in subsequent posts, the uncertainty in a risk parameter in the distribution of $X$ (through the conditioning of $Y$) has the effect of increasing the variance in the unconditional random loss. The following derivations establish the formula for total variance.

$E_Y[Var(X \lvert Y)]$

$=E_Y \lbrace{E[X^2 \lvert Y]-E[X \lvert Y]^2}\rbrace$

$=E_Y[E(X^2 \lvert Y)]-E_Y[E(X \lvert Y)^2]$

$=E[X^2]-E_Y[E(X \lvert Y)^2]$

On the other hand, $Var_Y[E(X \lvert Y)]$

$=E_Y[E(X \lvert Y)^2]-E_Y[E(X \lvert Y)]^2$

$=E_Y[E(X \lvert Y)^2]-E[X]^2$

Adding the two expressions gives $E[X^2]-E[X]^2=Var[X]$, which establishes (2).
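The total variance formula (2) can also be seen by simulation. The sketch below uses an illustrative two-point mixing variable; the class probabilities and normal loss distributions are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 500_000

# Y picks risk class 1 with probability 0.3, class 2 with probability 0.7;
# class 1 losses ~ Normal(mean 10, sd 2), class 2 losses ~ Normal(mean 4, sd 1).
in_class1 = rng.random(trials) < 0.3
x = np.where(in_class1, rng.normal(10, 2, trials), rng.normal(4, 1, trials))

e_of_var = 0.3 * 4 + 0.7 * 1                                    # E[Var(X|Y)]
var_of_e = 0.3 * 10**2 + 0.7 * 4**2 - (0.3 * 10 + 0.7 * 4)**2   # Var[E(X|Y)]

# Formula (2): Var[X] = E[Var(X|Y)] + Var[E(X|Y)] = 1.9 + 7.56 = 9.46
sim_var = x.var()
```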

Discussion of (3)
This follows from the unconditional expectation formula (*) above. Note that $\displaystyle M_X(t)=E[e^{t \thinspace X}]=E_Y[E(e^{t \thinspace X} \lvert Y)]=E_Y[M_{X \lvert Y}(t)]$

Example
We show that the negative binomial distribution is a mixture of a family of Poisson distributions with gamma mixing weights. We then derive the mean, variance and moment generating function of the negative binomial distribution.

The following is the conditional Poisson probability function:

$\displaystyle P_{N \lvert \Lambda}(n \lvert \lambda)=\frac{\lambda^n \thinspace e^{-\lambda}}{n!}$ where $n=0,1,2,3,\cdots$

The Poisson parameter $\lambda$ is uncertain and it follows a gamma distribution with parameter $\alpha$ and $\beta$ with the following density function:

$\displaystyle f_{\Lambda}(\lambda)=\frac{\beta^\alpha}{\Gamma(\alpha)} \thinspace \lambda^{\alpha-1} \thinspace e^{-\beta \lambda}$

The unconditional probability function of $N$ is:

$\displaystyle P_N(n)=\int_0^{\infty} P_{N \lvert \Lambda}(n \lvert \lambda) \thinspace f_\Lambda(\lambda) \thinspace d \lambda$

$\displaystyle =\int_0^{\infty} \frac{\lambda^n \thinspace e^{-\lambda}}{n!} \thinspace \frac{\beta^\alpha}{\Gamma(\alpha)} \thinspace \lambda^{\alpha-1} \thinspace e^{-\beta \lambda} \thinspace d \lambda$

$\displaystyle =\frac{\beta^\alpha}{n! \Gamma(\alpha)} \int_0^{\infty} \lambda^{\alpha+n-1} \thinspace e^{-(\beta+1) \lambda} \thinspace d \lambda$

$\displaystyle =\frac{\beta^\alpha}{n! \Gamma(\alpha)} \thinspace \frac{\Gamma(\alpha+n)}{(\beta+1)^{\alpha+n}} \int_0^{\infty} \frac{(\beta+1)^{\alpha+n}}{\Gamma(\alpha+n)} \lambda^{\alpha+n-1} \thinspace e^{-(\beta+1) \lambda} \thinspace d \lambda$

$\displaystyle =\frac{\beta^\alpha}{n! \Gamma(\alpha)} \thinspace \frac{\Gamma(\alpha+n)}{(\beta+1)^{\alpha+n}}$

$\displaystyle =\frac{\Gamma(\alpha+n)}{\Gamma(\alpha) \Gamma(n+1)} \thinspace \biggl(\frac{\beta}{\beta+1}\biggr)^{\alpha} \thinspace \biggl(1-\frac{\beta}{\beta+1}\biggr)^n$

Note that the above unconditional probability function is that of a negative binomial distribution with parameters $\alpha$ and $p=\frac{\beta}{\beta+1}$. Note that $\displaystyle E[\Lambda]=\frac{\alpha}{\beta}$ and $\displaystyle Var[\Lambda]=\frac{\alpha}{\beta^2}$. We now compute the unconditional mean $E[N]$, the total variance $Var[N]$ and the moment generating function $M_N(t)$. Let $p=\frac{\beta}{\beta+1}$.

$\displaystyle E[N]=E[E(N \lvert \Lambda)]=E[\Lambda]=\frac{\alpha}{\beta}=\frac{\alpha \thinspace (1-p)}{p}$

$\displaystyle Var[N]=E[Var(N \lvert \Lambda)]+Var[E(N \lvert \Lambda)]$

$\displaystyle =E[\Lambda]+Var[\Lambda]=\frac{\alpha}{\beta}+\frac{\alpha}{\beta^2}=\frac{\alpha \thinspace (\beta+1)}{\beta^2}=\frac{\alpha \thinspace (1-p)}{p^2}$

To derive the moment generating function, note that the conditional Poisson mgf is $\displaystyle M_{N \lvert \Lambda}(t)=e^{\Lambda \thinspace (e^t - 1)}$.

$\displaystyle M_N(t)=\int_0^{\infty} M_{N \lvert \Lambda=\lambda}(t) \thinspace f_{\Lambda}(\lambda) \thinspace d \lambda$

$\displaystyle =\int_0^{\infty} e^{\lambda \thinspace (e^t - 1)} \thinspace \frac{\beta^\alpha}{\Gamma(\alpha)} \thinspace \lambda^{\alpha-1} \thinspace e^{-\beta \lambda} \thinspace d \lambda$

$\displaystyle =\frac{\beta^\alpha}{(\beta+1-e^t)^\alpha} \thinspace \int_0^{\infty} \frac{(\beta+1-e^t)^\alpha}{\Gamma(\alpha)} \thinspace \lambda^{\alpha-1} \thinspace e^{-(\beta+1-e^t) \thinspace \lambda} \thinspace d \lambda$

$\displaystyle =\frac{\beta^\alpha}{(\beta+1-e^t)^\alpha}=\biggl(\frac{\beta}{\beta+1-e^t}\biggr)^\alpha=\biggl(\frac{p}{1-(1-p) \thinspace e^t}\biggr)^\alpha$
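The mixture computation in this example can be sanity-checked numerically. The sketch below, with illustrative parameter values, compares the mixing integral for $P_N(n)$ against the closed-form negative binomial probabilities:

```python
import math
import numpy as np

alpha, beta = 2.5, 1.5                  # illustrative gamma parameters
lam = np.linspace(1e-8, 60, 600_001)
d = lam[1] - lam[0]

gamma_density = beta**alpha / math.gamma(alpha) * lam**(alpha - 1) * np.exp(-beta * lam)
p = beta / (beta + 1)

for n in range(5):
    # Mixing integral: Poisson pmf at n integrated against the gamma density
    integrand = lam**n * np.exp(-lam) / math.factorial(n) * gamma_density
    mixed = (integrand[:-1] + integrand[1:]).sum() * d / 2      # trapezoid rule
    # Closed-form negative binomial probability derived above
    closed = (math.gamma(alpha + n) / (math.gamma(alpha) * math.gamma(n + 1))
              * p**alpha * (1 - p)**n)
    assert abs(mixed - closed) < 1e-6
```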

Comment
In the above example, note that $E[N]<Var[N]$. This stands in contrast with the Poisson distribution ($E[X]=Var[X]$) and with the binomial distribution ($Var[X]<E[X]$). In the above example, there is uncertainty in the risk parameter in the conditional Poisson distribution. The additional uncertainty causes the unconditional variance to increase.

# Examples of mixtures

The previous post introduced the notion of mixtures. A random variable $X$ is a mixture if its distribution function is a weighted average of a family of distribution functions. In this post, we present two more examples, one discrete and one continuous.

Comment
In the definition of mixtures, the distribution function is a weighted sum (or integral) of the conditional distribution functions. It is easy to verify that for a random variable that is a mixture, its probability function (the discrete case) or the probability density function (the continuous case) is also the weighted sum (or integral) of the conditional probability functions or conditional probability density functions. In the following examples, we derive the density functions rather than the distribution functions.

Example 1
Consider the following family of gamma density functions where the parameter $\alpha$ takes on nonnegative integers and the parameter $\lambda$ is known:

$\displaystyle f_{X \lvert Y}(x \lvert \alpha)=\frac{\lambda^{\alpha+1}}{\alpha!} x^{\alpha} e^{-\lambda x}$

The parameter $Y=\alpha$ follows a geometric distribution with the following probability function:

$P_Y(\alpha)=p \thinspace (1-p)^\alpha$ where $\alpha=0,1,2,3,...$ and $0<p<1$.

Then $X$ has an exponential distribution with parameter $\lambda p$. That is, the unconditional density function of $X$ is $\displaystyle f_X(x)=\lambda p \thinspace e^{-\lambda p x}$.

Example 2
In a particular block of insurance policies of an auto insurer, the claim frequency for a policyholder during a given year is modeled using a binomial distribution with $n=2$ and $p=\lambda$. There is uncertainty in the true value of the risk parameter $p=\lambda$. The insurer uses a beta distribution to model the parameter $p=\lambda$ (i.e. $\lambda$ ranges from 0 to 1 according to a beta distribution). A new customer has just purchased a policy; find the probability mass function for the number of claims in the next year.

Discussion of Example 1
Since the mixing weights come from a discrete distribution, we have:

$\displaystyle f_X(x)=\sum \limits_{\alpha=0}^{\infty} f_{X \lvert Y}(x \lvert \alpha) \thinspace P_Y(\alpha)$

After plugging in the appropriate components, we have the following:

$\displaystyle f_X(x)=\sum \limits_{\alpha=0}^{\infty} \frac{\lambda^{\alpha+1}}{\alpha!} \thinspace x^{\alpha} \thinspace e^{-\lambda x} \thinspace p \thinspace (1-p)^\alpha$

The above sum is simplified as:

$f_X(x)=\lambda p \thinspace e^{-\lambda p \thinspace x}$.
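The series manipulation can be double-checked numerically; a sketch with illustrative values of $\lambda$, $p$ and $x$:

```python
import math

lam, p, x = 2.0, 0.4, 1.3              # illustrative values

# Partial sum of the mixture series (terms die off factorially fast)
series = sum(lam**(a + 1) / math.factorial(a) * x**a * math.exp(-lam * x)
             * p * (1 - p)**a for a in range(100))

closed_form = lam * p * math.exp(-lam * p * x)   # exponential density at x
```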

Discussion of Example 2
The following is the probability function of the conditional claim frequency:

$\displaystyle P_{N \lvert \Lambda}(n \lvert \lambda)=\binom{2}{n} \lambda^n (1-\lambda)^{2-n}$ where $n=0,1,2$.

On the other hand, the parameter $\Lambda$ has the beta distribution with parameters $\alpha$ and $\beta$. The density function is as follows:

$\displaystyle f_{\Lambda}(\lambda)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \thinspace \lambda^{\alpha-1} \thinspace (1-\lambda)^{\beta-1}$ where $\Gamma(\cdot)$ is the gamma function.

The following is the unconditional probability function of $N$:

$\displaystyle P_N(n)=\int_0^1 P_{N \lvert \Lambda}(n \lvert \lambda) \thinspace f_{\Lambda}(\lambda) \thinspace d \lambda$ where $n=0,1,2$.

The following sets up $P_N(n)$ for each $n=0,1,2$.

$\displaystyle P_N(0) = \int_0^1 (1-\lambda)^2 \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \thinspace \lambda^{\alpha-1} \thinspace (1-\lambda)^{\beta-1} \thinspace d \lambda$

$\displaystyle P_N(1)=\int_0^1 2 \lambda (1-\lambda) \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \thinspace \lambda^{\alpha-1} \thinspace (1-\lambda)^{\beta-1} \thinspace d \lambda$

$\displaystyle P_N(2) = \int_0^1 \lambda^2 \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} \thinspace \lambda^{\alpha-1} \thinspace (1-\lambda)^{\beta-1} \thinspace d \lambda$

The following shows the results of the calculation.

$\displaystyle P_N(0)=\frac{\beta (\beta+1)}{(\alpha+\beta) (\alpha+\beta+1)}$

$\displaystyle P_N(1)=\frac{2 \alpha \beta}{(\alpha+\beta) (\alpha+\beta+1)}$

$\displaystyle P_N(2)=\frac{\alpha (\alpha+1)}{(\alpha+\beta) (\alpha+\beta+1)}$
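These three probabilities can be verified numerically: they sum to 1, and each matches its defining integral. A sketch with illustrative beta parameters:

```python
import math
import numpy as np

alpha, beta = 2.0, 3.0                 # illustrative beta parameters
denom = (alpha + beta) * (alpha + beta + 1)
p0 = beta * (beta + 1) / denom
p1 = 2 * alpha * beta / denom
p2 = alpha * (alpha + 1) / denom

# Cross-check P_N(1) against its defining integral
lam = np.linspace(0.0, 1.0, 200_001)
d = lam[1] - lam[0]
beta_density = (math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
                * lam**(alpha - 1) * (1 - lam)**(beta - 1))
integrand = 2 * lam * (1 - lam) * beta_density
p1_numeric = (integrand[:-1] + integrand[1:]).sum() * d / 2    # trapezoid rule
```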

# The law of total probability

The goal of this post is to present an example to illustrate the law of total probability and to use this example to motivate the definition of a random variable that is a mixture. To state the finite form of the law of total probability, let $E_1,E_2,...,E_n$ be events such that

• $E_i \cap E_j=\varnothing$ for $i \neq j$ and
• $P(E_1)+P(E_2)+...+P(E_n)=1$

Then for any event $E$, we have:

$(0) \ \ \ \ \ \ P(E)=P(E \cap E_1)+P(E \cap E_2)+ \cdots +P(E \cap E_n)$

By the definition of conditional probability, $\displaystyle P(E \lvert E_i)=\frac{P(E \cap E_i)}{P(E_i)}$. Thus the law of total probability can be stated as follows:

$(1) \ \ \ \ \ \ P(E)=P(E \lvert E_1)P(E_1)+P(E \lvert E_2)P(E_2)+ \cdots +P(E \lvert E_n)P(E_n)$

Example
In a random experiment, a box is chosen based on a coin toss. If the toss is head, Box 1 is chosen. If the toss is tail, Box 2 is chosen. Box 1 has 3 white balls and 1 red ball. Box 2 has 1 white ball and 4 red balls. The probability of the coin turning up head is 0.75. Once the box is chosen, two balls are drawn successively with replacement. Let $X$ be the number of red balls drawn. Find the probability function and the distribution function of the discrete random variable $X$.

Alternative Description of The Example
We can couch the same example as a risk classification problem in insurance. For example, an auto insurer has two groups of policyholders. On the basis of historical data, the insurer has determined that the claim frequency during a policy year for a policyholder classified as good risk follows a binomial distribution with $n=2$ and $p=\frac{1}{4}$. The claim frequency for a policyholder classified as bad risk follows a binomial distribution with $n=2$ and $p=\frac{4}{5}$. In this block of policies, 75% are classified as good risks and 25% are classified as bad risks. A new customer, whose risk class is not yet known with certainty, has just purchased a new policy. What distribution should be used to model the claim frequency for this new customer?

Discussion of Example
Let $I$ be the following indicator variable.

$\displaystyle I=\left\{\begin{matrix}1&\thinspace \text{probability=0.75}\\{2}&\thinspace \text{probability=0.25}\end{matrix}\right.$

By the law of total probability, for $i=0,1,2$, we have:

$P(X=i)=P(X=i \lvert I=1)P(I=1)+P(X=i \lvert I=2)P(I=2)$.

The following calculation derives the probability function and the distribution function.

$\displaystyle P(X=0)=\biggl(\frac{3}{4}\biggr)^2 \frac{3}{4}+\biggl(\frac{1}{5}\biggr)^2 \frac{1}{4}=\frac{2764}{6400}$

$\displaystyle P(X=1)=2\biggl(\frac{1}{4}\biggr)\biggl(\frac{3}{4}\biggr) \frac{3}{4}+2\biggl(\frac{4}{5}\biggr)\biggl(\frac{1}{5}\biggr) \frac{1}{4}=\frac{2312}{6400}$

$\displaystyle P(X=2)=\biggl(\frac{1}{4}\biggr)^2 \frac{3}{4}+\biggl(\frac{4}{5}\biggr)^2 \frac{1}{4}=\frac{1324}{6400}$

$\displaystyle P(X \leq 0)=\frac{2764}{6400}$

$\displaystyle P(X \leq 1)=\frac{5076}{6400}$

$\displaystyle P(X \leq 2)=\frac{6400}{6400}=1$
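The arithmetic above can be verified with exact fractions; a minimal sketch (`binom2_pf` is a hypothetical helper name):

```python
from fractions import Fraction as F

def binom2_pf(p, i):
    """Probability function of Binomial(2, p) at i = 0, 1, 2."""
    return [(1 - p)**2, 2 * p * (1 - p), p**2][i]

# Mixture weights 3/4 (Box 1, P(red) = 1/4) and 1/4 (Box 2, P(red) = 4/5)
px = [F(3, 4) * binom2_pf(F(1, 4), i) + F(1, 4) * binom2_pf(F(4, 5), i)
      for i in range(3)]

assert px == [F(2764, 6400), F(2312, 6400), F(1324, 6400)]
assert sum(px) == 1
```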

Comment
Note that in our example here, the unconditional probability function $P(X=i)$ is a weighted sum of two conditional probability functions (based on the conditioning on the indicator variable). It also follows from the law of total probability that the unconditional distribution function is a weighted sum of the conditional distribution functions. For $x=0,1,2$, we have:

$P(X \le x)=P(X \le x \lvert I=1)P(I=1)+P(X \le x \lvert I=2)P(I=2)$

A random variable whose distribution function is the weighted sum of a family of distribution functions is called a mixture. The example in this post is called a discrete mixture since the set of weights is a discrete set. We end this post with the following definition.

Definition of Mixtures
Motivated by the example above, a random variable $X$ is said to be a mixture if its distribution function is of the form

$F_X(x)=\sum p_i F_{X_i}(x)$

for some sequence of random variables $X_1,X_2,X_3,...$ and some sequence of positive real numbers $p_1,p_2,p_3,...$ that sum to 1. The numbers $p_1,p_2,p_3,...$ are the mixing weights. In this case, $X$ is said to be a discrete mixture.

The concept of mixture (or mixing) does not need to be restricted to a countable number of distributions. We can mix a family of distributions indexed by the real numbers or some interval of real numbers and use a continuous probability density function as the mixing weights. A random variable is said to be a continuous mixture if

$\displaystyle F_X(x)=\int_{-\infty}^{+\infty}F_{X \lvert \Lambda=\lambda}(x) f_{\Lambda}(\lambda)d \lambda$

for some family of random variables $X \lvert \Lambda=\lambda$ and some density function $f_{\Lambda}$. Note that the family of random variables $X \lvert \Lambda=\lambda$ is indexed by the real numbers or some interval of real numbers $[a,b]$.

Mixtures arise in many settings. The notion of mixtures is important in insurance applications. For example, the claim frequency or amount of a random loss may have an uncertain risk parameter that varies from insured to insured. By mixing the conditional distribution of the claim frequency (or random loss amount) with the distribution of the uncertain risk parameter, we have a model that can describe the claim experience.

Discrete mixture arises when the risk parameter is discrete (e.g. when the risk classification is discrete). Continuous mixture is important for the situations where the risk parameter of the random loss distribution follows a continuous distribution. Examples of continuous mixtures and basic properties of mixtures will be discussed in subsequent posts (see Examples of mixtures, Basic properties of mixtures).