Examples of Bayesian prediction in insurance-continued

This post is a continuation of the previous post Examples of Bayesian prediction in insurance. We present another example as an illustration of the methodology of Bayesian estimation. The example in this post, along with the example in the previous post, serves to motivate the concept of Bayesian credibility and Buhlmann credibility theory. So these two posts are part of an introduction to credibility theory.

Suppose X_1, \cdots, X_n,X_{n+1} are independent and identically distributed conditional on \Theta=\theta. We denote the density function of the common distribution of X_j by f_{X \lvert \Theta}(x \lvert \theta). We denote the prior distribution of the risk parameter \Theta by \pi_{\Theta}(\theta). The following shows the steps of the Bayesian estimate of the next observation X_{n+1} given X_1, \cdots, X_n.

The Marginal Distribution
\displaystyle f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)=\int \limits_{\theta} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta) d \theta

The Posterior Distribution
\displaystyle \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)

\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)

The Predictive Distribution
\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)

\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} \biggl(f_{X \lvert \Theta}(x \lvert \theta)\biggr) \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) d \theta

The Bayesian Predictive Mean of the Next Period
\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]

\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{x} x \thinspace f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n) dx

\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]

\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} E[X \lvert \theta] \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) d \theta

Example 2
The number of claims X generated by an insured in a portfolio of independent insurance policies has a Poisson distribution with parameter \Theta. In the portfolio of policies, the parameter \Theta varies according to a gamma distribution with parameters \alpha and \beta. We have the following conditional distribution of X and prior distribution of \Theta.

\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\frac{\theta^x e^{-\theta}}{x!} where x=0,1,2, \cdots

\displaystyle \pi_{\Theta}(\theta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta} where \Gamma(\cdot) is the gamma function.

Suppose that a particular insured in this portfolio has generated 0 and 3 claims in the first 2 policy periods. What is the Bayesian estimate of the number of claims for this insured in period 3?

Note that the conditional mean E[X \lvert \Theta]=\Theta. Thus the unconditional mean E[X]=E[\Theta]=\frac{\alpha}{\beta}.

Comment
Note that the unconditional distribution of X is a negative binomial distribution. In a previous post (Compound negative binomial distribution), it was shown that if N \sim Poisson(\Lambda) and \Lambda \sim Gamma(\alpha,\beta), then the unconditional distribution of N has the following probability function. We make use of this result in the Bayesian estimation problem in this post.

\displaystyle P[N=n]=\frac{\Gamma(\alpha+n)}{\Gamma(\alpha) \Gamma(n+1)} \biggl[\frac{\beta}{\beta+1}\biggr]^{\alpha} \biggl[\frac{1}{\beta+1}\biggr]^n

The Marginal Distribution
\displaystyle f_{X_1,X_2}(0,3)=\int_{0}^{\infty} e^{-\theta} \frac{\theta^3 e^{-\theta}}{3!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta} d \theta

\displaystyle =\int_{0}^{\infty} \frac{\beta^{\alpha}}{3! \Gamma(\alpha)} \theta^{\alpha+3-1} e^{-(\beta+2) \theta} d \theta=\frac{\Gamma(\alpha+3)}{6 \Gamma(\alpha)} \frac{\beta^{\alpha}}{(\beta+2)^{\alpha+3}}

The Posterior Distribution
\displaystyle \pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3)=\frac{1}{f_{X_1,X_2}(0,3)} e^{-\theta} \frac{\theta^3 e^{-\theta}}{3!} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta}

\displaystyle =K \thinspace \theta^{\alpha+3-1} e^{-(\beta+2) \theta}

In the above expression K is a constant making \pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3) a density function. Note that it has the form of a gamma distribution. Thus the posterior distribution must be:

\displaystyle \pi_{\Theta \lvert X_1,X_2}(\theta \lvert 0,3)=\frac{(\beta+2)^{\alpha+3}}{\Gamma(\alpha+3)} \thinspace \theta^{\alpha+3-1} e^{-(\beta+2) \theta}

Thus the posterior distribution of \Theta is a gamma distribution with parameters \alpha+3 and \beta+2.
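The marginal and posterior calculations can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the hypothetical values \alpha=2 and \beta=1 (the example leaves them general), integrates the likelihood times the prior on a grid, and compares the result with the closed-form marginal and with the Gamma(\alpha+3,\beta+2) density. The function names and the grid settings are illustrative choices.

```python
import math

# Hypothetical prior parameters; the example leaves alpha and beta general.
alpha, beta = 2.0, 1.0
claims = [0, 3]  # observed claim counts in periods 1 and 2


def poisson_pmf(x, theta):
    return theta ** x * math.exp(-theta) / math.factorial(x)


def gamma_pdf(theta, shape, rate):
    log_density = (shape * math.log(rate) - math.lgamma(shape)
                   + (shape - 1) * math.log(theta) - rate * theta)
    return math.exp(log_density)


def likelihood_times_prior(theta):
    likelihood = math.prod(poisson_pmf(x, theta) for x in claims)
    return likelihood * gamma_pdf(theta, alpha, beta)


# Midpoint rule on (0, 40); the integrand is negligible beyond that range.
step = 0.001
grid = [(k + 0.5) * step for k in range(40000)]
weights = [likelihood_times_prior(t) for t in grid]
marginal = sum(weights) * step

# Closed-form marginal f(0, 3) derived above.
closed_form = (math.gamma(alpha + 3) / (6 * math.gamma(alpha))
               * beta ** alpha / (beta + 2) ** (alpha + 3))

# Largest gap between the normalized grid posterior and Gamma(alpha+3, beta+2).
max_gap = max(abs(w / marginal - gamma_pdf(t, alpha + 3, beta + 2))
              for t, w in zip(grid, weights))

print(marginal, closed_form)
print(max_gap)
```

If the two printed marginals agree and the gap is close to zero, the grid posterior matches the gamma form identified above.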

The Predictive Distribution
Note that the predictive distribution is simply the mixture of Poisson(\Theta) with Gamma(\alpha+3,\beta+2) as mixing weights. By the comment above, the predictive distribution is a negative binomial distribution with the following probability function:

\displaystyle f_{X_3 \lvert X_1,X_2}(x \lvert 0,3)=\frac{\Gamma(\alpha+3+x)}{\Gamma(\alpha+3) \Gamma(x+1)} \biggl[\frac{\beta+2}{\beta+3}\biggr]^{\alpha+3} \biggl[\frac{1}{\beta+3}\biggr]^{x} where x=0,1,2, \cdots
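As a quick sanity check on this predictive distribution, the sketch below evaluates the negative binomial probability function with parameters \alpha+3 and \beta+2 (with x! = \Gamma(x+1) in the denominator) and confirms that the probabilities sum to 1 and have mean \frac{\alpha+3}{\beta+2}. The values \alpha=2 and \beta=1 and the function name are assumptions made only for illustration.

```python
import math


def neg_binomial_pmf(x, a, b):
    """P[X = x] for a Poisson count mixed over a Gamma(a, b) parameter (b is a rate)."""
    log_p = (math.lgamma(a + x) - math.lgamma(a) - math.lgamma(x + 1)
             + a * math.log(b / (b + 1)) - x * math.log(b + 1))
    return math.exp(log_p)


# Hypothetical prior parameters; the example leaves alpha and beta general.
alpha, beta = 2.0, 1.0

# Posterior after observing 0 and 3 claims: Gamma(alpha + 3, beta + 2).
a_post, b_post = alpha + 3, beta + 2

# Truncate the infinite support; the tail beyond 200 is negligible here.
probs = [neg_binomial_pmf(x, a_post, b_post) for x in range(200)]
total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))

print(total)                  # close to 1
print(mean, a_post / b_post)  # both close to (alpha + 3) / (beta + 2)
```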

The Bayesian Predictive Mean
\displaystyle E[X_3 \lvert 0,3]=\frac{\alpha+3}{\beta+2}=\frac{2}{\beta+2} \biggl(\frac{3}{2}\biggr)+\frac{\beta}{\beta+2} \biggl(\frac{\alpha}{\beta}\biggr) \ \ \ \ \ \ \ \ \ \ (1)

Note that E[X \lvert \Theta]=\Theta. Thus the Bayesian predictive mean in this example is simply the mean of the posterior distribution of \Theta, which is E[\Theta \vert 0,3]=\frac{\alpha+3}{\beta+2}.

Comment
Generalizing the example, suppose that in the first n periods, the claim counts for the insured are X_1=x_1, \cdots, X_n=x_n. Then the posterior distribution of the parameter \Theta is a gamma distribution.

\biggl[\Theta \lvert X_1=x_1, \cdots, X_n=x_n\biggr] \sim Gamma(\alpha+\sum_{i=1}^{n} x_i,\beta+n)

Then the predictive distribution of X_{n+1} given the observations has a negative binomial distribution. More importantly, the Bayesian predictive mean is:

\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]=\frac{\alpha+\sum_{i=1}^{n} x_i}{\beta+n}

\displaystyle =\frac{n}{\beta+n} \biggl(\frac{\sum \limits_{i=1}^{n} x_i}{n}\biggr)+\frac{\beta}{\beta+n} \biggl(\frac{\alpha}{\beta}\biggr)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

It is interesting that the Bayesian predictive mean of the (n+1)^{th} period is a weighted average of the mean of the observed data (\overline{X}) and the unconditional mean E[X]=\frac{\alpha}{\beta}. Consequently, the above Bayesian estimate is a credibility estimate. The weight given to the observed data, Z=\frac{n}{\beta+n}, is called the credibility factor. The estimate and the factor are called the Bayesian credibility estimate and the Bayesian credibility factor, respectively.
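The update rule and the credibility form (2) are easy to express in code. The following is an illustrative sketch using exact fractions; the function name and the prior values \alpha=2, \beta=1 are hypothetical, while the claim counts 0 and 3 come from Example 2.

```python
from fractions import Fraction


def bayesian_credibility(alpha, beta, claims):
    """Poisson-gamma update for a non-empty list of claim counts.

    Returns (posterior shape, posterior rate, credibility factor Z, estimate).
    """
    n = len(claims)
    total = sum(claims)
    post_shape = alpha + total
    post_rate = beta + n
    z = Fraction(n) / (beta + n)
    sample_mean = Fraction(total, n)
    prior_mean = alpha / beta
    # Credibility form: Z * (sample mean) + (1 - Z) * (prior mean).
    estimate = z * sample_mean + (1 - z) * prior_mean
    return post_shape, post_rate, z, estimate


# Hypothetical prior parameters; claim counts are those of Example 2.
alpha, beta = Fraction(2), Fraction(1)
shape, rate, z, estimate = bayesian_credibility(alpha, beta, [0, 3])

print(shape, rate, z, estimate)  # 5 3 2/3 5/3
print(estimate == shape / rate)  # True: matches the posterior mean
```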

In general, the credibility estimate is an estimator of the following form:

\displaystyle E=Z \thinspace \overline{X}+ (1-Z) \thinspace \mu_0

where \overline{X} is the mean of the observed data and \mu_0 is the mean based on other information. In our example here, \mu_0 is the unconditional mean. In practice, \mu_0 could be the mean based on the entire book of business, or a mean based on a different block of similar insurance policies. Another interpretation is that \overline{X} is the mean of the recent experience data and \mu_0 is the mean of prior periods.

One more comment about the credibility factor Z=\frac{n}{\beta+n} derived in this example. As n \rightarrow \infty, Z \rightarrow 1. This makes intuitive sense since this gives more weight to \overline{X} as more and more data are accumulated.
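To see this behavior concretely, the short sketch below tabulates Z and the resulting estimate for increasing n. The values \beta=1, prior mean 2 and observed mean 1.5 are hypothetical and held fixed only for illustration.

```python
from fractions import Fraction

# Hypothetical inputs: beta = 1, prior mean alpha / beta = 2, observed mean 1.5.
beta = Fraction(1)
prior_mean = Fraction(2)
sample_mean = Fraction(3, 2)


def credibility_factor(n, beta):
    """Z = n / (beta + n)."""
    return Fraction(n) / (beta + n)


for n in (1, 2, 10, 100, 1000):
    z = credibility_factor(n, beta)
    estimate = z * sample_mean + (1 - z) * prior_mean
    print(n, float(z), float(estimate))
```

As n grows, the printed factor approaches 1 and the estimate moves from the prior mean toward the observed mean.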


Examples of Bayesian prediction in insurance

We present two examples to illustrate the notion of Bayesian predictive distributions. The general insurance problem we aim to illustrate is that of using past claim experience data from an individual insured or a group of insureds to predict the future claim experience. Suppose we have X_1,X_2, \cdots, X_n with each X_i being the number of claims or an aggregate amount of claims in a prior period of observation. Given such results, what will be the number of claims during the next period, or what will be the aggregate claim amount in the next period? These two examples will motivate the notion of credibility, both Bayesian credibility theory and Buhlmann credibility theory. We present Example 1 in this post. Example 2 is presented in the next post (Examples of Bayesian prediction in insurance-continued).

Example 1
In this random experiment, there is a big bowl (called B) and two boxes (Box 1 and Box 2). Bowl B consists of a large quantity of balls, 80% of which are white and 20% of which are red. In Box 1, 60% of the balls are labeled 0, 30% are labeled 1 and 10% are labeled 2. In Box 2, 15% of the balls are labeled 0, 35% are labeled 1 and 50% are labeled 2. In the experiment, a ball is selected at random from bowl B. The color of the selected ball from bowl B determines which box to use (if the ball is white, then use Box 1, if red, use Box 2). Then balls are drawn at random from the selected box (Box i) repeatedly with replacement and the values of the series of selected balls are recorded. The value of the first selected ball is X_1, the value of the second selected ball is X_2 and so on.

Suppose that your friend performs this random experiment (you do not know whether he uses Box 1 or Box 2) and that his first ball is a 1 (X_1=1) and his second ball is a 2 (X_2=2). What is the predicted value X_3 of the third selected ball?

Though it is straightforward to apply Bayes’ theorem to this problem (the solution can be seen easily using a tree diagram) to obtain a numerical answer, we use this example to draw out the principle of Bayesian prediction. It may therefore appear that we are making a simple problem overly complicated, but we are merely using this example to motivate the method of Bayesian estimation.

For convenience, we denote “draw a white ball from bowl B” by \theta=1 and “draw a red ball from bowl B” by \theta=2. Box 1 and Box 2 are conditional distributions. The Bowl B is a distribution for the parameter \theta. The distribution given in Bowl B is a probability distribution over the space of all parameter values (called a prior distribution). The prior distribution of \theta and the conditional distributions of X given \theta are restated as follows:

\pi_{\theta}(1)=0.8
\pi_{\theta}(2)=0.2

\displaystyle f_{X \lvert \Theta}(0 \lvert \theta=1)=0.60
\displaystyle f_{X \lvert \Theta}(1 \lvert \theta=1)=0.30
\displaystyle f_{X \lvert \Theta}(2 \lvert \theta=1)=0.10

\displaystyle f_{X \lvert \Theta}(0 \lvert \theta=2)=0.15
\displaystyle f_{X \lvert \Theta}(1 \lvert \theta=2)=0.35
\displaystyle f_{X \lvert \Theta}(2 \lvert \theta=2)=0.50

The following shows the conditional means E[X \lvert \theta] and the unconditional mean E[X].

\displaystyle E[X \lvert \theta=1]=0.6(0)+0.3(1)+0.1(2)=0.50
\displaystyle E[X \lvert \theta=2]=0.15(0)+0.35(1)+0.5(2)=1.35
\displaystyle E[X]=0.8(0.50)+0.2(1.35)=0.67

If you know which particular box your friend is using (\theta=1 or \theta=2), then the estimate of the next ball should be E[X \lvert \theta]. But the value of \theta is unknown to you. Another alternative for a predicted value is the unconditional mean E[X]=0.67. While the estimate E[X]=0.67 is easy to calculate, this estimate does not take the observed data (X_1=1 and X_2=2) into account and it certainly does not take the parameter \theta into account. A third alternative is to incorporate the observed data into the estimate of the next ball. We now continue with the calculation of the Bayesian estimate.

Unconditional Distribution
\displaystyle f_X(0)=0.6(0.8)+0.15(0.2)=0.51
\displaystyle f_X(1)=0.3(0.8)+0.35(0.2)=0.31
\displaystyle f_X(2)=0.1(0.8)+0.50(0.2)=0.18

Marginal Probability
\displaystyle f_{X_1,X_2}(1,2)=0.1(0.3)(0.8)+0.5(0.35)(0.2)=0.059

Posterior Distribution of \theta
\displaystyle \pi_{\Theta \lvert X_1,X_2}(1 \lvert 1,2)=\frac{0.1(0.3)(0.8)}{0.059}=\frac{24}{59}

\displaystyle \pi_{\Theta \lvert X_1,X_2}(2 \lvert 1,2)=\frac{0.5(0.35)(0.2)}{0.059}=\frac{35}{59}
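The marginal probability and the posterior distribution can be reproduced with exact fractions. This is a small illustrative sketch; the variable and function names are our own, and the numbers are those of Example 1.

```python
from fractions import Fraction as F

# Prior from bowl B and conditional distributions from Box 1 and Box 2.
prior = {1: F(8, 10), 2: F(2, 10)}
box = {
    1: {0: F(60, 100), 1: F(30, 100), 2: F(10, 100)},
    2: {0: F(15, 100), 1: F(35, 100), 2: F(50, 100)},
}
observed = [1, 2]  # X_1 = 1, X_2 = 2


def likelihood(theta, data):
    """Probability of the observed draws given the box indexed by theta."""
    result = F(1)
    for x in data:
        result *= box[theta][x]
    return result


joint = {t: likelihood(t, observed) * prior[t] for t in prior}
marginal = sum(joint.values())
posterior = {t: joint[t] / marginal for t in joint}

print(marginal)   # 59/1000
print(posterior)  # {1: Fraction(24, 59), 2: Fraction(35, 59)}
```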

Predictive Distribution of X
\displaystyle f_{X_3 \lvert X_1,X_2}(0 \lvert 1,2)=0.6 \frac{24}{59} + 0.15 \frac{35}{59}=\frac{19.65}{59}

\displaystyle f_{X_3 \lvert X_1,X_2}(1 \lvert 1,2)=0.3 \frac{24}{59} + 0.35 \frac{35}{59}=\frac{19.45}{59}

\displaystyle f_{X_3 \lvert X_1,X_2}(2 \lvert 1,2)=0.1 \frac{24}{59} + 0.50 \frac{35}{59}=\frac{19.90}{59}

Here is another formulation of the predictive distribution of X_3. See the general methodology section below.
\displaystyle f_{X_3 \lvert X_1,X_2}(0 \lvert 1,2)=\frac{0.6(0.1)(0.3)(0.8)+0.15(0.5)(0.35)(0.2)}{0.059}=\frac{19.65}{59}

\displaystyle f_{X_3 \lvert X_1,X_2}(1 \lvert 1,2)=\frac{0.3(0.1)(0.3)(0.8)+0.35(0.5)(0.35)(0.2)}{0.059}=\frac{19.45}{59}

\displaystyle f_{X_3 \lvert X_1,X_2}(2 \lvert 1,2)=\frac{0.1(0.1)(0.3)(0.8)+0.5(0.5)(0.35)(0.2)}{0.059}=\frac{19.90}{59}

The posterior distribution \pi_{\theta}(\cdot \lvert 1,2) is the conditional probability distribution of the parameter \theta given the observed data X_1=1 and X_2=2. This is a result of applying Bayes’ theorem. The predictive distribution f_{X_3 \lvert X_1,X_2}(\cdot \lvert 1,2) is the conditional probability distribution of a new observation given the past observed data of X_1=1 and X_2=2. Since both of these distributions incorporate the past observations, the Bayesian estimate of the next observation is the mean of the predictive distribution.

\displaystyle E[X_3 \lvert X_1=1,X_2=2]

\displaystyle =0 \thinspace f_{X_3 \lvert X_1,X_2}(0 \lvert 1,2)+1 \thinspace f_{X_3 \lvert X_1,X_2}(1 \lvert 1,2)+2 \thinspace f_{X_3 \lvert X_1,X_2}(2 \lvert 1,2)

\displaystyle =0 \frac{19.65}{59}+1 \frac{19.45}{59}+ 2 \frac{19.90}{59}

\displaystyle =\frac{59.25}{59}=1.0042372

\displaystyle E[X_3 \lvert X_1=1,X_2=2]

\displaystyle =E[X \lvert \theta=1] \medspace \pi_{\Theta \lvert X_1,X_2}(1 \lvert 1,2)+E[X \lvert \theta=2] \medspace \pi_{\Theta \lvert X_1,X_2}(2 \lvert 1,2)

\displaystyle =0.5 \frac{24}{59}+1.35 \frac{35}{59}=\frac{59.25}{59}

Note that we compute the Bayesian estimate E[X_3 \vert X_1,X_2] in two ways, one using the predictive distribution and the other using the posterior distribution of the parameter \theta. The Bayesian estimate is the mean of the hypothetical means E[X \lvert \theta] with expectation taken over the entire posterior distribution \pi_{\theta}(\cdot \lvert 1,2).
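Both routes to the Bayesian estimate can be verified with exact arithmetic. The sketch below starts from the posterior probabilities \frac{24}{59} and \frac{35}{59} obtained above; the variable names are illustrative.

```python
from fractions import Fraction as F

box = {
    1: {0: F(60, 100), 1: F(30, 100), 2: F(10, 100)},
    2: {0: F(15, 100), 1: F(35, 100), 2: F(50, 100)},
}
# Posterior of theta given X_1 = 1 and X_2 = 2, as computed in the text.
posterior = {1: F(24, 59), 2: F(35, 59)}

# Route 1: build the predictive distribution, then take its mean.
predictive = {x: sum(box[t][x] * posterior[t] for t in posterior)
              for x in (0, 1, 2)}
mean_from_predictive = sum(x * p for x, p in predictive.items())

# Route 2: average the hypothetical means E[X | theta] over the posterior.
hypothetical_means = {t: sum(x * p for x, p in box[t].items()) for t in box}
mean_from_posterior = sum(hypothetical_means[t] * posterior[t] for t in posterior)

print(mean_from_predictive)  # 237/236, i.e. 59.25/59
print(mean_from_predictive == mean_from_posterior)  # True
```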

Discussion of General Methodology
We now use Example 1 to draw out general methodology. We first describe the discrete case and have the continuous case as a generalization.

Suppose we have a family of conditional density functions f_{X \lvert \Theta}(x \lvert \theta). In Example 1, the bowl B is the distribution of the parameter \theta. Box 1 and Box 2 are the conditional distributions with density f_{X \lvert \Theta}(x \lvert \theta). In an insurance application, the \theta is a risk parameter and the conditional distribution f_{X \lvert \Theta}(x \lvert \theta) is the claim experience in a given fixed period (conditional on \Theta=\theta).

Suppose that X_1,X_2, \cdots, X_n,X_{n+1} (conditional on \Theta=\theta) are independent and identically distributed where the common density function is f_{X \lvert \Theta}(x \lvert \theta). In our Example 1, once a box is selected (e.g. Box 1), then the repeated draws of the balls are independent and identically distributed. In an insurance application, the X_k are the claim experience from an insured (or a group of insureds) where the insured belongs to the risk class with parameter \theta.

We are interested in the conditional distribution of X_{n+1} given \Theta=\theta to predict X_{n+1}. In our example, X_{n+1} is the value of the ball in the (n+1)^{st} draw. In an insurance application, X_{n+1} may be the claim experience of an insured (or a group of insureds) in the next policy period. We can use the unconditional mean E[X]=E[E(X \lvert \Theta)] (the mean of the hypothetical means). This approach does not take the risk parameter of the insured into the equation. On the other hand, if we know the value of \theta, then we can use f_{X \lvert \Theta}(x \lvert \theta). But the risk parameter is usually unknown. The natural alternative is to condition on the observed experience in the n prior periods X_1, \cdots, X_n rather than conditioning on the risk parameter \theta. Thus we derive the predictive distribution of X_{n+1} given the observation X_1, \cdots, X_n. Given the observed experience data X_1=x_1,X_2=x_2, \cdots, X_n=x_n, the following is the derivation of the Bayesian predictive distribution. Note that the prior distribution of the parameter \theta is \pi_{\Theta}(\theta).

The Unconditional Distribution
\displaystyle f_X(x)=\sum \limits_{\theta} f_{X \lvert \Theta}(x \lvert \theta) \ \pi_{\Theta}(\theta)

The Marginal Distribution
\displaystyle f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)=\sum \limits_{\theta} \biggl[\prod \limits_{i=1}^{n} f_{X_i \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)

The Posterior Distribution
\displaystyle \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)

\displaystyle = \ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \biggl[\prod \limits_{i=1}^{n} f_{X_i \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)

The Predictive Distribution
\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)

\displaystyle =\ \ \ \ \ \ \ \ \ \ \sum \limits_{\theta} f_{X \lvert \Theta}(x \lvert \theta) \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)

Another formulation is:
\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)

\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \sum \limits_{\theta} f_{X_{n+1} \lvert \Theta}(x \lvert \theta) \biggl[ \prod \limits_{j=1}^{n}f_{X_j \lvert \Theta}(x_j \lvert \theta)\biggr] \thinspace \pi_{\Theta}(\theta)

The Bayesian Predictive Mean of the Next Period
\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]

\displaystyle =\ \ \ \ \ \ \ \ \ \ \sum \limits_{x} x \thinspace f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)

\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]

\displaystyle =\ \ \ \ \ \ \ \ \ \ \sum \limits_{\theta} E[X \lvert \theta] \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)
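The discrete-case steps above can be collected into one small routine. The following is a sketch under our own naming (the function `bayes_predict` and its dictionary-based inputs are assumptions, not part of the original post). It is applied to the Example 1 setup with the observed data (1, 2) and, as a further illustration, with a hypothetical alternative observation (0, 0).

```python
from fractions import Fraction as F


def bayes_predict(prior, conditional, data):
    """Return (posterior, predictive, predictive mean) for a discrete parameter.

    prior:       {theta: probability}
    conditional: {theta: {x: probability of x given theta}}
    data:        list of observed values x_1, ..., x_n
    """
    # Joint weight of each theta: likelihood of the data times the prior.
    joint = {}
    for theta, weight in prior.items():
        w = weight
        for x in data:
            w *= conditional[theta].get(x, 0)
        joint[theta] = w
    marginal = sum(joint.values())
    posterior = {theta: w / marginal for theta, w in joint.items()}

    # Predictive distribution: mix the conditionals with posterior weights.
    support = sorted({x for pmf in conditional.values() for x in pmf})
    predictive = {x: sum(conditional[t].get(x, 0) * posterior[t] for t in posterior)
                  for x in support}
    mean = sum(x * p for x, p in predictive.items())
    return posterior, predictive, mean


prior = {1: F(8, 10), 2: F(2, 10)}
box = {
    1: {0: F(60, 100), 1: F(30, 100), 2: F(10, 100)},
    2: {0: F(15, 100), 1: F(35, 100), 2: F(50, 100)},
}

posterior_12, _, mean_12 = bayes_predict(prior, box, [1, 2])
posterior_00, _, mean_00 = bayes_predict(prior, box, [0, 0])  # hypothetical data
print(mean_12, float(mean_12))
print(posterior_00, float(mean_00))
```

Observing two 0's shifts almost all posterior weight to Box 1, so the estimate falls close to E[X \lvert \theta=1]=0.5, whereas observing a 1 and a 2 pulls it toward Box 2.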

We state the same results for the case that the claim experience X is continuous.

The Unconditional Distribution
\displaystyle f_{X}(x) = \int_{\theta} f_{X \lvert \Theta} (x \lvert \theta) \ \pi_{\Theta}(\theta) \ d \theta

The Marginal Distribution
\displaystyle f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)=\int \limits_{\theta} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta) d \theta

The Posterior Distribution
\displaystyle \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n)

\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \biggl[\prod \limits_{i=1}^{n} f_{X \lvert \Theta}(x_i \lvert \theta)\biggr] \pi_{\Theta}(\theta)

The Predictive Distribution
\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)

\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} f_{X \lvert \Theta}(x \lvert \theta) \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) \ d \theta

Another formulation is:
\displaystyle f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n)

\displaystyle =\ \ \ \ \ \ \ \ \ \ \frac{1}{f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)} \int \limits_{\theta} f_{X_{n+1} \lvert \Theta}(x \lvert \theta) \biggl[ \prod \limits_{j=1}^{n}f_{X_j \lvert \Theta}(x_j \lvert \theta)\biggr] \thinspace \pi_{\Theta}(\theta) \ d \theta

The Bayesian Predictive Mean of the Next Period
\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]

\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{x} x \thinspace f_{X_{n+1} \lvert X_1, \cdots, X_n}(x \vert x_1, \cdots, x_n) dx

\displaystyle E[X_{n+1} \lvert X_1=x_1, \cdots, X_n=x_n]

\displaystyle =\ \ \ \ \ \ \ \ \ \ \int \limits_{\theta} E[X \lvert \theta] \thinspace \pi_{\Theta \lvert X_1, \cdots, X_n}(\theta \lvert x_1, \cdots, x_n) d \theta

See the next post (Examples of Bayesian prediction in insurance-continued) for Example 2.

Compound mixed Poisson distribution

Let the random sum Y=X_1+X_2+ \cdots +X_N be the aggregate claims generated in a fixed period by an independent group of insureds. When the number of claims N follows a Poisson distribution, the sum Y is said to have a compound Poisson distribution. When the number of claims N has a mixed Poisson distribution, the sum Y is said to have a compound mixed Poisson distribution. A mixed Poisson distribution is the distribution of a Poisson random variable N whose Poisson parameter \Lambda is uncertain. In other words, N is a mixture of a family of Poisson distributions N(\Lambda) and the random variable \Lambda specifies the mixing weights. In this post, we present several basic properties of compound mixed Poisson distributions. In a previous post (Compound negative binomial distribution), we showed that the compound negative binomial distribution is an example of a compound mixed Poisson distribution (with gamma mixing weights).

In terms of notation, we have:

  • Y=X_1+X_2+ \cdots +X_N,
  • N \sim Poisson(\Lambda),
  • \Lambda \sim some unspecified distribution.

The following presents basic properties of the compound mixed Poisson Y in terms of the mixing weights \Lambda and the claim amount random variable X.

Mean and Variance

\displaystyle E[Y]=E[\Lambda] E[X]

\displaystyle Var[Y]=E[\Lambda] E[X^2]+Var[\Lambda] E[X]^2
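As an illustration of these two formulas, the sketch below evaluates them for a hypothetical case: \Lambda \sim Gamma(2,1) (so that E[\Lambda]=\frac{\alpha}{\beta} and Var[\Lambda]=\frac{\alpha}{\beta^2}) and claim amounts of 1 or 2 with probabilities 0.6 and 0.4. It then cross-checks the variance against the compound negative binomial form \frac{\alpha}{\beta} Var[X]+\frac{\alpha (\beta+1)}{\beta^2} E[X]^2. All names and parameter values are assumptions for illustration.

```python
from fractions import Fraction as F


def compound_mixed_poisson_mean_var(mean_lam, var_lam, ex, ex2):
    """E[Y] and Var[Y] from the moments of Lambda and of the claim amount X."""
    mean_y = mean_lam * ex
    var_y = mean_lam * ex2 + var_lam * ex ** 2
    return mean_y, var_y


# Hypothetical gamma mixing distribution with shape alpha and rate beta.
alpha, beta = F(2), F(1)
mean_lam, var_lam = alpha / beta, alpha / beta ** 2

# Hypothetical claim amount: 1 with probability 0.6, 2 with probability 0.4.
claim = {1: F(6, 10), 2: F(4, 10)}
ex = sum(x * p for x, p in claim.items())
ex2 = sum(x ** 2 * p for x, p in claim.items())

mean_y, var_y = compound_mixed_poisson_mean_var(mean_lam, var_lam, ex, ex2)

# Cross-check against the compound negative binomial variance formula.
var_x = ex2 - ex ** 2
var_y_negbin = alpha / beta * var_x + alpha * (beta + 1) / beta ** 2 * ex ** 2

print(mean_y, var_y, var_y_negbin)  # 14/5 208/25 208/25
```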

Moment Generating Function

\displaystyle M_Y(t)=M_{\Lambda}[M_X(t)-1]

Cumulant Generating Function

\displaystyle \Psi_Y(t)=ln M_{\Lambda}[M_X(t)-1]=\Psi_{\Lambda}[M_X(t)-1]

Measure of Skewness
\displaystyle E[(Y-\mu_Y)^3]=\Psi_Y^{(3)}(0)

\displaystyle =\Psi_{\Lambda}^{(3)}(0) E[X]^3 + 3 \Psi_{\Lambda}^{(2)}(0) E[X] E[X^2]+\Psi_{\Lambda}^{(1)}(0) E[X^3]

\displaystyle =\gamma_{\Lambda} Var[\Lambda]^{\frac{3}{2}} E[X]^3 + 3 Var[\Lambda] E[X] E[X^2]+E[\Lambda] E[X^3]

Measure of skewness: \displaystyle \gamma_Y=\frac{E[(Y-\mu_Y)^3]}{(Var[Y])^{\frac{3}{2}}}
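The skewness formula can be evaluated directly once the moments are known. The sketch below is illustrative only: it assumes \Lambda \sim Gamma(2,1), for which E[\Lambda]=\frac{\alpha}{\beta}, Var[\Lambda]=\frac{\alpha}{\beta^2} and \gamma_{\Lambda}=\frac{2}{\sqrt{\alpha}}, and a hypothetical claim amount of 1 or 2 with probabilities 0.6 and 0.4.

```python
import math


def compound_mixed_poisson_skewness(mean_lam, var_lam, skew_lam, ex, ex2, ex3):
    """Return (third central moment of Y, skewness of Y)."""
    third = (skew_lam * var_lam ** 1.5 * ex ** 3
             + 3 * var_lam * ex * ex2
             + mean_lam * ex3)
    var_y = mean_lam * ex2 + var_lam * ex ** 2
    return third, third / var_y ** 1.5


# Hypothetical gamma mixing distribution with shape alpha and rate beta.
alpha, beta = 2.0, 1.0
mean_lam = alpha / beta
var_lam = alpha / beta ** 2
skew_lam = 2 / math.sqrt(alpha)

# Hypothetical claim amount: 1 with probability 0.6, 2 with probability 0.4.
ex = 0.6 * 1 + 0.4 * 2
ex2 = 0.6 * 1 + 0.4 * 4
ex3 = 0.6 * 1 + 0.4 * 8

third_moment, skewness = compound_mixed_poisson_skewness(
    mean_lam, var_lam, skew_lam, ex, ex2, ex3)
print(third_moment, skewness)
```

Setting Var[\Lambda]=0 in the same function reduces the third central moment to E[\Lambda] E[X^3], which is the compound Poisson case with a fixed parameter.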

Previous Posts on Compound Distributions

An introduction to compound distributions
Some examples of compound distributions
Compound Poisson distribution
Compound Poisson distribution-discrete example
Compound negative binomial distribution

Compound negative binomial distribution

In this post, we discuss the compound negative binomial distribution and its relationship with the compound Poisson distribution.

A compound distribution is a model for a random sum Y=X_1+X_2+ \cdots +X_N where the number of terms N is uncertain. To make the compound distribution more tractable, we assume that the variables X_i are independent and identically distributed and that each X_i is independent of N. The random sum Y can be interpreted as the sum of all the measurements that are associated with certain events that occur during a fixed period of time. For example, we may be interested in the total amount of rainfall in a 24-hour period, during which the occurrences of a number of events are observed and each of the events provides a measurement of an amount of rainfall. Another interpretation of the compound distribution is the random variable of the aggregate claims generated by an insurance policy or a group of insurance policies during a fixed policy period. In this setting, N is the number of claims generated by the portfolio of insurance policies and X_1 is the amount of the first claim and X_2 is the amount of the second claim and so on. When N follows the Poisson distribution, the random sum Y is said to have a compound Poisson distribution. Even though the compound Poisson distribution has many attractive properties, it is not a good model when the variance of the number of claims is greater than the mean of the number of claims. In such situations, the compound negative binomial distribution may be a better fit. See this post (Compound Poisson distribution) for a basic discussion. See the links at the end of this post for more articles on compound distributions that I posted on this blog.

Compound Negative Binomial Distribution
The random variable N is said to have a negative binomial distribution if its probability function is given by the following:

\displaystyle P[N=n]=\binom{\alpha + n-1}{\alpha-1} \thinspace \biggl(\frac{\beta}{\beta+1}\biggr)^{\alpha}\biggl(\frac{1}{\beta+1}\biggr)^{n} \ \ \ \ \ \ \ \ \ \ \ \ (1)

where n=0,1,2,3, \cdots, \beta >0 and \alpha is a positive integer.

Our formulation of negative binomial distribution is the number of failures that occur before the \alpha^{th} success in a sequence of independent Bernoulli trials. But this interpretation is not important to our task at hand. Let Y=X_1+X_2+ \cdots +X_N be the random sum as described in the above introductory paragraph such that N follows a negative binomial distribution. We present the basic properties discussed in the post An introduction to compound distributions by plugging the negative binomial distribution into N.

Distribution Function
\displaystyle F_Y(y)=\sum \limits_{n=0}^{\infty} F^{*n}(y) \thinspace P[N=n]

where F is the common distribution function for X_i and F^{*n} is the n^{th} convolution of F. Of course, P[N=n] is the negative binomial probability function indicated above.

Mean and Variance
\displaystyle E[Y]=E[N] \thinspace E[X]=\frac{\alpha}{\beta} E[X]

\displaystyle Var[Y]=E[N] \thinspace Var[X]+Var[N] \thinspace E[X]^2

\displaystyle =\frac{\alpha}{\beta} Var[X]+\frac{\alpha (\beta+1)}{\beta^2} E[X]^2

Moment Generating Function
\displaystyle M_Y(t)=M_N[ln M_X(t)]=\biggl(\frac{p}{1-(1-p) M_X(t)}\biggr)^{\alpha}

\displaystyle M_Y(t)=\biggl(\frac{\beta}{\beta+1- M_X(t)}\biggr)^{\alpha}

where \displaystyle p=\frac{\beta}{\beta+1}, \displaystyle M_N(t)=\biggl(\frac{p}{1-(1-p) e^{t}}\biggr)^{\alpha}

Cumulant Generating Function
\displaystyle \Psi_Y(t)=\alpha \thinspace ln \biggl(\frac{\beta}{\beta+1- M_X(t)}\biggr)

Skewness
\displaystyle E[(Y-\mu_Y)^3]=\Psi_Y^{(3)}(0)

\displaystyle =\frac{2}{\alpha^2} E[N]^3 E[X]^3 +\frac{3}{\alpha} E[N]^2 E[X] E[X^2]+E[N] E[X^3]

Measure of skewness: \displaystyle \gamma_Y=\frac{E[(Y-\mu_Y)^3]}{(Var[Y])^{\frac{3}{2}}}

Compound Mixed Poisson Distribution
In a previous post (Basic properties of mixtures), we showed that the negative binomial distribution is a mixture of a family of Poisson distributions with gamma mixing weights. Specifically, if N \sim \text{Poisson}(\Lambda) and \Lambda \sim \text{Gamma}(\alpha,\beta), then the unconditional distribution of N is a negative binomial distribution and the probability function is of the form (1) given above.
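This mixture result can be checked numerically. The sketch below compares the probability function (1) with a direct numerical integration of the Poisson probability against the gamma density. The parameter values \alpha=3 and \beta=2, the function names, and the midpoint-rule settings are assumptions chosen for illustration.

```python
import math


def neg_binomial_pmf(n, alpha, beta):
    """Probability function (1); alpha is a positive integer and beta > 0."""
    return (math.comb(alpha + n - 1, alpha - 1)
            * (beta / (beta + 1)) ** alpha * (1 / (beta + 1)) ** n)


def poisson_gamma_mixture_pmf(n, alpha, beta, upper=60.0, steps=60000):
    """Integrate Poisson(n | lam) against the Gamma(alpha, beta) density (midpoint rule)."""
    step = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * step
        log_poisson = n * math.log(lam) - lam - math.lgamma(n + 1)
        log_gamma = (alpha * math.log(beta) - math.lgamma(alpha)
                     + (alpha - 1) * math.log(lam) - beta * lam)
        total += math.exp(log_poisson + log_gamma)
    return total * step


alpha, beta = 3, 2.0  # hypothetical values for illustration
for n in range(6):
    print(n, neg_binomial_pmf(n, alpha, beta),
          poisson_gamma_mixture_pmf(n, alpha, beta))
```

The two columns should agree to several decimal places, up to the error of the numerical integration.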

Thus the compound negative binomial distribution is a special example of a compound mixed Poisson distribution. When an aggregate claims variable Y=X_1+X_2+ \cdots +X_N has a compound mixed Poisson distribution, the number of claims N follows a Poisson distribution, but the Poisson parameter \Lambda is uncertain. The uncertainty could be due to heterogeneity of risks across the insureds in the insurance portfolio (or across various rating classes). If the information of the risk parameter \Lambda can be captured in a gamma distribution, then the unconditional number of claims in a given fixed period has a negative binomial distribution.

Previous Posts on Compound Distributions
An introduction to compound distributions
Some examples of compound distributions
Compound Poisson distribution
Compound Poisson distribution-discrete example

Compound Poisson distribution-discrete example

We present a discrete example of a compound Poisson distribution. A random variable Y has a compound distribution if Y=X_1+ \cdots +X_N where the number of terms N is a discrete random variable whose support is the set of all nonnegative integers (or some appropriate subset) and the random variables X_i are identically distributed (let X be the common distribution). We further assume that the random variables X_i are independent and each X_i is independent of N. When N follows the Poisson distribution, Y is said to have a compound Poisson distribution. When the common distribution for the X_i is continuous, Y is a mixed distribution if P[N=0] is nonzero. When the common distribution for the X_i is discrete, Y is a discrete distribution. In this post we present an example of a compound Poisson distribution where the common distribution X is discrete. The compound distribution has a natural insurance interpretation (see the following links).

Compound Poisson distribution
Some examples of compound distributions
An introduction to compound distributions

General Discussion
In general, the distribution function of a compound Poisson random variable Y is the weighted average of all the n^{th} convolutions of the common distribution function of the individual claim amount X. The following shows the form of such a distribution function:

\displaystyle F_Y(y)=\sum \limits_{n=0}^{\infty} F^{*n}(y) P[N=n]

where \displaystyle F is the common distribution function of the X_i and F^{*n} is the n^{th} convolution of F.

If the distribution of the individual claim X is discrete, we can obtain the probability mass function of Y by convolutions as follows:

\displaystyle f_Y(y)=P[Y=y]=\sum \limits_{n=0}^{\infty} p^{*n}(y) P[N=n]

where \displaystyle p^{*1}(y)=P[X=y]
and \displaystyle p^{*n}(y)=(p * \cdots * p)(y)=P[X_1+X_2+ \cdots +X_n=y]
and \displaystyle p^{*0}(y)=\left\{\begin{matrix}0&\thinspace y \ne 0\\{1}&\thinspace y=0\end{matrix}\right.
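The n-fold convolution of a discrete claim distribution is straightforward to compute. The following sketch represents a probability function as a dictionary and uses exact fractions; the function names are our own, and the claim distribution (amounts 1 and 2 with probabilities 0.6 and 0.4) is the one used in the example of this post.

```python
from fractions import Fraction as F


def convolve(p, q):
    """Probability function of the sum of two independent discrete variables."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out


def n_fold(p, n):
    """p^{*n}: start from the point mass at 0 (p^{*0}) and convolve n times."""
    result = {0: F(1)}
    for _ in range(n):
        result = convolve(result, p)
    return result


claim = {1: F(3, 5), 2: F(2, 5)}  # claim amounts 1 and 2 with probabilities 0.6 and 0.4

for n in range(4):
    print(n, {y: float(prob) for y, prob in sorted(n_fold(claim, n).items())})
```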

Example
Suppose the number of claims generated by a portfolio of insurance policies over a fixed time period has a Poisson distribution with parameter \lambda. Individual claim amounts will be 1 or 2 with probabilities 0.6 and 0.4, respectively. For the compound Poisson aggregate claims Y=X_1+ \cdots +X_N, find P[Y=k] for k=0,1,2,3,4.

The probability mass function of N is: \displaystyle f_N(n)=\frac{\lambda^n e^{-\lambda}}{n!} where n=0,1,2, \cdots. The individual claim amount X is a two-valued discrete random variable, so X-1 has a Bernoulli distribution. For convenience, we let p=0.4 (i.e. we consider X=2 a success). Then the number of claims of amount 2 among X_1, \cdots, X_n has a Binomial(n,p) distribution, and the sum X_1+ \cdots + X_n equals n plus this count. Consequently, the n^{th} convolution p^{*n} is simply the probability function of Binomial(n,p) shifted by n, i.e. p^{*n}(n+k)=\binom{n}{k} (0.4)^k (0.6)^{n-k} for k=0,1, \cdots, n. The following shows p^{*n} for n=1,2,3,4.

\displaystyle p^{*1}(1)=0.6, \thinspace p^{*1}(2)=0.4

\displaystyle p^{*2}(2)=\binom{2}{0} (0.4)^0 (0.6)^2=0.36
\displaystyle p^{*2}(3)=\binom{2}{1} (0.4)^1 (0.6)^1=0.48
\displaystyle p^{*2}(4)=\binom{2}{2} (0.4)^2 (0.6)^0=0.16

\displaystyle p^{*3}(3)=\binom{3}{0} (0.4)^0 (0.6)^3=0.216
\displaystyle p^{*3}(4)=\binom{3}{1} (0.4)^1 (0.6)^2=0.432
\displaystyle p^{*3}(5)=\binom{3}{2} (0.4)^2 (0.6)^1=0.288
\displaystyle p^{*3}(6)=\binom{3}{3} (0.4)^3 (0.6)^0=0.064

\displaystyle p^{*4}(4)=\binom{4}{0} (0.4)^0 (0.6)^4=0.1296
\displaystyle p^{*4}(5)=\binom{4}{1} (0.4)^1 (0.6)^3=0.3456
\displaystyle p^{*4}(6)=\binom{4}{2} (0.4)^2 (0.6)^2=0.3456
\displaystyle p^{*4}(7)=\binom{4}{3} (0.4)^3 (0.6)^1=0.1536
\displaystyle p^{*4}(8)=\binom{4}{4} (0.4)^4 (0.6)^0=0.0256

Since we are interested in finding P[Y=y] for y=0,1,2,3,4, we only need to consider N=0,1,2,3,4. The following matrix shows the relevant values of p^{*n}. The rows are for y=0,1,2,3,4. The columns are p^{*0}, p^{*1}, p^{*2}, p^{*3}, p^{*4}.

\displaystyle \begin{pmatrix} 1&0&0&0&0 \\{0}&0.6&0&0&0 \\{0}&0.4&0.36&0&0 \\{0}&0&0.48&0.216&0 \\{0}&0&0.16&0.432&0.1296\end{pmatrix}

To obtain the probability mass function of Y, we multiply each entry in the row for y by the corresponding P[N=n], where n=0,1,2,3,4, and add the products across the row.

\displaystyle P[Y=0]=e^{-\lambda}
\displaystyle P[Y=1]=0.6 \lambda e^{-\lambda}
\displaystyle P[Y=2]=0.4 \lambda e^{-\lambda}+0.36 \frac{\lambda^2 e^{-\lambda}}{2}
\displaystyle P[Y=3]=0.48 \frac{\lambda^2 e^{-\lambda}}{2}+0.216 \frac{\lambda^3 e^{-\lambda}}{6}
\displaystyle P[Y=4]=0.16 \frac{\lambda^2 e^{-\lambda}}{2}+0.432 \frac{\lambda^3 e^{-\lambda}}{6}+0.1296 \frac{\lambda^4 e^{-\lambda}}{24}
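The calculation above can be checked numerically. The following is a minimal sketch in Python, assuming an illustrative value \lambda=1.5 (the example leaves \lambda unspecified); the function name compound_poisson_pmf is hypothetical. It builds the convolutions p^{*n} one at a time and weights them by the Poisson probabilities.

```python
from math import exp, factorial

def compound_poisson_pmf(lam, severity, y_max):
    """Return [P(Y=0), ..., P(Y=y_max)] for positive integer claim amounts.

    severity maps each claim amount (a positive integer) to its probability.
    """
    # conv[y] holds p^{*n}(y); n = 0 is a point mass at 0.
    conv = [1.0] + [0.0] * y_max
    pmf = [0.0] * (y_max + 1)
    # Claims are at least 1, so more than y_max claims cannot total y <= y_max.
    for n in range(y_max + 1):
        poisson_n = lam ** n * exp(-lam) / factorial(n)
        for y in range(y_max + 1):
            pmf[y] += conv[y] * poisson_n
        # Convolve once more with the claim distribution to get p^{*(n+1)}.
        nxt = [0.0] * (y_max + 1)
        for y, prob in enumerate(conv):
            for amount, q in severity.items():
                if y + amount <= y_max:
                    nxt[y + amount] += prob * q
        conv = nxt
    return pmf

lam = 1.5  # assumed value, for illustration only
pmf = compound_poisson_pmf(lam, {1: 0.6, 2: 0.4}, 4)
for k, prob in enumerate(pmf):
    print(k, round(prob, 6))
```

The same function works for any claim distribution on the positive integers, which is the only assumption used when truncating the sum over n.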

Compound Poisson distribution

The compound distribution is a model for describing the aggregate claims arising from a group of independent insureds. Let N be the number of claims generated by a portfolio of insurance policies in a fixed time period. Suppose X_1 is the amount of the first claim, X_2 is the amount of the second claim and so on. Then Y=X_1+X_2+ \cdots + X_N represents the total aggregate claims generated by this portfolio of policies in the given fixed time period. In order to make this model more tractable, we make the following assumptions:

  • X_1,X_2, \cdots are independent and identically distributed.
  • Each X_i is independent of the number of claims N.

The number of claims N is associated with the claim frequency in the given portfolio of policies. The common distribution of X_1,X_2, \cdots is denoted by X. Note that X models the amount of a random claim generated in this portfolio of insurance policies. See these two posts for an introduction to compound distributions (An introduction to compound distributions, Some examples of compound distributions).

When the claim frequency N follows a Poisson distribution with a constant parameter \lambda, the aggregate claims Y is said to have a compound Poisson distribution. After a general discussion of the compound Poisson distribution, we discuss the property that an independent sum of compound Poisson distributions is also a compound Poisson distribution. We also present an example to illustrate basic calculations.

Compound Poisson – General Properties

Distribution Function
\displaystyle F_Y(y)=\sum \limits_{n=0}^{\infty} F^{*n}(y) \frac{\lambda^n e^{-\lambda}}{n!}

where \lambda=E[N], F is the common distribution function of X_i and F^{*n} is the n-fold convolution of F.

Mean and Variance
\displaystyle E[Y]=E[N] E[X]= \lambda E[X]

\displaystyle Var[Y]=\lambda E[X^2]

Moment Generating Function and Cumulant Generating Function
\displaystyle M_Y(t)=e^{\lambda (M_X(t)-1)}

\displaystyle \Psi_Y(t)=ln M_Y(t)=\lambda (M_X(t)-1)

Note that the moment generating function of the Poisson N is M_N(t)=e^{\lambda (e^t - 1)}. For a compound distribution Y in general, M_Y(t)=M_N[ln M_X(t)].

Skewness
\displaystyle E[(Y-\mu_Y)^3]=\Psi_Y^{(3)}(0)=\lambda E[X^3]

\displaystyle \gamma_Y=\frac{E[(Y-\mu_Y)^3]}{Var[Y]^{\frac{3}{2}}}=\frac{1}{\sqrt{\lambda}} \frac{E[X^3]}{E[X^2]^{\frac{3}{2}}}
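As an illustration of the formulas above, here is a small Python sketch that evaluates the mean, variance and skewness of a compound Poisson sum from the first three raw moments of X. The function name compound_poisson_summary, the value \lambda=2 and the discrete claim distribution (1 or 2 with probabilities 0.6 and 0.4) are assumptions made only for this illustration.

```python
from math import sqrt

def compound_poisson_summary(lam, severity):
    """Mean, variance and skewness of a compound Poisson sum.

    severity maps each claim amount to its probability (discrete claims).
    """
    # Raw moments E[X], E[X^2], E[X^3] of the individual claim amount.
    m1, m2, m3 = (sum(x ** k * p for x, p in severity.items()) for k in (1, 2, 3))
    mean = lam * m1
    variance = lam * m2
    skewness = m3 / (sqrt(lam) * m2 ** 1.5)
    return mean, variance, skewness

# Assumed inputs, for illustration only.
mean, variance, skewness = compound_poisson_summary(2.0, {1: 0.6, 2: 0.4})
print(mean, variance, skewness)
```

Because the skewness is proportional to \frac{1}{\sqrt{\lambda}}, rerunning the sketch with a larger \lambda gives a smaller skewness for the same claim distribution.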

Independent Sum of Compound Poisson Distributions
First, we state the result. Suppose that Y_1,Y_2, \cdots, Y_k are independent random variables such that each Y_i has a compound Poisson distribution with \lambda_i being the Poisson parameter for the number of claims and F_i being the distribution function for the individual claim amount. Then Y=Y_1+Y_2+ \cdots +Y_k has a compound Poisson distribution with:

  • the Poisson parameter: \displaystyle \lambda=\sum \limits_{i=1}^{k} \lambda_i
  • the distribution function of the individual claim amount: \displaystyle F(x)=\sum \limits_{i=1}^{k} \frac{\lambda_i}{\lambda} \thinspace F_i(x)

The above result has an insurance interpretation. Suppose we have k independent blocks of insurance policies such that the aggregate claims Y_i for the i^{th} block has a compound Poisson distribution. Then Y=Y_1+Y_2+ \cdots +Y_k is the aggregate claims for the combined block during the fixed policy period and also has a compound Poisson distribution with the parameters stated in the above two bullet points.

To get a further intuitive understanding about the parameters of the combined block, consider N_i as the Poisson number of claims in the i^{th} block of insurance policies. It is a well known fact in probability theory (see [1]) that the independent sum of Poisson variables is also a Poisson random variable. Thus the total number of claims in the combined block is N=N_1+N_2+ \cdots +N_k and has a Poisson distribution with parameter \lambda=\lambda_1 + \cdots + \lambda_k.

How do we describe the distribution of an individual claim amount in the combined insurance block? Given a claim from the combined block, we do not know which of the constituent blocks it is from. This suggests that an individual claim amount is a mixture of the individual claim amount distributions from the k blocks with mixing weights \displaystyle \frac{\lambda_1}{\lambda},\frac{\lambda_2}{\lambda}, \cdots, \frac{\lambda_k}{\lambda}. These mixing weights make intuitive sense. If insurance block i has a higher claim frequency \lambda_i, then it is more likely that a randomly selected claim from the combined block comes from block i. Of course, this discussion is not a proof. But looking at the insurance model is a helpful way of understanding the independent sum of compound Poisson distributions.

To see why the stated result is true, let M_i(t) be the moment generating function of the individual claim amount in the i^{th} block of policies. Then the mgf of the aggregate claims Y_i is \displaystyle M_{Y_i}(t)=e^{\lambda_i (M_i(t)-1)}. Consequently, the mgf of the independent sum Y=Y_1+ \cdots + Y_k is:

\displaystyle M_Y(t)=\prod \limits_{i=1}^{k} e^{\lambda_i (M_i(t)-1)}= e^{\sum \limits_{i=1}^{k} \lambda_i(M_i(t)-1)} = e^{\lambda \biggl[\sum \limits_{i=1}^{k} \frac{\lambda_i}{\lambda} M_i(t) - 1 \biggr]}

The mgf of Y has the form of a compound Poisson distribution where the Poisson parameter is \lambda=\lambda_1 + \cdots + \lambda_k. Note that the component \displaystyle \sum \limits_{i=1}^{k} \frac{\lambda_i}{\lambda}M_i(t) in the exponent is the mgf of the claim amount distribution. Since it is the weighted average of the individual claim amount mgf’s, this indicates that the distribution function of the individual claim amount in the combined block is the mixture of the distribution functions F_i with weights \frac{\lambda_i}{\lambda}.
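The result can also be checked numerically when the claim amounts are positive integers. The sketch below is only an illustration: the two blocks (their Poisson parameters and claim distributions) are hypothetical, and so are the function names. It computes the distribution of Y_1+Y_2 directly by convolving the two compound Poisson distributions, and compares it with the compound Poisson distribution built from \lambda=\lambda_1+\lambda_2 and the mixed claim distribution.

```python
from math import exp, factorial

def compound_poisson_pmf(lam, severity, y_max):
    """P(Y=0..y_max) for claims taking positive integer values."""
    conv = [1.0] + [0.0] * y_max          # p^{*0}: point mass at 0
    pmf = [0.0] * (y_max + 1)
    for n in range(y_max + 1):            # n > y_max cannot give y <= y_max
        weight = lam ** n * exp(-lam) / factorial(n)
        pmf = [a + weight * c for a, c in zip(pmf, conv)]
        nxt = [0.0] * (y_max + 1)
        for y, prob in enumerate(conv):
            for amount, q in severity.items():
                if y + amount <= y_max:
                    nxt[y + amount] += prob * q
        conv = nxt
    return pmf

def convolve(a, b):
    """pmf of the sum of two independent variables, truncated to len(a)."""
    return [sum(a[i] * b[y - i] for i in range(y + 1)) for y in range(len(a))]

# Hypothetical blocks, chosen only for illustration.
lam1, sev1 = 1.0, {1: 0.6, 2: 0.4}
lam2, sev2 = 0.5, {1: 0.2, 3: 0.8}
y_max = 10

# Combined block: Poisson parameters add, claim distributions mix.
lam = lam1 + lam2
mixed = {}
for lam_i, sev in ((lam1, sev1), (lam2, sev2)):
    for amount, q in sev.items():
        mixed[amount] = mixed.get(amount, 0.0) + lam_i / lam * q

direct = convolve(compound_poisson_pmf(lam1, sev1, y_max),
                  compound_poisson_pmf(lam2, sev2, y_max))
combined = compound_poisson_pmf(lam, mixed, y_max)
max_gap = max(abs(d - c) for d, c in zip(direct, combined))
print(max_gap)
```

The two lists should agree up to floating point rounding, so max_gap is expected to be negligibly small.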

Example
Suppose that an insurance company acquired two portfolios of insurance policies and combined them into a single block. For each portfolio the aggregate claims variable has a compound Poisson distribution. For one of the portfolios, the Poisson parameter is \lambda_1 and the individual claim amount has an exponential distribution with parameter \delta_1. The corresponding Poisson and exponential parameters for the other portfolio are \lambda_2 and \delta_2, respectively. Discuss the distribution for the aggregate claims Y=Y_1+Y_2 of the combined portfolio.

The aggregate claims Y of the combined portfolio has a compound Poisson distribution with Poisson parameter \lambda=\lambda_1+\lambda_2. The amount of a random claim X in the combined portfolio has the following distribution function and density function:

\displaystyle F_X(x)=\frac{\lambda_1}{\lambda} (1-e^{-\delta_1 x})+\frac{\lambda_2}{\lambda} (1-e^{-\delta_2 x})

\displaystyle f_X(x)=\frac{\lambda_1}{\lambda} (\delta_1 \thinspace e^{-\delta_1 x})+\frac{\lambda_2}{\lambda} (\delta_2 \thinspace e^{-\delta_2 x})

The rest of the discussion mirrors the general discussion earlier in this post.

Distribution Function
As in the general case, \displaystyle F_Y(y)=\sum \limits_{n=0}^{\infty} F^{*n}(y) \frac{\lambda^n e^{-\lambda}}{n!}

where \lambda=\lambda_1 +\lambda_2, F=F_X and F^{*n} is the n-fold convolution of F_X.

Mean and Variance
\displaystyle E[Y]=\frac{\lambda_1}{\delta_1}+\frac{\lambda_2}{\delta_2}

\displaystyle Var[Y]=\frac{2 \lambda_1}{\delta_1^2}+\frac{2 \lambda_2}{\delta_2^2}
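A simulation gives a rough check of the mean and variance above. The following Python sketch draws the combined portfolio as a single compound Poisson sum whose claims come from the mixture of the two exponential distributions. The parameter values, the seed, the sample size and the helper names poisson_draw and simulate_combined are all assumptions made for illustration.

```python
import random

def poisson_draw(rng, lam):
    """Draw a Poisson(lam) count as the number of arrivals in one unit of time."""
    # Inter-arrival times of a Poisson process with rate lam are exponential.
    count, elapsed = 0, rng.expovariate(lam)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count

def simulate_combined(rng, lam1, delta1, lam2, delta2):
    """One draw of the aggregate claims Y of the combined portfolio."""
    lam = lam1 + lam2
    total = 0.0
    for _ in range(poisson_draw(rng, lam)):
        # Each claim comes from portfolio 1 with probability lam1 / lam.
        rate = delta1 if rng.random() < lam1 / lam else delta2
        total += rng.expovariate(rate)
    return total

rng = random.Random(2024)                          # fixed seed for repeatability
lam1, delta1, lam2, delta2 = 3.0, 0.5, 1.0, 0.1    # hypothetical parameters
draws = [simulate_combined(rng, lam1, delta1, lam2, delta2) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sample_mean, lam1 / delta1 + lam2 / delta2)
print(sample_var, 2 * lam1 / delta1 ** 2 + 2 * lam2 / delta2 ** 2)
```

Each printed line shows a simulated value next to the corresponding formula value; the two should be close but not identical because of sampling error.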

Moment Generating Function and Cumulant Generating Function
To obtain the mgf and cgf of the aggregate claims Y, consider \lambda [M_X(t)-1]. Note that M_X(t) is the weighted average of the two exponential mgfs of the two portfolios of insurance policies. Thus we have:

\displaystyle M_X(t)=\frac{\lambda_1}{\lambda} \frac{\delta_1}{\delta_1 - t}+\frac{\lambda_2}{\lambda} \frac{\delta_2}{\delta_2 - t}

\displaystyle \lambda [M_X(t)-1]=\frac{\lambda_1 t}{\delta_1 - t}+\frac{\lambda_2 t}{\delta_2 - t}

\displaystyle M_Y(t)=e^{\lambda (M_X(t)-1)}=e^{\frac{\lambda_1 t}{\delta_1 - t}+\frac{\lambda_2 t}{\delta_2 - t}}

\displaystyle \Psi_Y(t)=\frac{\lambda_1 t}{\delta_1 -t}+\frac{\lambda_2 t}{\delta_2 -t}

Skewness
Note that \displaystyle E[(Y-\mu_Y)^3]=\Psi_Y^{(3)}(0)=\frac{6 \lambda_1}{\delta_1^3}+\frac{6 \lambda_2}{\delta_2^3}

\displaystyle \gamma_Y=\displaystyle \frac{\frac{6 \lambda_1}{\delta_1^3}+\frac{6 \lambda_2}{\delta_2^3}}{(\frac{2 \lambda_1}{\delta_1^2}+\frac{2 \lambda_2}{\delta_2^2})^{\frac{3}{2}}}
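The closed forms in this example can be cross-checked against the general compound Poisson formulas, using the raw moments of the mixed claim distribution. The Python sketch below assumes hypothetical parameter values and a hypothetical helper mixture_moment.

```python
from math import factorial

# Hypothetical parameter values, chosen only for illustration.
lam1, delta1 = 3.0, 0.5
lam2, delta2 = 1.0, 0.1
lam = lam1 + lam2

def mixture_moment(k):
    """E[X^k] for the mixture of the two exponential claim distributions."""
    # An exponential distribution with rate delta has k-th raw moment k!/delta^k.
    return ((lam1 / lam) * factorial(k) / delta1 ** k
            + (lam2 / lam) * factorial(k) / delta2 ** k)

# General compound Poisson formulas: lam * E[X^k] for k = 1, 2, 3.
mean = lam * mixture_moment(1)
variance = lam * mixture_moment(2)
third_central = lam * mixture_moment(3)
skewness = third_central / variance ** 1.5
print(mean, variance, third_central, skewness)
```

The first three printed values should match \frac{\lambda_1}{\delta_1}+\frac{\lambda_2}{\delta_2}, \frac{2 \lambda_1}{\delta_1^2}+\frac{2 \lambda_2}{\delta_2^2} and \frac{6 \lambda_1}{\delta_1^3}+\frac{6 \lambda_2}{\delta_2^3} evaluated at the same parameters.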

Reference

  1. Hogg R. V. and Tanis E. A., Probability and Statistical Inference, Second Edition, Macmillan Publishing Co., New York, 1983.

Some examples of compound distributions

We present two examples of compound distributions to illustrate the general formulas presented in the previous post (An introduction to compound distributions).

For the examples below, let N be the number of claims generated by either an individual insured or a group of independent insureds. Let X be the individual claim amount. We consider the random sum Y=X_1+ \cdots + X_N. We discuss the following properties of the aggregate claims random variable Y:

  1. The distribution function F_Y
  2. The mean and higher moments: E[Y] and E[Y^n]
  3. The variance: Var[Y]
  4. The moment generating function and cumulant generating function: M_Y(t) and \Psi_Y(t).
  5. Skewness: \gamma_Y.

Example 1
The number of claims for an individual insurance policy in a policy period is modeled by the binomial distribution with parameter n=2 and p. The individual claim, when it occurs, is modeled by the exponential distribution with parameter \lambda (i.e. the mean individual claim amount is \frac{1}{\lambda}).

The distribution function F_Y is the weighted average of a point mass at y=0, the exponential distribution and the Erlang-2 distribution function. For x \ge 0, we have:

\displaystyle F_Y(x)=(1-p)^2+2p(1-p)(1-e^{-\lambda x})+p^2(1-\lambda x e^{-\lambda x}-e^{-\lambda x})
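The distribution function above can be compared with a simulation. This is only a sketch: the values of p, \lambda and x, the seed, the sample size and the function names are assumptions for illustration.

```python
import random
from math import exp

def cdf_example1(x, p, lam):
    """F_Y(x) for x >= 0: mixture of a point mass at 0, exponential and Erlang-2."""
    e = exp(-lam * x)
    return ((1 - p) ** 2
            + 2 * p * (1 - p) * (1 - e)
            + p ** 2 * (1 - lam * x * e - e))

def simulate_example1(rng, p, lam):
    """One draw of Y: Binomial(2, p) claims, each exponential with rate lam."""
    n_claims = sum(rng.random() < p for _ in range(2))
    return sum(rng.expovariate(lam) for _ in range(n_claims))

rng = random.Random(7)        # fixed seed for repeatability
p, lam, x = 0.3, 2.0, 0.5     # assumed values, for illustration only
draws = [simulate_example1(rng, p, lam) for _ in range(100_000)]
empirical = sum(d <= x for d in draws) / len(draws)
print(empirical, cdf_example1(x, p, lam))
```

Note that F_Y(0)=(1-p)^2, the probability of no claims, which the function returns when x=0.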

The mean and variance are as follows:

\displaystyle E[Y]=E[N] \thinspace E[X]=\frac{2p}{\lambda}

\displaystyle Var[Y]=E[N] \thinspace Var[X]+Var[N] \thinspace E[X]^2

\displaystyle =\frac{2p}{\lambda^2}+\frac{2p(1-p)}{\lambda^2}=\frac{4p-2p^2}{\lambda^2}

The following calculates the higher moments (here n denotes the order of the moment, not the binomial parameter):

\displaystyle E[Y^n]=(1-p)^2 \cdot 0 + 2p(1-p) \frac{n!}{\lambda^n}+p^2 \frac{(n+1)!}{\lambda^n}

\displaystyle = \frac{2p(1-p)n!+p^2(n+1)!}{\lambda^n}

The moment generating function M_Y(t)=M_N[ln \thinspace M_X(t)]. So we have:

\displaystyle M_Y(t)=\biggl(1-p+p \frac{\lambda}{\lambda -t}\biggr)^2

\displaystyle =(1-p)^2+2p(1-p) \frac{\lambda}{\lambda -t}+p^2 \biggl(\frac{\lambda}{\lambda -t}\biggr)^2

Note that \displaystyle M_N(t)=(1-p+p e^{t})^2 and \displaystyle M_X(t)=\frac{\lambda}{\lambda -t}.

For the cumulant generating function, we have:

\displaystyle \Psi_Y(t)=ln M_Y(t)=2 ln\biggl(1-p+p \frac{\lambda}{\lambda -t}\biggr)

For the measure of skewness, we rely on the cumulant generating function. Finding the third derivative of \Psi_Y(t) and then evaluating it at t=0, we have:

\displaystyle \Psi_Y^{(3)}(0)=\frac{12p-12p^2+4p^3}{\lambda^3}

\displaystyle \gamma_Y=\frac{\Psi_Y^{(3)}(0)}{Var(Y)^{\frac{3}{2}}}=\frac{12p-12p^2+4p^3}{(4p-2p^2)^{\frac{3}{2}}}
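The variance and the third central moment can also be recovered from the raw moments E[Y^n], which gives an independent check on the results obtained from the cumulant generating function. The Python sketch below assumes illustrative values of p and \lambda; the helper raw_moment is hypothetical.

```python
from math import factorial

def raw_moment(k, p, lam):
    """E[Y^k], k >= 1, for Binomial(2, p) claim counts and exponential(lam) claims."""
    return (2 * p * (1 - p) * factorial(k) + p ** 2 * factorial(k + 1)) / lam ** k

p, lam = 0.3, 2.0  # assumed values, for illustration only
m1, m2, m3 = (raw_moment(k, p, lam) for k in (1, 2, 3))

# Central moments from raw moments.
variance = m2 - m1 ** 2
third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
skewness = third_central / variance ** 1.5

print(variance, (4 * p - 2 * p ** 2) / lam ** 2)
print(third_central, (12 * p - 12 * p ** 2 + 4 * p ** 3) / lam ** 3)
print(skewness)
```

Each of the first two printed lines shows the value from the raw moments next to the closed form; the pairs should agree up to rounding.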

Example 2
In this example, the number of claims N follows a geometric distribution with parameter p, i.e. P[N=n]=p(1-p)^n for n=0,1,2, \cdots. The individual claim amount X follows an exponential distribution with parameter \lambda.

One of the most interesting facts about this example is the moment generating function. Note that \displaystyle M_N(t)=\frac{p}{1-(1-p)e^t}. The following shows the derivation of M_Y(t):

\displaystyle M_Y(t)=M_N[ln \thinspace M_X(t)]=\frac{p}{1-(1-p) e^{ln M_X(t)}}

\displaystyle =\frac{p}{1-(1-p) \frac{\lambda}{\lambda -t}}=\cdots=p+(1-p) \frac{\lambda p}{\lambda p-t}

The moment generating function is the weighted average of a point mass at y=0 and the mgf of an exponential distribution with parameter \lambda p. Thus this example of compound geometric distribution is equivalent to a mixture of a point mass and an exponential distribution. We make use of this fact and derive the following basic properties.
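The equality of the two forms of M_Y(t) can be confirmed numerically. The Python sketch below evaluates both expressions at a few values of t below \lambda p; the values of p and \lambda, the grid of t values and the function names are assumptions for illustration.

```python
def mgf_compound(t, p, lam):
    """M_N[ln M_X(t)] with geometric N and exponential X."""
    m_x = lam / (lam - t)
    return p / (1 - (1 - p) * m_x)

def mgf_mixture(t, p, lam):
    """Point mass at 0 (weight p) mixed with an exponential of rate lam * p."""
    return p + (1 - p) * lam * p / (lam * p - t)

p, lam = 0.4, 2.0  # assumed values; both forms need t < lam * p = 0.8
gaps = [abs(mgf_compound(t, p, lam) - mgf_mixture(t, p, lam))
        for t in (-1.0, -0.25, 0.0, 0.3, 0.7)]
print(max(gaps))
```

The largest gap should be at the level of floating point rounding.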

Distribution Function
\displaystyle F_Y(y)=p+(1-p) (1-e^{-\lambda p y})=1-(1-p) e^{-\lambda p y} for y \ge 0

Density Function
\displaystyle f_Y(y)=\left\{\begin{matrix}p&\thinspace y=0\\{(1-p) \lambda p e^{-\lambda p y}}&\thinspace 0 < y\end{matrix}\right.

Mean and Higher Moments
\displaystyle E[Y]=(1-p) \frac{1}{\lambda p}=\frac{1-p}{p} \frac{1}{\lambda}=E[N] E[X]

\displaystyle E[Y^n]=p \cdot 0 + (1-p) \frac{n!}{(\lambda p)^n}=(1-p) \frac{n!}{(\lambda p)^n}

Variance
\displaystyle Var[Y]=\frac{2(1-p)}{\lambda^2 p^2}-\frac{(1-p)^2}{\lambda^2 p^2}=\frac{1-p^2}{\lambda^2 p^2}

Cumulant Generating Function
\displaystyle \Psi_Y(t)=ln \thinspace M_Y(t)=ln\biggl(p+(1-p) \frac{\lambda p}{\lambda p-t}\biggr)

Skewness
\displaystyle E\biggl[\biggl(Y-\mu_Y\biggr)^3\biggr]=\Psi_Y^{(3)}(0)=\frac{2-2p^3}{\lambda^3 p^3}

\displaystyle \gamma_Y=\frac{\Psi_Y^{(3)}(0)}{(Var[Y])^{\frac{3}{2}}}=\frac{2-2p^3}{(1-p^2)^{\frac{3}{2}}}
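As a check on this example, the variance and third central moment can be computed from the raw moments of the mixture, and the variance can be compared with the general compound formula E[N] Var[X]+Var[N] E[X]^2, where Var[N]=\frac{1-p}{p^2} for this geometric distribution. The Python sketch below assumes illustrative values of p and \lambda; the helper raw_moment is hypothetical.

```python
from math import factorial

def raw_moment(k, p, lam):
    """E[Y^k], k >= 1, using the point-mass/exponential mixture form of Y."""
    return (1 - p) * factorial(k) / (lam * p) ** k

p, lam = 0.4, 2.0  # assumed values, for illustration only
m1, m2, m3 = (raw_moment(k, p, lam) for k in (1, 2, 3))

# Central moments from raw moments.
variance = m2 - m1 ** 2
third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
skewness = third_central / variance ** 1.5

# General compound formula: E[N] Var[X] + Var[N] E[X]^2.
compound_variance = (1 - p) / p / lam ** 2 + (1 - p) / p ** 2 / lam ** 2

print(variance, compound_variance, (1 - p ** 2) / (lam * p) ** 2)
print(third_central, (2 - 2 * p ** 3) / (lam * p) ** 3)
print(skewness)
```

The three values on the first printed line should agree, as should the two values on the second line.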