The law of total probability

The goal of this post is to present an example that illustrates the law of total probability and to use this example to motivate the definition of a random variable that is a mixture. To state the finite form of the law of total probability, let E_1,E_2,...,E_n be events such that

  • E_i \cap E_j=\varnothing for i \neq j and
  • P(E_1)+P(E_2)+...+P(E_n)=1

Then for any event E, we have:

(0) \ \ \ \ \ \ P(E)=P(E \cap E_1)+P(E \cap E_2)+ \cdots +P(E \cap E_n)

By the definition of conditional probability, \displaystyle P(E \lvert E_i)=\frac{P(E \cap E_i)}{P(E_i)}, so that P(E \cap E_i)=P(E \lvert E_i) P(E_i). Thus the law of total probability can be stated as follows:

(1) \ \ \ \ \ \ P(E)=P(E \lvert E_1)P(E_1)+P(E \lvert E_2)P(E_2)+ \cdots +P(E \lvert E_n)P(E_n)
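
As a quick numerical sanity check of identity (1), here is a minimal Python sketch. The partition and the conditional probabilities below are hypothetical numbers chosen purely for illustration:

```python
# Hypothetical two-event partition: P(E_1) + P(E_2) = 1.
P_E1, P_E2 = 0.75, 0.25

# Hypothetical conditional probabilities P(E | E_1) and P(E | E_2).
P_E_given_E1, P_E_given_E2 = 0.4, 0.9

# Identity (1): P(E) = P(E | E_1)P(E_1) + P(E | E_2)P(E_2).
P_E = P_E_given_E1 * P_E1 + P_E_given_E2 * P_E2
print(P_E)  # approximately 0.525
```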

In a random experiment, a box is chosen based on a coin toss. If the toss is a head, Box 1 is chosen; if the toss is a tail, Box 2 is chosen. Box 1 contains 3 white balls and 1 red ball. Box 2 contains 1 white ball and 4 red balls. The probability that the coin turns up heads is 0.75. Once a box is chosen, two balls are drawn from it successively with replacement. Let X be the number of red balls drawn. Find the probability function and the distribution function of the discrete random variable X.

Alternative Description of the Example
We can couch the same example as a risk classification problem in insurance. For example, an auto insurer has two groups of policyholders. On the basis of historical data, the insurer has determined that the claim frequency during a policy year for a policyholder classified as a good risk follows a binomial distribution with n=2 and p=\frac{1}{4}. The claim frequency for a policyholder classified as a bad risk follows a binomial distribution with n=2 and p=\frac{4}{5}. In this block of policies, 75% are classified as good risks and 25% are classified as bad risks. A new customer, whose risk class is not yet known with certainty, has just purchased a new policy. What distribution should be used to model the claim frequency for this new customer? The two descriptions are indeed equivalent: since the draws are made with replacement, each draw from Box 1 produces a red ball with probability \frac{1}{4}, so the number of red balls in two draws is binomial with n=2 and p=\frac{1}{4}; similarly, Box 2 corresponds to n=2 and p=\frac{4}{5}. A quick machine check of this equivalence is sketched below.
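
To make the equivalence explicit, here is a short Python sketch (the names balls, direct, and binom are ours, purely for illustration) that enumerates the two draws from Box 1 and compares the result to the binomial probabilities claimed for the good-risk class:

```python
from fractions import Fraction
from itertools import product
from math import comb

# Two draws with replacement from Box 1 (3 white balls, 1 red ball).
balls = ["W"] * 3 + ["R"]
direct = {k: Fraction(0) for k in range(3)}
for draw in product(balls, repeat=2):  # all ordered pairs of draws
    direct[draw.count("R")] += Fraction(1, len(balls) ** 2)

# Binomial(n=2, p=1/4) probabilities for the good-risk class.
p = Fraction(1, 4)
binom = {k: comb(2, k) * p**k * (1 - p) ** (2 - k) for k in range(3)}

print(direct == binom)  # True
```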

Discussion of Example
Let I be the random variable indicating which box is chosen (equivalently, which risk class the policyholder belongs to):

\displaystyle I=\left\{\begin{matrix} 1 & \text{with probability } 0.75 \\ 2 & \text{with probability } 0.25 \end{matrix}\right.

By the law of total probability, for i=0,1,2, we have:

P(X=i)=P(X=i \lvert I=1)P(I=1)+P(X=i \lvert I=2)P(I=2).

The following calculation derives the probability function and the distribution function.

\displaystyle P(X=0)=\biggl(\frac{3}{4}\biggr)^2 \frac{3}{4}+\biggl(\frac{1}{5}\biggr)^2 \frac{1}{4}=\frac{2764}{6400}

\displaystyle P(X=1)=2\biggl(\frac{1}{4}\biggr)\biggl(\frac{3}{4}\biggr) \frac{3}{4}+2\biggl(\frac{4}{5}\biggr)\biggl(\frac{1}{5}\biggr) \frac{1}{4}=\frac{2312}{6400}

\displaystyle P(X=2)=\biggl(\frac{1}{4}\biggr)^2 \frac{3}{4}+\biggl(\frac{4}{5}\biggr)^2 \frac{1}{4}=\frac{1324}{6400}

\displaystyle P(X \leq 0)=\frac{2764}{6400}

\displaystyle P(X \leq 1)=\frac{5076}{6400}

\displaystyle P(X \leq 2)=\frac{6400}{6400}=1
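
The arithmetic above can be reproduced by machine. The following Python sketch (the names p_box, p_red, and pmf are ours, not part of the example) computes the probability function and the distribution function exactly, using fractions:

```python
from fractions import Fraction
from math import comb

p_box = {1: Fraction(3, 4), 2: Fraction(1, 4)}  # P(I=1), P(I=2)
p_red = {1: Fraction(1, 4), 2: Fraction(4, 5)}  # P(red) per draw, by box

def pmf(i):
    # P(X=i): binomial(2, p_red) probability for each box,
    # weighted by the probability of choosing that box.
    return sum(comb(2, i) * p_red[b] ** i * (1 - p_red[b]) ** (2 - i) * p_box[b]
               for b in (1, 2))

cdf = Fraction(0)
for i in range(3):
    cdf += pmf(i)
    print(i, pmf(i), cdf)
# 0 691/1600 691/1600   (= 2764/6400)
# 1 289/800 1269/1600   (= 2312/6400 and 5076/6400)
# 2 331/1600 1          (= 1324/6400 and 6400/6400)
```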

Note that in our example, the unconditional probability function P(X=i) is a weighted sum of two conditional probability functions (based on conditioning on the indicator variable I). It also follows from the law of total probability that the unconditional distribution function is a weighted sum of the conditional distribution functions. For x=0,1,2, we have:

P(X \le x)=P(X \le x \lvert I=1)P(I=1)+P(X \le x \lvert I=2)P(I=2)

A random variable whose distribution function is a weighted sum of a family of distribution functions is called a mixture. The example in this post is called a discrete mixture since the set of weights is a discrete set. We end this post with the following definition.

Definition of Mixtures
Motivated by the example above, a random variable X is said to be a mixture if its distribution function is of the form

\displaystyle F_X(x)=\sum_{i} p_i F_{X_i}(x)

for some sequence of random variables X_1,X_2,X_3,... and some sequence of positive real numbers p_1,p_2,p_3,... that sum to 1. The numbers p_1,p_2,p_3,... are the mixing weights. In this case, X is said to be a discrete mixture.
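
The definition suggests a direct two-stage way to simulate a discrete mixture: first draw a component index using the mixing weights, then draw from the chosen component. Here is a minimal Python sketch using the two binomial components from the example (the names weights, components, and sample_mixture are ours):

```python
import random

weights = [0.75, 0.25]  # mixing weights p_1, p_2
components = [
    lambda: sum(random.random() < 0.25 for _ in range(2)),  # binomial(2, 1/4)
    lambda: sum(random.random() < 0.80 for _ in range(2)),  # binomial(2, 4/5)
]

def sample_mixture():
    # Step 1: pick a component with the mixing weights.
    i = random.choices(range(len(components)), weights=weights)[0]
    # Step 2: sample from the chosen component.
    return components[i]()

draws = [sample_mixture() for _ in range(100_000)]
print([draws.count(k) / len(draws) for k in (0, 1, 2)])
# approximately [0.432, 0.361, 0.207], matching 2764/6400, 2312/6400, 1324/6400
```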

The concept of mixture (or mixing) does not need to be restricted to a countable number of distributions. We can mix a family of distributions indexed by the real numbers or some interval of real numbers, using a continuous probability density function as the mixing weight. A random variable X is said to be a continuous mixture if its distribution function is of the form

\displaystyle F_X(x)=\int_{-\infty}^{+\infty}F_{X \lvert \Lambda=\lambda}(x) f_{\Lambda}(\lambda)d \lambda

for some family of random variables X \lvert \Lambda=\lambda and some density function f_{\Lambda}. Note that the family X \lvert \Lambda=\lambda is indexed by \lambda ranging over the real numbers or some interval of real numbers [a,b].
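
To make the integral concrete, here is a Python sketch (it assumes scipy is available for the numerical integration; the gamma-exponential choice below is ours, picked because it has a well-known closed form). Conditional on \Lambda=\lambda, X is exponential with rate \lambda, and \Lambda follows a gamma distribution; this particular mixture is known to be a Pareto distribution, which gives a closed form to check against:

```python
import math
from scipy.integrate import quad  # assumes scipy is installed

alpha, beta = 2.0, 3.0  # shape and rate of the gamma mixing density

def gamma_pdf(lam):
    # Gamma(shape alpha, rate beta) density f_Lambda(lambda).
    return beta**alpha * lam ** (alpha - 1) * math.exp(-beta * lam) / math.gamma(alpha)

def mixture_cdf(x):
    # F_X(x) = integral of F_{X | Lambda = lambda}(x) * f_Lambda(lambda) d lambda,
    # where conditionally X is exponential: F_{X | Lambda = lambda}(x) = 1 - e^(-lambda x).
    value, _ = quad(lambda lam: (1 - math.exp(-lam * x)) * gamma_pdf(lam), 0, math.inf)
    return value

# Closed form of this exponential-gamma mixture (a Pareto distribution):
x = 1.0
print(mixture_cdf(x), 1 - (beta / (beta + x)) ** alpha)  # both approximately 0.4375
```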

Mixtures arise in many settings. The notion of mixtures is important in insurance applications. For example, the claim frequency or amount of a random loss may have an uncertain risk parameter that varies from insured to insured. By mixing the conditional distribution of the claim frequency (or random loss amount) with the distribution of the uncertain risk parameter, we have a model that can describe the claim experience.

Discrete mixtures arise when the risk parameter is discrete (e.g. when the risk classification is discrete). Continuous mixtures are important in situations where the risk parameter of the random loss distribution follows a continuous distribution. Examples of continuous mixtures and basic properties of mixtures will be discussed in subsequent posts (see Examples of mixtures, Basic properties of mixtures).