Conditional probability

Conditional probability arises naturally in surveys and, more generally, in any experiment that involves more than one event.

We use the example of a survey to explain the concept: we randomly pick people living in New York and inquire about their voting behaviour. Thus, the set of possible outcomes $S$ contains all New Yorkers. Let us define the two events

$E$ ="votes for Trump" and
$F$ ="is male"

When we pick a New Yorker at random, $p(E)$ denotes the probability of selecting a Trump voter, and $p(F)$ the probability of selecting a male. Expressed as percentages, $p(E)$ is the percentage of New Yorkers who vote for Trump and $p(F)$ is the percentage of New Yorkers who are male.

As is often the case in surveys, we want to be more specific and, for example, find the percentage of male New Yorkers who vote for Trump. In other words, we are interested in the probability of selecting a Trump voter when we no longer select from all New Yorkers, but only from all male New Yorkers.

Put yet another way, we want to know the probability of selecting a Trump voter given that the selected person is a male New Yorker. This probability is written as

$$p(E\vert F)$$

and is called the conditional probability of $E$ given $F$ . The vertical line stands for "given", so $p(E|F)$ is pronounced " $p$ of $E$ given $F$ ". The situation is illustrated in a Venn diagram below. The percentage $p(E)$ is relative to the sample space $S$ , while $p(E\vert F)$ is the percentage relative to $F$ . In other words, the sample space changes from $S$ to $F$ .

Note that there is a subtle difference between $p(E\vert F)$ and $p(E\cap F)$ . Both describe the same persons (male Trump voters in New York), but $p(E\cap F)$ expresses the number of these people as a percentage of $S$ (all New Yorkers), while $p(E\vert F)$ expresses the same number of people as a percentage of $F$ (male New Yorkers).
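The difference between the two percentages can be made concrete with a small calculation. The sketch below uses made-up survey counts (they are assumptions for illustration, not real data) and exact fractions, and shows that the same group of people yields two different percentages depending on the reference set.

```python
from fractions import Fraction

# Hypothetical survey counts (illustrative only, not real data).
total = 1000       # size of the sample space S (all surveyed New Yorkers)
male = 480         # number of people in F ("is male")
male_trump = 120   # number of people in E ∩ F (male Trump voters)

# Same 120 people, two different reference sets:
p_E_and_F = Fraction(male_trump, total)   # relative to S
p_E_given_F = Fraction(male_trump, male)  # relative to F

print(p_E_and_F)    # 3/25
print(p_E_given_F)  # 1/4
```

With these counts, male Trump voters make up $12\%$ of all surveyed people but $25\%$ of the surveyed males.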

What is the numerical relationship between these two probabilities? How can we convert from one to the other? Here is a small example. If

the area $E\cap F$ is half of $F$ :

$$p(E\vert F)=\frac{1}{2} \text{ of $F$}$$

and the area of $F$ is a quarter of $S$ :

$$p(F)=\frac{1}{4}\text{ of $S$}$$

then clearly the area of $E\cap F$ is half of the quarter of $S$ :

$$p(E\cap F)=\underbrace{\frac{1}{2}}_{p(E\vert F)}\cdot\underbrace{\frac{1}{4}}_{p(F)}=\frac{1}{8}$$

Written more generally, we have

$$p(E\cap F)=p(E\vert F)\cdot p(F)$$

Or in a picture:

Observe how the $F$-circles cancel on the right-hand side, which confirms the equation. Dividing both sides by $p(F)$ (assuming $p(F)>0$ ), we get

$$\boxed{p(E\vert F)=\frac{p(E\cap F)}{p(F)}}$$

Similarly, for $p(E)>0$ , we can show that

$$\boxed{p(F\vert E)=\frac{p(F\cap E)}{p(E)}}$$
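The two boxed formulas translate directly into code. The following is a minimal sketch: the function name `conditional` is our own choice, and it reuses the numbers from the small example above ($p(E\vert F)=\frac{1}{2}$ , $p(F)=\frac{1}{4}$ ) to go from the conditional probability to the intersection and back.

```python
from fractions import Fraction

def conditional(p_joint, p_condition):
    """Return p(E|F) = p(E ∩ F) / p(F). Not defined when p(F) = 0."""
    if p_condition == 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_condition

p_F = Fraction(1, 4)          # F is a quarter of S
p_E_given_F = Fraction(1, 2)  # E ∩ F is half of F

# Product rule: p(E ∩ F) = p(E|F) * p(F)
p_E_and_F = p_E_given_F * p_F
print(p_E_and_F)                    # 1/8

# Dividing by p(F) recovers the conditional probability.
print(conditional(p_E_and_F, p_F))  # 1/2
```

The explicit check for a zero denominator mirrors the fact that $p(E\vert F)$ only makes sense when $p(F)>0$ .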

All basic properties of probabilities carry over to conditional probabilities. We state them without proof, but they should be intuitively clear, because $p(E|F)$ is simply a probability whose sample space is restricted to $F$ :

Theorem 1

Let $E$ , $F$ and $G$ be events of a random experiment with $p(F)>0$ . Then

$p(F\vert F)=1, p(\{ \}\vert F)=0$
$p(E\cup G\vert F)=p(E\vert F)+p(G\vert F)-p(E\cap G\vert F)$
$p(E\cup G\vert F)=p(E\vert F)+p(G\vert F)$ if $E$ and $G$ are mutually exclusive.
$p(E^\prime\vert F)=1-p(E\vert F)$
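The properties of Theorem 1 can be checked numerically on a small example. The sketch below assumes a fair six-sided die as the random experiment (this example and the helper names `p` and `p_given` are ours, not part of the theorem) and represents events as Python sets. It is a sanity check on one example, not a proof.

```python
from fractions import Fraction

S = set(range(1, 7))  # assumed example: outcomes of a fair six-sided die

def p(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

def p_given(event, cond):
    """Conditional probability p(event | cond) = p(event ∩ cond) / p(cond)."""
    return p(event & cond) / p(cond)

F = {2, 4, 6}     # "even number"
E = {1, 2, 3}     # "at most 3"
G = {3, 4, 5, 6}  # "at least 3" (overlaps with E)
H = {4, 5, 6}     # "at least 4" (mutually exclusive with E)

assert p_given(F, F) == 1
assert p_given(set(), F) == 0
assert p_given(E | G, F) == p_given(E, F) + p_given(G, F) - p_given(E & G, F)
assert p_given(E | H, F) == p_given(E, F) + p_given(H, F)
assert p_given(S - E, F) == 1 - p_given(E, F)
print("all properties hold for this example")
```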

Exercise 1

Q1

In a town, $30\%$ of the population are male. Of these, $10\%$ are blond. A person is selected at random. Determine the probability that

the person is blond, given that the person is male.
the person is blond and male.

Q2

A teacher gave her class two tests. $42\%$ of the students passed at least the first test, and $25\%$ of the students passed both tests. You select a student at random from the ones that have passed test 1. What is the probability that the student has also passed test 2?

Q3

Of $1000$ people in a town, $675$ own a house, and of these, $300$ own a car. If you select a person from these $1000$ people at random, what is the probability that the person owns both a house and a car?

Q4

Show that $p(F|E)+p(F^\prime|E)=1$ .

Q5

$A$ and $B$ are two events of a random experiment with $p(A)=0.7$ , $p(B)=0.6$ , and $p(A \cap B^\prime)=0.2$ . Determine $p(A^\prime \cap B)$ , $p(A|B)$ , and $p(B|A)$ .

Solution

A1

$B=$ "selected person is blond", $M=$ "selected person is male". The sample space is formed by the people in town.

$p(B|M)=\underline{0.1}$
$p(B\cap M)=p(B|M)\cdot p(M) = 0.1 \cdot 0.3 = \underline{0.03}$ .

A2

$T_1=$ "selected student passes test 1", $T_2=$ "selected student passes test 2", and the sample space $S$ is formed by the students in the class. We have $p(T_1\cap T_2)=0.25$ and $p(T_1)=0.42$ . With $p(T_1 \cap T_2)=p(T_2 | T_1)\cdot p(T_1)$ we get $p(T_2 | T_1)=\frac{p(T_1 \cap T_2)}{ p(T_1)}=\frac{0.25}{0.42}\approx\underline{0.595}$ .

A3

$H=$ "selected person owns a house", $C=$ "selected person owns a car". The sample space is formed by the $1000$ people. We have $p(H)=\frac{675}{1000}=0.675$ and $p(C\vert H)=\frac{300}{675}=0.\overline{4}$ . Thus, $p(C\cap H)=p(C\vert H)\cdot p(H)=0.\overline{4}\cdot 0.675=\underline{0.3}$ .

Or, because $300$ people are in the intersection $H\cap C$ , we can also calculate directly that $p(C\cap H)=\frac{300}{1000}=\underline{0.3}$ .

A4

Repeat the experiment many times, and focus on the experiments where $E$ occurred. Of these, the percentage of experiments where $F$ occurred (as well) is $p(F|E)$ , and the percentage of experiments where $F$ did not occur is $p(F^\prime|E)$ . Thus, the two percentages add up to $100\%$ .
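The frequency argument can also be backed by a short calculation from the definition of conditional probability, assuming $p(E)>0$ . The events $F\cap E$ and $F^\prime\cap E$ are mutually exclusive and their union is $E$ , so $p(F\cap E)+p(F^\prime\cap E)=p(E)$ and therefore

$$p(F|E)+p(F^\prime|E)=\frac{p(F\cap E)}{p(E)}+\frac{p(F^\prime\cap E)}{p(E)}=\frac{p(F\cap E)+p(F^\prime\cap E)}{p(E)}=\frac{p(E)}{p(E)}=1$$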

A5

A Venn diagram shows that $p(A)=p(A\cap B)+p(A\cap B^\prime)$ , so $p(A\cap B)=0.7-0.2=0.5$ . Likewise, $p(A^\prime\cap B)=p(B)-p(A\cap B)=0.6-0.5=\underline{0.1}$ . Hence $p(A|B)=\frac{p(A\cap B)}{p(B)}=\frac{0.5}{0.6}=\underline{5/6}$ , and $p(B|A)=\frac{p(B\cap A)}{p(A)}=\frac{0.5}{0.7}=\underline{5/7}$ .
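The arithmetic in A5 can be double-checked with exact fractions. This is only a sketch that recomputes the values from the three given probabilities; the variable names are our own.

```python
from fractions import Fraction

# Given in Q5
p_A = Fraction(7, 10)
p_B = Fraction(6, 10)
p_A_and_notB = Fraction(2, 10)

# A splits into the disjoint parts A ∩ B and A ∩ B'
p_A_and_B = p_A - p_A_and_notB
# B splits into the disjoint parts A ∩ B and A' ∩ B
p_notA_and_B = p_B - p_A_and_B

p_A_given_B = p_A_and_B / p_B
p_B_given_A = p_A_and_B / p_A

print(p_notA_and_B)  # 1/10
print(p_A_given_B)   # 5/6
print(p_B_given_A)   # 5/7
```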