The probability of an outcome

Consider the random experiment "tossing a coin". If we perform the experiment, the outcome $H$ ("head") will occur with a certain likelihood. We would like to quantify this likelihood with a number between $0$ and $1$, where bigger means more likely. We will call this number the probability of outcome $H$, denoted by

$$p(H)$$

This works as follows. Repeat the experiment under the exact same conditions $N$ times. Count how often the outcome $H$ occurs, and denote this number by $n$. The relative frequency of occurrences of $H$ is defined as

$$\frac{n}{N}$$

In essence, this will be our probability $p(H)$. Indeed, this number is between $0$ and $1$, and the larger the relative frequency, the more often the outcome $H$ has occurred. For example, if $H$ occurs every time, the relative frequency is $n/N=N/N=1$. If $H$ never occurs, the relative frequency is $0/N=0$.

But note that we cannot simply set $p(H)=n/N$, because this is not well defined. Why? Every time we determine the value $n/N$ by performing the experiment $N$ times, the value will generally vary! This is demonstrated in the exercise below.

Exercise 1

We consider the random experiment "flipping a coin once". Repeat the experiment $N=20$ times, and determine the relative frequency of $H$.

Then determine the relative frequency of $H$ again, using the same procedure. Observe that the two relative frequencies are usually different.
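If no coin is at hand, the exercise can also be tried on a computer. The following is a minimal sketch, assuming a fair coin that is modelled with Python's pseudo-random number generator; the function name `relative_frequency_of_heads` is made up for this illustration.

```python
import random


def relative_frequency_of_heads(num_flips, rng):
    """Flip a simulated fair coin num_flips times and return n/N for heads."""
    # Assumption: rng.random() < 0.5 stands in for the outcome "head".
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


# Unseeded generator: every run of the script produces new flips.
rng = random.Random()
first = relative_frequency_of_heads(20, rng)
second = relative_frequency_of_heads(20, rng)
print(f"first series of 20 flips:  n/N = {first:.2f}")
print(f"second series of 20 flips: n/N = {second:.2f}")
```

Running the script a few times shows that the two values usually differ, although they can coincide by chance.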

How can we avoid these fluctuations of the relative frequency $n/N$? The next exercise offers a solution.

Exercise 2

We consider the random experiment "flipping a coin once". Repeat the experiment $N=10$ times and count the number of times that head $H$ occurs. Determine the relative frequency of head, $n/N$.

Repeat the experiment another $10$ times, so in total we have $N=20$ repetitions. Determine the total number of occurrences of head, and again calculate the relative frequency $n/N$.

Now continue this procedure by always flipping another $10$ times, and fill out the table shown below:

$$\begin{array}{|c|l|l|l|l|l|l|l|l|l|l|}\hline
N & 10 & 20 & 30 & 40 & 50 & 60 & 70 & 80 & 90 & 100 \\\hline
n & & & & & & & & & & \\\hline
\frac{n}{N} & & & & & & & & & & \\\hline
\end{array}$$

Also, plot the calculated relative frequencies $n/N$ as a function of $N$ in a coordinate system ($N$ along the $x$-axis, $n/N$ along the $y$-axis).
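The table can also be filled in by simulation. The sketch below assumes a fair coin modelled with Python's pseudo-random number generator; the function name `cumulative_relative_frequencies` and its parameters are hypothetical and serve only as an illustration.

```python
import random


def cumulative_relative_frequencies(total_flips, step, rng):
    """Return one row (N, n, n/N) after every `step` flips of a simulated fair coin."""
    rows = []
    heads = 0
    for flips_so_far in range(1, total_flips + 1):
        # Assumption: rng.random() < 0.5 stands in for the outcome "head".
        if rng.random() < 0.5:
            heads += 1
        # Record the running totals after 10, 20, 30, ... flips.
        if flips_so_far % step == 0:
            rows.append((flips_so_far, heads, heads / flips_so_far))
    return rows


# Unseeded generator: every run of the script produces a new table.
for N, n, freq in cumulative_relative_frequencies(100, 10, random.Random()):
    print(f"N = {N:3d}   n = {n:3d}   n/N = {freq:.3f}")
```

Passing a larger `total_flips` (for example 10000 with `step=1000`) extends the table and makes the behaviour of the last column easier to see.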

What you should observe is that, as the number of repetitions $N$ grows, the relative frequency $n/N$ stabilises and approaches a specific value. We define this value as the probability of outcome $o$:

Definition 1

Consider an experiment, and denote one outcome by $o$. The probability of outcome $o$ is defined as the long-run relative frequency of $o$:

$$p(o)=\frac{n}{N}\quad (N \text{ large})$$

where $N$ is the number of repetitions of the experiment, and $n$ is the number of repetitions in which $o$ occurred.

The term "long-run" refers to the fact that $N$ has to be very large (we choose $N$ so large that the fluctuations in $n/N$ become negligible).

Note that the relative frequency of $o$ can also be expressed as a percentage (of the repetitions $N$). For this reason, we can also express the probability as a percentage. For example, $p(o)=0.2$ can also be expressed as $p(o)=20\%$. We will use both notations.

Thus we can also rephrase the definition of the probability using percentages.

Note 1

If we repeat the experiment $N$ times, where $N$ is a large number, then $p(o)$ is the percentage of repetitions in which outcome $o$ occurred.

Exercise 3

For a die, $p(6)=1/6$. You roll the die $12\,000$ times. How many $6$'s can you expect to observe? Is this number exact?

Solution

Since $\frac{n}{12\,000}\approx p(6)=\frac{1}{6}$, it follows that $n\approx\frac{12\,000}{6}=2000$. This is just an estimate, and the actual count will fluctuate. But as $N$ is quite large, the fluctuations of $n$ will be quite small relative to $n$.
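The size of these fluctuations can be explored with a short simulation. This is a sketch under the assumption of a fair die modelled with Python's pseudo-random number generator; the function name `count_sixes` is made up for this example.

```python
import random


def count_sixes(num_rolls, rng):
    """Roll a simulated fair die num_rolls times and count the sixes."""
    # rng.randint(1, 6) returns an integer from 1 to 6, both ends included.
    return sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == 6)


expected = 12_000 / 6  # the estimate from the solution: 2000
rng = random.Random()  # unseeded: every run gives different counts

# Repeat the whole series of 12 000 rolls a few times and compare.
for trial in range(1, 6):
    n = count_sixes(12_000, rng)
    print(f"trial {trial}: n = {n}, deviation from {expected:.0f}: {n - expected:+.0f}")
```

The counts differ from trial to trial, but the deviations are typically small compared with 2000.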

Here is our first theorem about probabilities. The proof is given as an exercise.

Theorem 1

The sum of all outcome probabilities of a random experiment equals $1$. That is, if $o_1,o_2,\dots,o_m$ are the possible outcomes of the experiment, then

$$\sum_{i=1}^m p(o_i) = p(o_1)+p(o_2)+\dots+p(o_m)=1$$

Exercise 4

Give a proof of the statement above.

Solution

Repeat the experiment $N$ times, where $N$ is a very large number. By definition of the probability, $p(o_i)$ is the percentage of times that outcome $o_i$ occurs. Adding the percentages for every outcome $o_i$, we must get $100\%$. This is so because every repetition of the random experiment results in exactly one of the outcomes $o_1,o_2,\dots,o_m$. In symbols, if $n_i$ denotes the number of occurrences of $o_i$, then $n_1+n_2+\dots+n_m=N$, so the relative frequencies add up to $N/N=1$.
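The counting argument can be checked numerically. The sketch below assumes an experiment with equally likely, distinct outcomes (a fair die), simulated with Python's pseudo-random number generator; the function name `relative_frequencies` is hypothetical. Exact fractions are used so that the sum is not affected by rounding.

```python
import random
from collections import Counter
from fractions import Fraction


def relative_frequencies(outcomes, num_repetitions, rng):
    """Simulate the experiment and return the relative frequency of each outcome."""
    # Assumption: the outcomes are distinct and each repetition picks
    # exactly one of them, all with equal chance.
    counts = Counter(rng.choice(outcomes) for _ in range(num_repetitions))
    # An outcome that never occurred has count 0 in a Counter.
    return {o: Fraction(counts[o], num_repetitions) for o in outcomes}


freqs = relative_frequencies([1, 2, 3, 4, 5, 6], 6000, random.Random())
for outcome, freq in freqs.items():
    print(f"outcome {outcome}: n/N = {float(freq):.4f}")

# Every repetition is counted for exactly one outcome, so the sum is N/N.
print("sum of relative frequencies:", sum(freqs.values()))  # prints 1
```

The individual relative frequencies change from run to run, but their sum is always exactly 1, for any number of repetitions.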