The binomial distribution

If we have a binomial experiment, we are often interested in the probability of having a certain number of successes. This leads to the binomial distribution.

Definition 1

Consider a binomial experiment with repetition number $n$ and success probability $p$.

The random variable $N$ = "number of successes after $n$ repetitions" has the possible values $0,1,...,n$ and is called the binomial random variable with parameters $n$ and $p$. We also say that $N$ is binomially distributed with parameters $n$ and $p$.

The probability function of $N$, that is $p(N=0), p(N=1),..., p(N=n)$, is denoted by $binom{\color{red}p}df(n,p,k)$, where $k=0,1,2,...,n$ is the number of successes. Thus we have

$$p(N=k)=binom{\color{red}p}df(n,p,k)$$

The cumulative distribution function of $N$, $F(x)=p(N\leq x)$, is denoted by $binom{\color{red}c}df(n,p,x)$. Thus we have

$$p(N\leq x)=binom{\color{red}c}df(n,p,x)$$

Let us look at an example first; afterwards we will derive a formula for calculating the probabilities $p(N=k)$ for every $k$.

Example 1

A biased coin shows heads with probability $0.2$. The coin is flipped $4$ times. Define the random variable $N$ = "number of heads".

  1. Is $N$ a binomial random variable? If so, what are its parameters $n$ and $p$?

  2. What is the probability of getting exactly $2$ heads? Use the calculator and $binompdf$.

  3. What is the probability of getting no more than $2$ heads? Use the calculator and $binomcdf$.

Solution
  1. As it is a binomial experiment (success $S$ = "head"), $N$ is a binomial random variable with parameters $n=4$ and $p=0.2$.
  2. $p(N=2)=binompdf(4,0.2,2)=\underline{0.1536}$
  3. $p(N\leq 2)=binomcdf(4,0.2,2)=\underline{0.9728}$

A formula for $binompdf$

Instead of relying on the calculator, let us derive a formula for $binompdf$ and see how $binomcdf$ follows from it. We will use the example from above (so study it before you go on). The tree representation of $4$ flips of a coin with $p(H)=0.2$ is shown below.

We want to calculate the probability

$$p(N=2)=binompdf(4,0.2,2)$$

where $N$ = "number of heads" is a binomial random variable with the parameters $n=4$ and $p=0.2$. Thus, we have to add the path probabilities of all paths which contain exactly $2$ heads and $2$ tails. We already know from the discussion of the binomial coefficient that there are

$$\binom{4}{2}$$

such paths. How do we know this? Each such path corresponds to a 4-letter word consisting of two $H$ and two $T$ (e.g. $HHTT$, $THTH$, ...), and there are $\binom{4}{2}=6$ ways to form such a word. Please verify this in the tree.

As each such path has exactly two heads and two tails, the path probability of each path is

$$0.2^2\cdot 0.8^2$$

Thus, the sum of the path probabilities is

$$\begin{array}{lll} p(N=2)&=&binompdf(4,0.2,2)\\ &=&\binom{4}{2}\cdot 0.2^2\cdot 0.8^2\\ &=&0.1536 \end{array}$$
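The counting argument can also be checked by brute force. The following Python sketch is not part of the original material; the names `p_head`, `paths` and `path_probability` are chosen here purely for illustration. It lists all $2^4=16$ paths of the tree, keeps those with exactly two heads, and adds up their path probabilities.

```python
from itertools import product
from math import comb

p_head = 0.2   # p(H) from Example 1
n_flips = 4

# All 2^4 = 16 paths of the tree, written as words such as "HHTT".
paths = ["".join(word) for word in product("HT", repeat=n_flips)]

# Keep only the paths with exactly two heads (and therefore two tails).
two_heads = [w for w in paths if w.count("H") == 2]

def path_probability(word):
    """Multiply the branch probabilities along one path."""
    prob = 1.0
    for letter in word:
        prob *= p_head if letter == "H" else 1 - p_head
    return prob

total = sum(path_probability(w) for w in two_heads)

print(two_heads)                      # the words with two H and two T
print(len(two_heads) == comb(4, 2))   # True
print(round(total, 4))                # 0.1536
```

Enumerating paths only works for small $n$, since the number of paths doubles with every additional flip.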

Similarly, we have

$$\begin{array}{lll} p(N=0)&=&binompdf(4,0.2,0)\\ &=&\binom{4}{0}\cdot 0.2^0\cdot 0.8^4\\ &=&0.4096 \end{array}$$

and

$$\begin{array}{lll} p(N=1)&=&binompdf(4,0.2,1)\\ &=&\binom{4}{1}\cdot 0.2^1\cdot 0.8^3\\ &=&0.4096 \end{array}$$

(It is just a coincidence that these two probabilities are the same.)

The pattern should be apparent:

$$p(N={\color{red}k})=binompdf(4,0.2,{\color{red}k})=\binom{4}{{\color{red}k}}\cdot 0.2^{\color{red}k} \cdot 0.8^{4-{\color{red}k}}$$

Generally, we have:

Theorem 1

Consider a binomial random variable $N$ with the parameters $n$ and $p$. Then

$$\begin{array}{lll} p(N={\color{red}k})&=&binompdf(n,p,{\color{red}k})\\&=&\binom{n}{{\color{red}k}}\cdot p^{\color{red}k} \cdot (1-p)^{n-{\color{red}k}}\end{array}$$

where ${\color{red}k}=0,1,2,...,n$.
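Theorem 1 translates almost literally into code. The sketch below assumes Python's standard `math.comb` for the binomial coefficient; the function name `binompdf` merely mirrors the calculator command and is not a library function.

```python
from math import comb

def binompdf(n, p, k):
    """p(N = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The full probability function for Example 1 (n = 4, p = 0.2).
for k in range(5):
    print(k, round(binompdf(4, 0.2, k), 4))
```

The five printed probabilities add up to $1$, as they must, because $N$ takes exactly one of the values $0,1,...,4$.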

How to calculate $binomcdf$

To calculate

$$p(N\leq 2)=binomcdf(4,0.2,2)$$

note first that $N$ can only take on the values $0,1,2,3,4$, and second that the events $N=0$, $N=1$, $N=2$ are pairwise mutually exclusive. Thus we have

$$\begin{array}{lll} p(N\leq 2)&=&p(N=0 \cup N=1 \cup N=2)\\ &=&p(N=0)+p(N=1)+p(N=2)\\ &=&0.4096+0.4096+0.1536\\ &=& 0.9728 \end{array}$$

There is no simple closed formula for the cumulative distribution function of the binomial random variable; in general the individual probabilities have to be added up as above. The calculator does this for us:

$$p(N\leq 2)=binomcdf(4,0.2,2)=0.9728$$
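The addition of the individual probabilities can be written as a short loop. This is a minimal Python sketch under the assumption that $x$ is a whole number; `binompdf` and `binomcdf` are illustrative names that mirror the calculator commands, not library functions.

```python
from math import comb

def binompdf(n, p, k):
    """p(N = k), using the formula for the binomial probability function."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, x):
    """p(N <= x): add p(N = k) for k = 0, 1, ..., x."""
    return sum(binompdf(n, p, k) for k in range(0, x + 1))

print(round(binomcdf(4, 0.2, 2), 4))   # 0.9728
```

For $x=-1$ the sum is empty and the function returns $0$, which matches $p(N\leq -1)=0$.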

$binomcdf$ is useful for finding the probability of events like "the number of heads is smaller than or equal to $5$". However, it can also be used for events like "at least $3$ heads", "more than a given number of heads", "the number of heads is between $2$ and $10$", and so on. To do so, you have to make some transformations:

Theorem 2

Consider a binomial random variable $N$ with parameters $n$ and $p$, and two numbers $a\in\{0,1,2,...,n\}$ and $b\in \{0,1,...,n\}$ with $a\leq b$ (statement 7 requires $a<b$). For $a=0$, the value $binomcdf(n,p,-1)=p(N\leq -1)$ is understood to be $0$. The following is true:

  1. $p(N < a)=p(N \leq a-1)=binomcdf(n,p,a-1)$
  2. $p(N>a)=1-p(N\leq a)=1-binomcdf(n,p,a)$
  3. $p(N\geq a)=1-p(N\leq a-1)=1-binomcdf(n,p,a-1)$
  4. $p(a<N\leq b)=p(N\leq b)-p(N\leq a)=binomcdf(n,p,b)- binomcdf(n,p,a)$
  5. $p(a\leq N\leq b)=p(N\leq b)-p(N\leq a-1)=binomcdf(n,p,b)- binomcdf(n,p,a-1)$
  6. $p(a\leq N< b)=p(N\leq b-1)-p(N\leq a-1)=binomcdf(n,p,b-1)- binomcdf(n,p,a-1)$
  7. $p(a< N< b)=p(N\leq b-1)-p(N\leq a)=binomcdf(n,p,b-1)- binomcdf(n,p,a)$

The proof is left as an exercise.
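A numerical check is not a proof, but it can catch an off-by-one mistake before you attempt one. The following Python sketch compares each of the seven transformations with a direct sum over the matching values of $N$. The helper `direct` and the test values $n=10$, $p=0.3$, $a=2$, $b=7$ are arbitrary choices made here for illustration.

```python
from math import comb, isclose

def binompdf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, x):
    # An empty sum (x = -1) gives 0, which is what the case a = 0 needs.
    return sum(binompdf(n, p, k) for k in range(0, x + 1))

def direct(n, p, event):
    """Add p(N = k) over all k in 0..n for which event(k) is true."""
    return sum(binompdf(n, p, k) for k in range(n + 1) if event(k))

n, p, a, b = 10, 0.3, 2, 7   # illustrative values with a < b

# One (direct sum, transformed expression) pair per statement 1 to 7.
checks = [
    (direct(n, p, lambda k: k < a),       binomcdf(n, p, a - 1)),
    (direct(n, p, lambda k: k > a),       1 - binomcdf(n, p, a)),
    (direct(n, p, lambda k: k >= a),      1 - binomcdf(n, p, a - 1)),
    (direct(n, p, lambda k: a < k <= b),  binomcdf(n, p, b) - binomcdf(n, p, a)),
    (direct(n, p, lambda k: a <= k <= b), binomcdf(n, p, b) - binomcdf(n, p, a - 1)),
    (direct(n, p, lambda k: a <= k < b),  binomcdf(n, p, b - 1) - binomcdf(n, p, a - 1)),
    (direct(n, p, lambda k: a < k < b),   binomcdf(n, p, b - 1) - binomcdf(n, p, a)),
]

all_ok = all(isclose(lhs, rhs, abs_tol=1e-12) for lhs, rhs in checks)
print(all_ok)   # True
```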

Exercise 1

Prove the statements above.

Solution

See the figure below. The blue dots indicate the event whose probability we want to calculate. The probability of this event is obtained by adding the probabilities of all coloured dots and subtracting the probabilities of all red dots.

Exercise 2

A coin ($p(H)=0.1$) is tossed $10$ times. $N$ denotes the number of heads. Determine the probabilities of the following events:

  1. $N$ equals $0$ (without calculator)

  2. $N$ equals $10$ (without calculator)

  3. $N$ is no more than $5$

  4. $N$ is smaller than $5$

  5. $N$ is at least $5$

  6. $N$ is bigger than $5$

  7. $N$ is at least $2$ and smaller than $7$

  8. $N$ is bigger than $2$ and no more than $7$

  9. $N$ is between $2$ and $7$ (borders included)

  10. $N$ is between $2$ and $7$ (borders excluded)

  11. $N$ is bigger than $0$ (without calculator)

Solution
Exercise 3
Q1

Hospital records show that of patients suffering from a certain disease, $75\%$ die of it. You select $6$ patients at random.

  1. What is the probability that exactly $4$ will recover?
  2. What is the probability that no more than $4$ will recover?
Q2

In the old days, there was a probability of $0.8$ of success in any attempt to make a telephone call. (This often depended on the importance of the person making the call, or the operator's curiosity!) Calculate the probability of having at least $7$ successes in $10$ attempts.

Q3

A (blindfolded) marksman finds that on average he hits the target $4$ times out of $5$. If he fires four shots, what is the probability of

  1. more than $2$ hits?
  2. at least $3$ misses?
Q4

In Singapore, the probability of giving birth to a boy is $0.5215$, and to a girl $0.4785$. What proportion of Singapore families with exactly $6$ children will have at least $3$ boys?

Q5

You roll a fair die twice and form the sum. If you repeat this $20$ times, what is the probability of observing the sum $8$ more than half of the time?

Q6

A biased coin ($p(H)=0.45$) is tossed $250$ times. Determine the probability of observing

  1. exactly $100$ heads.
  2. at least $100$ heads.
  3. between $104$ and $120$ heads (borders included).
  4. The probability of observing more than $k$ heads should be smaller than $20\%$. Determine the smallest such $k$ (you have to do this by trial and error using the calculator).
Q7

Overbooking. A course in medicine is limited to $120$ students. Experience shows that $10\%$ of the students cancel their applications. How many applications can be accepted at most so that the probability of ending up with too many students is less than $5\%$? Again, use trial and error to find the solution.

Q8

A biased coin with $p(H)=0.4$ is flipped $n$ times. Find the smallest $n$ such that the probability of observing at least one head is at least $99.99\%$.

Q9

In a village, $44\%$ voted for Trump, and $56\%$ for Biden. You conduct a survey and select a random sample of people.

  1. If the sample size is $20$ people, what is the probability that more than $5$ people but fewer than $15$ people voted for Biden?

  2. You want to choose the sample size big enough so that the sample contains at least one Biden voter with a probability of $0.999$ or bigger. What is the minimal sample size?

  3. You want to choose the sample size big enough so that the sample contains more than $5$ Biden voters with a probability bigger than $0.999$. What is the minimal sample size?

Solution
A1

$N$ = "number of recovered patients" is a binomial RV with parameters $n=6$ and $p=0.25$.

  1. $p(N=4)=binompdf(6,0.25,4)=\underline{0.033}$
  2. $p(N\leq 4)=binomcdf(6,0.25,4)=\underline{0.995}$.
A2

$N$ = "number of successes" is a binomial RV with parameters $n=10$ and $p=0.8$. $p(N\geq 7)=1-binomcdf(10,0.8,6)=\underline{0.879}$.

A3

$N$ = "number of hits" is a binomial RV with parameters $n=4$ and $p=4/5$.

  1. $p(N > 2)= 1-binomcdf(4,4/5,2)=\underline{0.8192}$
  2. At least $3$ misses in four shots means at most $1$ hit: $p(N\leq 1)=binomcdf(4,4/5,1)=\underline{0.0272}$.
A4

$N$ = "number of boys" is a binomial RV with parameters $n=6$ and $p=0.5215$. $p(N\geq 3)=1-binomcdf(6,0.5215,2)=\underline{0.696}$.

A5

$N$ = "number of times the sum is $8$" is a binomial RV with parameters $n=20$ and $p=5/36$ (the probability that the sum is $8$). $p(N>10)=1-binomcdf(20,5/36,10)=\underline{1.8\cdot 10^{-5}}$.

A6

$N$ = "number of heads" is a binomial RV with parameters $n=250$ and $p=0.45$.

  1. $binompdf(250,0.45,100)=\underline{0.014}$

  2. $1-binomcdf(250,0.45,99)=\underline{0.951}$

  3. $binomcdf(250,0.45,120)-binomcdf(250,0.45,103)=\underline{0.719}$

  4. Find $k$ with

    $$p(N>k)=1-binomcdf(250,0.45,k)<0.2$$

    With trial and error using the calculator, we get $k=\underline{119}$ as the smallest such value.
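The trial-and-error search for $k$ can be automated. The following Python sketch is one possible way to do it, not the method prescribed by the exercise; `binompdf` and `binomcdf` are illustrative helper names that mirror the calculator commands.

```python
from math import comb

def binompdf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, x):
    return sum(binompdf(n, p, k) for k in range(0, x + 1))

n, p = 250, 0.45

# Try k = 0, 1, 2, ... and stop at the first k with p(N > k) < 0.2.
k = 0
while 1 - binomcdf(n, p, k) >= 0.2:
    k += 1

print(k)   # 119
```

The loop works because $p(N>k)$ can only decrease as $k$ grows, so the first $k$ that satisfies the condition is the smallest one.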

A7

Binomial experiment with success $S$ = "not cancelled" and success probability $p=0.9$. $n$ is the number of applications (the number of repetitions of the Bernoulli experiment "a randomly selected applicant cancels or not"). $N$ = "number of applications that are not cancelled" (number of successes) is a binomial RV with parameters $n$ (unknown) and $p=0.9$.

Find the largest $n$ such that

$$p(N > 120)<0.05$$

that is

$$1-binomcdf(n,0.9,120) <0.05$$

Trial and error $\rightarrow n=\underline{127}$.
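The same search can be sketched in Python. The function name `too_many_probability` and the parameter name `capacity` are made up for this illustration; the loop simply keeps accepting one more application as long as the risk of an overfull course stays below $5\%$.

```python
from math import comb

def binompdf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, x):
    return sum(binompdf(n, p, k) for k in range(0, x + 1))

def too_many_probability(n, p=0.9, capacity=120):
    """p(N > capacity) when n applications are accepted."""
    return 1 - binomcdf(n, p, capacity)

# With 120 applications the course can never be overfull, so start there
# and accept one more application while the risk stays below 5 %.
n = 120
while too_many_probability(n + 1) < 0.05:
    n += 1

print(n)   # 127
```

With $128$ applications the probability of too many students is only slightly above $5\%$, so the cut-off is sensitive to the assumed cancellation rate.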

A8

$N$ = "number of heads" is a binomial RV with parameters $n$ and $p=0.4$. We have to find $n$ such that

$$p(N\geq 1)\geq 0.9999$$

Because $p(N\geq 1)=1-p(N=0)$, we have to find $n$ with

$$p(N=0)\leq 0.0001$$

Let us first find $n$ with

$$p(N=0)=0.0001$$

With

$$\begin{array}{lll} p(N=0)&=&\binom{n}{0} \cdot 0.4^0\cdot 0.6^n\\ &=& 0.6^n\end{array}$$

we therefore have to find $n$ with

$$0.6^n = 0.0001$$

Taking the logarithm on both sides, we get

$$n\cdot \ln(0.6)=\ln(0.0001)$$

and thus $n=\frac{\ln(0.0001)}{\ln(0.6)}=18.03$. Since $0.6^n$ decreases as $n$ grows, $p(N=0)\leq 0.0001$ holds for every whole number $n\geq 18.03$, and the smallest such number is $n=\underline{19}$.
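The logarithm step and the rounding can be reproduced with a few lines of Python. This is only a sketch of the calculation above; the variable names `p_head`, `target` and `bound` are chosen here for illustration.

```python
from math import ceil, log

p_head = 0.4
target = 0.9999

# 1 - 0.6**n >= 0.9999  <=>  0.6**n <= 0.0001  <=>  n >= ln(0.0001) / ln(0.6)
# (the inequality flips when dividing, because ln(0.6) is negative)
bound = log(1 - target) / log(1 - p_head)
n = ceil(bound)

print(round(bound, 2), n)   # 18.03 19
```

Checking the neighbours confirms the rounding direction: $1-0.6^{18}$ is still slightly below $0.9999$, while $1-0.6^{19}$ is above it.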

A9

It is a binomial experiment, where success $S$ = "selected person voted for Biden", and the success probability is $p(S)=0.56$. $n$ is the number of people in the sample (the number of repetitions of the Bernoulli experiment "select a person from the village at random, who either voted for Biden or did not"). Let $N$ be the number of successes, that is, the number of people in the sample who voted for Biden.

  1. $n=20$,

    $$\begin{array}{lll} p(5<N<15)&=&p(N\leq 14)-p(N\leq 5)\\ &=&binomcdf(20,0.56,14)-binomcdf(20,0.56,5)\\ &=&\underline{0.929} \end{array}$$
  2. Find $n$ with

    $$p(N\geq 1) =0.999$$

    We can solve for $n$:

    $$\begin{array}{lll} p(N\geq 1)&=&1-p(N<1)\\ &=& 1-p(N=0)\\ &=& 1-\binom{n}{0} \cdot 0.56^0\cdot 0.44^n \\ &=& 1-0.44^n \end{array}$$

    Thus, find $n$ with

    $$\begin{array}{cll} 1-0.44^n &=&0.999\quad\vert +0.44^n, -0.999\\ 0.44^n &=&0.001 \quad\vert \log(.)\\ n\log(0.44)&=&\log(0.001)\quad\vert :\log(0.44)\\ n&=&\frac{\log(0.001)}{\log(0.44)}\\ &=& 8.414 \end{array}$$

    Since $n$ must be a whole number and $p(N\geq 1)$ grows with $n$, we round up: $n=\underline{9}$.

  3. Find $n$ with

    $$p(N>5)>0.999$$

    or

    $$1-p(N\leq 5) > 0.999$$

    that is

    $$1-binomcdf(n,0.56,5) > 0.999$$

    In contrast to the previous problem (2), we cannot solve for $n$, because $binomcdf(n,0.56,5)$ does not reduce to a simple formula which we can solve. So we have to find $n$ by trial and error (insert some numbers for $n$ into the calculator). We get $n=\underline{23}$.
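Instead of typing candidate values into the calculator, the search can be left to a loop. The Python sketch below is an illustration only; `binompdf` and `binomcdf` are helper names that mirror the calculator commands, and the starting value $6$ is chosen because a smaller sample cannot contain more than $5$ Biden voters.

```python
from math import comb

def binompdf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, x):
    return sum(binompdf(n, p, k) for k in range(0, x + 1))

# Increase the sample size until p(N > 5) exceeds 0.999.
n = 6
while 1 - binomcdf(n, 0.56, 5) <= 0.999:
    n += 1

print(n)   # 23
```

The loop stops at the minimal sample size because $p(N>5)$ can only grow when more people are added to the sample.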