Mean and standard deviation of a binomial random variable

Consider a binomial experiment with parameters $n$ and $p$. Thus, the random variable $N$ = "number of successes after $n$ repetitions" is binomially distributed and can take on one of the values $0,\ldots,n$ every time the experiment is performed, with the probability

$$p(N=k) = \binom{n}{k}\cdot p^{k} \cdot (1-p)^{n-k}$$

where $k=0,1,2,\ldots,n$. We now want to calculate the average number of successes per experiment, $\mu$, and also the standard deviation $\sigma$ from this average. Let's start with the result and then give the proof.

Theorem 1

Consider the binomial random variable $N$ with success probability $p$ and repetition number $n$. The mean of the number of successes is

$$\mu = n\cdot p$$

and the standard deviation is

$$\sigma = \sqrt{n p (1-p)}$$

Since $N$ counts the number of successes per experiment, this means that the observed number of successes per experiment is on average $\mu$ (averaged over a huge number of experiments), and the observed number of successes typically deviates from $\mu$ by about $\sigma$ (more precisely, $\sigma$ is the root-mean-square deviation from $\mu$, again taken over a huge number of experiments).

Here is an example.

Example 1

An experiment consists of flipping a biased coin with $p(H)=0.55$ a total of $10$ times. On average, how many heads do you observe per experiment? And what is the typical deviation of the observed number of heads per experiment from this average?

$N$ = "number of heads" is a binomial variable with $n=10$ and $p=0.55$. Thus, on average we observe

$$\mu = 10\cdot 0.55=\underline{5.5}$$

heads and typically, the deviation from this value is

$$\sigma = \sqrt{10\cdot 0.55\cdot 0.45}\approx\underline{1.57}$$

heads.
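As a cross-check of Example 1, the following minimal Python sketch computes the mean and standard deviation by summing over all possible values $k=0,\ldots,10$, weighted by their binomial probabilities, instead of using the formulas from Theorem 1. The function names `binomial_pmf` and `mean_and_sd` are illustrative choices, not part of the original text.

```python
from math import comb, sqrt

def binomial_pmf(n, p, k):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mean_and_sd(n, p):
    """Mean and standard deviation obtained by summing over k = 0, ..., n."""
    probs = [binomial_pmf(n, p, k) for k in range(n + 1)]
    mu = sum(prob * k for k, prob in enumerate(probs))
    variance = sum(prob * (k - mu) ** 2 for k, prob in enumerate(probs))
    return mu, sqrt(variance)

# Coin from Example 1: n = 10 flips, p(H) = 0.55
mu, sigma = mean_and_sd(10, 0.55)
print(round(mu, 2), round(sigma, 2))
```

Up to floating-point rounding, the two printed values should agree with $\mu = 5.5$ and $\sigma \approx 1.57$ from the example.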

Note that the formula for the mean $\mu=np$ makes intuitive sense: if we perform the coin experiment from the example above many times (say ${\color{red}10000}$ times), then we flip the coin a total of

$${\color{red}10000}\cdot n={\color{red}10000}\cdot 10 ={\color{green}100000}$$

times, and therefore observe heads a total of approximately

$${\color{green}100000}\cdot p={\color{green}100000}\cdot 0.55=55000$$

times (by definition of probability as long-run relative frequency). So per experiment we see on average

$$55000/{\color{red}10000}=5.5$$

heads, or expressed as a formula with $n$ and $p$:

$$\mu = \frac{55000}{{\color{red}10000}}=\frac{{\color{green}100000}\,p}{{\color{red}10000}}= \frac{{\color{red}10000}\,np}{{\color{red}10000}}=np$$
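The long-run argument above can also be tried out numerically. The sketch below simulates 10000 repetitions of the 10-flip experiment with Python's standard `random` module and averages the number of heads per experiment. The function name `simulate_mean_heads` and the fixed seed are assumptions made only for this illustration.

```python
import random

def simulate_mean_heads(n, p, experiments, seed=0):
    """Average number of heads per experiment over many simulated experiments."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    total_heads = 0
    for _ in range(experiments):
        # One experiment: n flips, each landing heads with probability p
        total_heads += sum(1 for _ in range(n) if rng.random() < p)
    return total_heads / experiments

avg = simulate_mean_heads(n=10, p=0.55, experiments=10000)
print(avg)
```

The printed average should land close to $np = 5.5$, though not exactly on it, since a simulation only approximates the long-run relative frequency.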

But how can we prove these formulas using our general formula for the mean and standard deviation of random variables? Well, applying the general method for calculating $\mu$ and $\sigma$ for $N$, we have

$$\mu = p(N=0)\cdot 0 +p(N=1)\cdot 1 + \ldots + p(N=n)\cdot n$$

and

$$\sigma = \sqrt{p(N=0)\cdot (0-\mu)^2 +p(N=1)\cdot (1-\mu)^2 + \ldots + p(N=n)\cdot (n-\mu)^2}$$

In the case of binomially distributed random variables, we have

$$p(N=k)=\text{binompdf}(n,p,k)=\binom{n}{k}\, p^k (1-p)^{n-k}$$

Inserting the formula for $p(N=k)$ into the above expressions for $\mu$ and $\sigma$ and simplifying, we get after many algebraic manipulations that $\mu=np$ and $\sigma=\sqrt{np(1-p)}$. The example below shows this calculation for the case $n=2$.
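For readers who want to see the general case, here is a sketch of one standard route for the mean. It is not spelled out in the text above and relies on two facts: the identity $k\binom{n}{k}=n\binom{n-1}{k-1}$ for $k\geq 1$, and the binomial theorem. The summation index $j=k-1$ is introduced only for this sketch.

$$\begin{array}{lll}
\mu &=& \sum_{k=0}^{n} k\cdot\binom{n}{k}\, p^{k}(1-p)^{n-k}\\
&=& \sum_{k=1}^{n} n\cdot\binom{n-1}{k-1}\, p^{k}(1-p)^{n-k}\\
&=& np\cdot\sum_{j=0}^{n-1} \binom{n-1}{j}\, p^{j}(1-p)^{(n-1)-j}\\
&=& np\cdot\left(p+(1-p)\right)^{n-1}\\
&=& np
\end{array}$$

The variance can be treated along the same lines: applying the identity twice gives $\sum_{k} k(k-1)\, p(N=k) = n(n-1)p^2$, and combining this with $\sigma^2=\sum_k k^2\, p(N=k)-\mu^2$ yields $\sigma^2 = n(n-1)p^2+np-n^2p^2 = np(1-p)$.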

Example 2

We want to show that for the case $n=2$ we have $\mu=2p$ and $\sigma=\sqrt{2p (1-p)}$.

Solution

With

$$\begin{array}{lll}
p(N=0) &=& \binom{2}{0}\cdot p^0\cdot (1-p)^2 = (1-p)^2\\
p(N=1) &=& \binom{2}{1}\cdot p^1\cdot (1-p)^1 = 2 p (1-p)\\
p(N=2) &=& \binom{2}{2}\cdot p^2\cdot (1-p)^0 = p^2
\end{array}$$

we get

$$\begin{array}{lll}
\mu &=& p(N=0)\cdot 0 + p(N=1)\cdot 1 + p(N=2)\cdot 2\\
&=& 2 p (1-p) + 2 p^2\\
&=& \underline{2p}
\end{array}$$

and for the variance $\sigma^2$ we get

$$\begin{array}{lll}
\sigma^2 &=& p(N=0)\cdot (0-\mu)^2 + p(N=1)\cdot (1-\mu)^2 + p(N=2)\cdot (2-\mu)^2\\
&=& (1-p)^2\cdot (0-2p)^2 + 2p(1-p) \cdot (1-2p)^2 + p^2\cdot (2-2p)^2\\
&=& 2p(1-p)
\end{array}$$

Thus, we have $\sigma =\underline{\sqrt{2p(1-p)}}$.
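The last simplification in the variance calculation of Example 2 is compressed into a single step. One way to expand it is to factor $2p(1-p)$ out of all three terms:

$$\begin{array}{lll}
\sigma^2 &=& 4p^2(1-p)^2 + 2p(1-p)(1-2p)^2 + 4p^2(1-p)^2\\
&=& 2p(1-p)\cdot\left[2p(1-p) + (1-2p)^2 + 2p(1-p)\right]\\
&=& 2p(1-p)\cdot\left[4p-4p^2 + 1-4p+4p^2\right]\\
&=& 2p(1-p)
\end{array}$$

The bracket collapses to $1$, which is why only the factor $2p(1-p)$ survives.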