Mean and standard deviation of RVs

We have already seen that a discrete random variable $X$ with the possible outputs $x_1,...,x_u$ has the mean

$$\mu = p(X=x_1)\cdot x_1 +...+p(X=x_u)\cdot x_u$$

and standard deviation

$$\sigma = \sqrt{p(X=x_1)\cdot (x_1-\mu)^2 +...+p(X=x_u)\cdot (x_u-\mu)^2}$$

This means that if we repeat the experiment many times, the average of all the outputs of $X$ is approximately $\mu$, and the typical deviation from this average is approximately $\sigma$. If $X$ is a continuous random variable, we get the continuous version of these formulas by replacing the sum with an integral:

Theorem 1

The mean and standard deviation of a continuous random variable $X$ with probability density function $f$ are

$$\mu = \int_{-\infty}^\infty x \cdot f(x)\, dx$$

and

$$\sigma = \sqrt{\int_{-\infty}^\infty (x-\mu)^2\cdot f(x)\, dx}$$

Before we argue why these formulas are correct, let's work through an example.

Exercise 1

Consider a random experiment with a continuous random variable $X$ whose probability density function is

$$f(x)=\begin{cases}\frac{3}{4}-\frac{3}{4}x^2 & x\in [-1,1] \\ 0 & \text{otherwise} \end{cases}$$

Determine the mean and standard deviation of $X$.

Solution

We have to integrate the function

$$\begin{array}{lll} x\cdot f(x)&=& x\cdot \left(\frac{3}{4}-\frac{3}{4}x^2\right)\\ &=& \frac{3}{4}x-\frac{3}{4}x^3 \end{array}$$

Thus,

$$\begin{array}{lll} \mu & = & \int_{-\infty}^\infty (x \cdot f(x))\, dx\\ &=& \int_{-1}^1 \left(\frac{3}{4}x-\frac{3}{4}x^3\right)\, dx\\ &=& F(1)-F(-1)\\ &=& \left(\frac{3}{8}-\frac{3}{16}\right)-\left(\frac{3}{8}-\frac{3}{16}\right)\\ &=& 0 \end{array}$$

We have used that $F(x)=\frac{3}{8}x^2-\frac{3}{16}x^4$ is an antiderivative of $\frac{3}{4}x-\frac{3}{4}x^3$. For the standard deviation we have

$$\begin{array}{lll} \sigma^2 & = & \int_{-\infty}^\infty (x-0)^2 \cdot f(x)\, dx\\ &=& \int_{-1}^1 x^2\cdot \left(\frac{3}{4}-\frac{3}{4}x^2\right)\, dx\\ &=& \int_{-1}^1 \left(\frac{3}{4}x^2-\frac{3}{4}x^4\right)\, dx\\ &=& F(1)-F(-1)\\ &=& \left(\frac{1}{4}-\frac{3}{20}\right)-\left(-\frac{1}{4}+\frac{3}{20}\right)\\ &=& \frac{1}{5} \end{array}$$

Here $F(x)=\frac{1}{4}x^3-\frac{3}{20}x^5$ is an antiderivative of $\frac{3}{4}x^2-\frac{3}{4}x^4$. Thus, the standard deviation is

$$\sigma =\sqrt{\frac{1}{5}}\approx 0.447$$
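The interpretation of $\mu$ and $\sigma$ as the long-run average and typical deviation can be checked by simulation. The following is a minimal sketch, not part of the original solution: it draws values of $X$ by rejection sampling (a technique chosen here for illustration), and the helper names `density` and `sample`, the seed, and the sample size are all assumptions.

```python
import math
import random

def density(x):
    """Density from Exercise 1: 3/4 - 3/4 x^2 on [-1, 1], zero elsewhere."""
    return 0.75 - 0.75 * x * x if -1.0 <= x <= 1.0 else 0.0

def sample(rng, n):
    """Draw n values of X by rejection sampling under the box [-1, 1] x [0, 3/4]."""
    draws = []
    while len(draws) < n:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(0.0, 0.75)
        if y < density(x):  # keep x only if the point lies under the density curve
            draws.append(x)
    return draws

rng = random.Random(0)  # fixed seed (arbitrary choice) for reproducibility
xs = sample(rng, 100_000)
mean = sum(xs) / len(xs)
std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

print(f"simulated mean = {mean:.3f}, exact mean = 0")
print(f"simulated std  = {std:.3f}, exact std  = {math.sqrt(1 / 5):.3f}")
```

With this many draws, the simulated values should land close to $0$ and $\sqrt{1/5}$, though they will not match exactly because the sample is random.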

To see why these formulas for the mean and standard deviation are correct, consider the following argument. It is somewhat technical, so just try to follow the main idea.


We divide the $x$-axis into tiny bins $\text{bin}_i$ ($u$ of them), so that we have

$$p(X\in \text{bin}_i) \approx f(x_i)\,\Delta x$$

where $\text{bin}_i$ has size $\Delta x$ and midpoint $x_i$. Now we can apply the formula for the discrete case, that is,

$$\begin{array}{lll} \mu &\approx & p(X\in \text{bin}_1)\cdot x_1+...+p(X\in \text{bin}_u)\cdot x_u\\ &\approx & f(x_1)\cdot \Delta x\cdot x_1 + ...+ f(x_u)\cdot \Delta x\cdot x_u\\ &\approx& \int_{-\infty}^{\infty} x\cdot f(x)\, dx \end{array}$$

The sum in the second line is a Riemann sum for the integral, so the approximations become exact as the bins shrink ($\Delta x \to 0$).

The proof for the standard deviation is similar: apply the same binning to the discrete formula for $\sigma$, with $(x_i-\mu)^2$ in place of $x_i$.
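The binning argument can also be tried numerically. The sketch below applies the discrete formulas to $u$ equal-width bins and prints the results for growing $u$; it reuses the density from Exercise 1 as an example, and the function name `binned_mean_std` and the chosen bin counts are assumptions made for illustration.

```python
import math

def density(x):
    """Density from Exercise 1; any other density could be substituted."""
    return 0.75 - 0.75 * x * x if -1.0 <= x <= 1.0 else 0.0

def binned_mean_std(f, lo, hi, u):
    """Apply the discrete formulas to u bins of equal width on [lo, hi]."""
    dx = (hi - lo) / u
    mids = [lo + (i + 0.5) * dx for i in range(u)]   # bin midpoints x_i
    probs = [f(x) * dx for x in mids]                # p(X in bin_i) ~ f(x_i) * dx
    mu = sum(p * x for p, x in zip(probs, mids))
    var = sum(p * (x - mu) ** 2 for p, x in zip(probs, mids))
    return mu, math.sqrt(var)

for u in (4, 16, 64, 256):
    mu, sigma = binned_mean_std(density, -1.0, 1.0, u)
    print(f"u = {u:3d}: mu = {mu:+.5f}, sigma = {sigma:.5f}")
print(f"exact:   mu = +0.00000, sigma = {math.sqrt(1 / 5):.5f}")
```

As $u$ grows, the binned values should approach the exact results from Exercise 1, which is the behaviour the proof relies on.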

Exercise 2

The random variable $X$ has the density function

$$f(x)=\begin{cases} \frac{a}{x^2} & x\in [1,2]\\ 0 & x\in [1,2]^c \end{cases}$$

where $a$ is still to be determined.

  1. Determine the value of $a$.

  2. Determine the mean and the standard deviation of $X$.

Solution
  1. The integral of the density has to be $1$:

    $$\int_1^2 f(x)\, dx = \int_1^2 \frac{a}{x^2}\, dx =1$$

    An antiderivative of $\frac{a}{x^2}=ax^{-2}$ is

    $$F(x)=a(-1)x^{-1}=-\frac{a}{x}$$

    thus we have the equation $F(2)-F(1)=1$, that is,

    $$-\frac{a}{2}-\left(-\frac{a}{1}\right)=\frac{a}{2}=1 \quad\Rightarrow\quad a=2$$

    Thus, $f(x)=\frac{2}{x^2}$ on $[1,2]$.

  2. For the mean we have:

    $$\begin{array}{lll} \mu &=&\int_1^2 x f(x)\, dx \\ &=& \int_1^2 x\frac{2}{x^2}\, dx \\ &=& \int_1^2 \frac{2}{x}\, dx\\ &=&2\ln(2)-2\ln(1)\\ &=&2\ln(2)\\ &\approx&1.386 \end{array}$$

    For the standard deviation we have

    $$\begin{array}{lll} \sigma^2 &=&\int_1^2 (x-1.386)^2 f(x)\, dx \\ &=& \int_1^2 (x-1.386)^2 \frac{2}{x^2}\, dx \\ &=& \int_1^2 (x^2-2.772x+1.921)\frac{2}{x^2}\, dx \\ &=& \int_1^2 \left(2-5.544x^{-1}+3.842x^{-2}\right)\, dx \\ &=& F(2)-F(1)\\ &\approx& 0.078 \end{array}$$

    where an antiderivative of $2-5.544x^{-1}+3.842x^{-2}$ is

    $$F(x)=2x-5.544\ln(x)-3.842x^{-1}$$

    Thus, we get

    $$\sigma = \sqrt{0.078} \approx 0.28$$
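The results of Exercise 2 can be cross-checked numerically. This is a rough sketch using the midpoint rule; the helper `midpoint_integral` and the number of subintervals are assumptions chosen for illustration, not part of the original solution.

```python
import math

def midpoint_integral(g, lo, hi, n=100_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    dx = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * dx) for i in range(n)) * dx

# Part 1: a must make the total probability equal to 1.
a = 1.0 / midpoint_integral(lambda x: 1.0 / x**2, 1.0, 2.0)

def f(x):
    """Density a / x^2 on [1, 2] with the value of a found above."""
    return a / x**2

# Part 2: mean and standard deviation from Theorem 1.
mu = midpoint_integral(lambda x: x * f(x), 1.0, 2.0)
var = midpoint_integral(lambda x: (x - mu) ** 2 * f(x), 1.0, 2.0)
sigma = math.sqrt(var)

print(f"a     = {a:.4f}")
print(f"mu    = {mu:.4f}   (2 ln 2 = {2 * math.log(2):.4f})")
print(f"sigma = {sigma:.4f}")
```

The printed values should agree with $a=2$, $\mu=2\ln(2)$ and $\sigma\approx 0.28$ up to the accuracy of the numerical integration.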