The normal distribution

We now discuss the most important probability density function. It is used for data that cluster around a single value. Here is the definition:

Definition 1

A continuous random variable $X$ is called normally distributed with mean $\mu$ and standard deviation $\sigma$ if the probability density function of $X$ is

$$f_{\mu,\sigma}(x)=\frac{1}{\sigma \sqrt{2\pi}}\cdot e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Note that the mean $\mu$ and the standard deviation $\sigma$ appear in the definition of the density function. Depending on these values (called the parameters of the function), the graph of the function will look different.
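To make the role of the two parameters concrete, here is a minimal Python sketch of the density from Definition 1. The function name `normal_pdf` and the sample values are illustrative choices, not part of the original text.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density f_{mu,sigma}(x) of the normal distribution from Definition 1."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Same sigma, different mu: the curve is shifted, the peak height is unchanged.
print(normal_pdf(0.0, mu=0.0, sigma=1.0))
print(normal_pdf(3.0, mu=3.0, sigma=1.0))
# Larger sigma: the peak is lower.
print(normal_pdf(0.0, mu=0.0, sigma=2.0))
```

Changing `mu` only moves the graph along the $x$-axis, while changing `sigma` changes its height and spread.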

The simplest case

Let us focus for the moment on the simplest case where $\mu=0$ and $\sigma=1$ ($\sigma=0$ does not make sense, or at least is not interesting). We then get

$$f_{0,1}(x)=\frac{1}{\sqrt{2\pi}}\cdot e^{-x^2/2}$$

which looks like a bell (see below), and for this reason is also called the bell curve.

Let's discuss some key features of this graph.

a)

At $x=0$ we get

$$f_{0,1}(0)=\frac{1}{\sqrt{2\pi}}\cdot e^{0}=\frac{1}{\sqrt{2\pi}}=0.3989... \approx 0.4$$

which is easily seen on the figure above as well.

b)

To see that $f_{0,1}$ has a maximum at $x=0$, we can invoke differential calculus. We have to show that $f^{\prime}(0)=0$ and $f^{\prime\prime}(0)<0$. See exercise 1 below.

c)

And if we write

$$f_{0,1}(x)=\frac{1}{\sqrt{2\pi}}\cdot e^{-x^2/2} \approx 0.4\cdot e^{-x^2/2} = \frac{0.4}{e^{x^2/2}}$$

we see that as $x$ approaches $\infty$ or $-\infty$, the graph approaches the $x$-axis, but is always $>0$.

d)

We define the width of the normal distribution as the distance between the inflection points of the graph (indicated above by the points $A$ and $B$). Recall that an inflection point is defined as a point on the graph where the slope of the graph changes from increasing to decreasing or vice versa.

Using differential calculus we can show that $A(-1 | 0.24...)$ and $B(1| 0.24...)$. Thus, the width of the graph $f_{0,1}$ is $2$ (see exercise 1 below).
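The location of the inflection points can also be checked numerically. The following sketch approximates $f^{\prime\prime}$ with a central finite difference (a numerical stand-in for the exact calculus argument, with a step size `h` chosen here as an assumption) and tests whether its sign changes at $x=-1$ and $x=1$.

```python
import math

def f(x: float) -> float:
    """Standard normal density f_{0,1}."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def second_derivative(x: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

for x0 in (-1.0, 1.0):
    left = second_derivative(x0 - 0.01)
    right = second_derivative(x0 + 0.01)
    # A sign change of f'' across x0 indicates an inflection point.
    print(x0, round(f(x0), 4), left * right < 0)
```

The printed heights are the $y$-coordinates of $A$ and $B$, and the last column reports whether the curvature changes sign there.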

e)

Recall that the probability density function follows the height of the bars in histograms (the density), and is therefore proportional to the relative frequency of the data points in each bin. So we see that the graph of $f_{0,1}$ describes data that are grouped around $\mu=0$ and thin out in both directions.

f)

For $f_{0,1}$ we have

$$\int_{-\infty}^{\infty} x \cdot f_{0,1}(x)\, dx = 0$$

(thus, the mean is indeed $\mu=0$) and

$$\sqrt{\int_{-\infty}^{\infty} (x-\mu)^2 \cdot f_{0,1}(x)\, dx} = 1$$

(the standard deviation is indeed $\sigma=1$).

Again, see exercise 1 for the proof of the statement about the mean.
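Both integrals can be approximated numerically as a plausibility check. The sketch below uses a simple midpoint rule and assumes that the tails beyond $|x|=10$ are negligible, so the interval $[-10, 10]$ stands in for the whole real line. The helper name `integrate` is an illustrative choice.

```python
import math

def f(x: float) -> float:
    """Standard normal density f_{0,1}."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def integrate(g, a: float, b: float, n: int = 20_000) -> float:
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Assumption: [-10, 10] is wide enough to replace (-infinity, infinity).
area = integrate(f, -10.0, 10.0)
mean = integrate(lambda x: x * f(x), -10.0, 10.0)
std = math.sqrt(integrate(lambda x: (x - mean) ** 2 * f(x), -10.0, 10.0))

print(area, mean, std)
```

The three numbers should come out close to $1$, $0$ and $1$; this is a numerical check, not a proof.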

Exercise 1

Show that for $f_{0,1}$ the following is correct:

  1. it has a local maximum at $(0|0.4)$

  2. the inflection points have the coordinates $A(-1 | 0.24)$ and $B(1| 0.24)$

  3. $\int_{-\infty}^{\infty} x \cdot f_{0,1}(x)\, dx = 0$

Solution

Applying the chain rule, we get for the derivative of $f(x)=\frac{1}{\sqrt{2\pi}}\cdot e^{-x^2/2}$ the following:

$$f'(x)=\frac{1}{\sqrt{2\pi}}\cdot e^{-x^2/2} \cdot (-x)$$

and, applying the product rule to this expression, for the second derivative we get

$$f^{\prime\prime}(x)=\frac{1}{\sqrt{2\pi}}\cdot e^{-x^2/2} \cdot (x^2 - 1)$$

  1. To find the maximum, we have to find an $x$ with

    $$f'(x)=0$$

    that is

    $$f'(x)=\frac{1}{\sqrt{2\pi}}\cdot e^{-x^2/2} \cdot (-x)=0$$

    Since the exponential factor is always positive, this is only possible for $x=0$. And because $f^{\prime\prime}(0)<0$, we are indeed talking about a maximum. The $y$-coordinate of the peak is

    $$y=f_{0,1}(0)=\frac{1}{\sqrt{2\pi}}\cdot e^0 \approx 0.4$$

    Thus we have $P(0|0.4)$.

  2. To find the inflection points we have to find $x$ with

    $$f^{\prime\prime}(x)=0$$

    Thus find $x$ with

    $$f^{\prime\prime}(x)=\frac{1}{\sqrt{2\pi}}\cdot e^{-x^2/2} \cdot (x^2 - 1)=0$$

    Since the exponential factor is never zero, this is possible exactly when $x^2-1=0$, that is, if $x=-1$ or $x=1$. Using the calculator, we get $y=f(-1)=f(1)=0.24...$. We thus get $A(-1 | 0.24...)$ and $B(1| 0.24...)$.

    To be sure that these are inflection points, we should also calculate the third derivative and check that $f^{\prime\prime\prime}(1)\neq 0$ and $f^{\prime\prime\prime}(-1)\neq 0$. This is left to the reader.

  3. We have

    $$\begin{array}{lll} \int_{-\infty}^{\infty} x \cdot f_{0,1}(x)\, dx &=& \int_{-\infty}^{\infty} x\cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx\\ &=& F(\infty)-F(-\infty)\\ &=& 0 \end{array}$$

    where $F(x)=-\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the anti-derivative of $x\cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Here $F(\infty)$ and $F(-\infty)$ denote the limits of $F(x)$ as $x$ approaches $\infty$ and $-\infty$, and both limits are $0$.

    So the mean is indeed $0$.
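Statement f) also claims that the standard deviation of $f_{0,1}$ is $1$, which the exercise does not cover. One possible sketch uses integration by parts, assuming that the total area under $f_{0,1}$ is $1$ and using that $-f_{0,1}(x)$ is the anti-derivative of $x\cdot f_{0,1}(x)$ found in part 3. As $\mu=0$, we have $(x-\mu)^2=x^2$, and

$$\begin{array}{lll} \int_{-\infty}^{\infty} x^2 \cdot f_{0,1}(x)\, dx &=& \int_{-\infty}^{\infty} x\cdot \left(x\cdot f_{0,1}(x)\right) dx\\ &=& \left[-x\cdot f_{0,1}(x)\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} f_{0,1}(x)\, dx\\ &=& 0 + 1\\ &=& 1 \end{array}$$

The boundary term vanishes because $x\cdot e^{-x^2/2}$ approaches $0$ as $x$ approaches $\infty$ or $-\infty$. Taking the square root gives $\sigma=\sqrt{1}=1$.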

The general case

For the general case the properties are similar. Here they are (without proof):

Theorem 1

Consider a normally distributed random variable $X$ with mean $\mu$ and standard deviation $\sigma$. We have the following properties of $f_{\mu,\sigma}$ (see figure below):

  • the peak $P$ is at $\mu$, and its height is

    $$y=f_{\mu,\sigma}(\mu)\approx \frac{0.4}{\sigma}$$

  • the inflection points $A$ and $B$ are one $\sigma$ away from the mean, and have height

    $$y=f_{\mu,\sigma}(\mu\pm\sigma)\approx\frac{0.24}{\sigma}$$

  • the width is $2\sigma$

  • the points $U$ and $V$, which are $2\sigma$ away from $\mu$, have height

    $$y=f_{\mu,\sigma}(\mu\pm 2\sigma) \approx \frac{0.05}{\sigma}$$

  • $\int_{-\infty}^{\infty} x \cdot f_{\mu,\sigma}(x)\, dx = \mu$ (the mean of $X$ is $\mu$)

  • $\sqrt{\int_{-\infty}^{\infty} (x-\mu)^2 \cdot f_{\mu,\sigma}(x)\, dx} = \sigma$ (the standard deviation of $X$ is $\sigma$)

So note that $\mu$ tells you where the peak of the graph of $f_{\mu,\sigma}$ is along the $x$-axis, and $\sigma$ tells you how wide the graph is. The larger $\sigma$ is, the flatter the graph will be. This makes sense, as the total area under the graph must be $1$.
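The trade-off between width and height can be illustrated with a short Python sketch. It evaluates the heights from Theorem 1 for a few example values of $\sigma$ and approximates the area with a midpoint rule on $[\mu-10\sigma, \mu+10\sigma]$, assuming the tails outside this interval are negligible. The function names and the chosen values of $\mu$ and $\sigma$ are illustrative.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f_{mu,sigma}(x) of the normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area(mu: float, sigma: float, n: int = 20_000) -> float:
    """Midpoint-rule area under f_{mu,sigma} on [mu - 10*sigma, mu + 10*sigma]."""
    a, b = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (b - a) / n
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n))

mu = 0.0  # example value; a different mu only shifts the curve
for sigma in (0.5, 1.0, 2.0):
    peak = normal_pdf(mu, mu, sigma)
    at_one_sigma = normal_pdf(mu + sigma, mu, sigma)
    at_two_sigma = normal_pdf(mu + 2.0 * sigma, mu, sigma)
    print(sigma, round(peak, 3), round(at_one_sigma, 3),
          round(at_two_sigma, 3), round(area(mu, sigma), 6))
```

Doubling $\sigma$ halves all three heights while the approximated area stays at $1$.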

Exercise 2

A random variable $X$ with mean $2$ and standard deviation $0.5$ is normally distributed. Sketch the graph of the density function by first indicating the maximum point $P$, the inflection points $A$ and $B$, and the height of the graph $2\sigma$ away from $\mu$.

To verify the sketch, plot the density function with the calculator (or GeoGebra).

Solution

As $\mu=2$ and $\sigma=0.5$, we have to draw the graph of the function $f_{2,0.5}$.

The coordinates of the points are

$$P(2\vert 0.8), A(1.5\vert 0.48), B(2.5\vert 0.48), U(1\vert 0.1), V(3\vert 0.1)$$
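As an alternative to the calculator, the coordinates can be checked with a few lines of Python. This is a minimal sketch; the function name `normal_pdf` and the dictionary `points` are illustrative choices.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density f_{mu,sigma}(x) of the normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 2.0, 0.5  # values from Exercise 2

# x-coordinates of the peak, the inflection points and the 2-sigma points
points = {
    "P": mu,
    "A": mu - sigma,
    "B": mu + sigma,
    "U": mu - 2.0 * sigma,
    "V": mu + 2.0 * sigma,
}

for name, x in points.items():
    print(f"{name}({x} | {normal_pdf(x, mu, sigma):.3f})")
```

The computed heights are about $0.798$, $0.484$ and $0.108$, which agree with the coordinates above up to the rounding used in Theorem 1.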