Probability calculations with the normal distribution

Consider a normally distributed random variable $X$ with mean $\mu$ and standard deviation $\sigma$, so that its probability density function is $f_{\mu,\sigma}$. Recall that the probability that $X$ takes a value in a given interval $[a,b]$ is the area under the probability density function between $a$ and $b$:

$$p(X\in [a,b])=\int_a^b f_{\mu,\sigma}(x)\, dx$$

For example, if the mean is $\mu=0$ and the standard deviation is $\sigma=1$, the probability for $X$ to be between $-1$ and $1$ is

$$\begin{array}{lll} p(X\in [-1,1])&=&\int_{-1}^1 f_{0,1}(x)\, dx\\ &=& \int_{-1}^1 \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\, dx \end{array}$$

Now, to actually calculate the probability, we have to find an antiderivative $F$ of the function

$$\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$$

and with the fundamental theorem of calculus we then have

$$p(X\in [-1,1]) = \int_{-1}^1 \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}\, dx = F(1)-F(-1)$$

Unfortunately, the antiderivative $F$ cannot be expressed as a combination of elementary functions such as $x^n$, $\sin(x)$, $e^x$, $\log(x)$ and so on. So we have to evaluate the integral numerically with the calculator. (When calculators were still rare, people used large tables listing the area under the curve for many different intervals.)
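The integral key evaluates the integral numerically. As a minimal sketch of what such a routine might do, the Python code below uses composite Simpson's rule; this is an assumption for illustration, since calculators may use other quadrature methods. The helper names `std_normal_pdf` and `simpson` are hypothetical.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Density f_{0,1}(x) = exp(-x^2 / 2) / sqrt(2 * pi) (hypothetical helper)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def simpson(f, a: float, b: float, n: int = 100) -> float:
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # interior points get weight 4 (odd i) or 2 (even i)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


p = simpson(std_normal_pdf, -1.0, 1.0)
print(f"p(X in [-1, 1]) = {p:.4f}")  # prints 0.6827
```

With $n=100$ subintervals the result agrees with the exact value $\operatorname{erf}(1/\sqrt{2})\approx 0.6827$ to many decimal places; doubling $n$ is a simple way to check that the approximation has settled.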

Exercise 1

Use the integral key on your calculator to determine the probability numerically:

$$p(X\in [-1,1]) = \int_{-1}^1 \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}\, dx$$

Hint: Later we will see that we can use the calculator's normcdf.

Solution

$p(X\in [-1,1]) = \underline{0.68...}$
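For comparison, here is a sketch in Python (3.8 or later) using the standard library's `statistics.NormalDist`, whose `cdf(x)` gives the area under the density from $-\infty$ to $x$. Taking the difference of two `cdf` values plays roughly the same role as the calculator's normcdf mentioned in the hint; using Python instead of a calculator is our assumption, not part of the exercise.

```python
from statistics import NormalDist

# Standard normal distribution: mu = 0, sigma = 1
X = NormalDist(mu=0.0, sigma=1.0)

# p(X in [a, b]) = F(b) - F(a), where F is the cumulative distribution function
p = X.cdf(1.0) - X.cdf(-1.0)
print(f"p(X in [-1, 1]) = {p:.4f}")  # prints 0.6827
```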

Some areas under the curve of $f_{\mu,\sigma}$ are really useful to know, in particular because they occur often in statistical applications. Here are some of these areas:

Theorem 1

The area under $f_{\mu,\sigma}$ between

  1. $\mu-\sigma$ and $\mu+\sigma$ is $0.683$
  2. $\mu-2\sigma$ and $\mu+2\sigma$ is $0.954$
  3. $\mu-3\sigma$ and $\mu+3\sigma$ is $0.997$

Note that these areas are independent of the values of $\mu$ and $\sigma$!
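A quick numerical check of Theorem 1, sketched with Python's `statistics.NormalDist` (the helper `central_area` and the chosen pairs $(\mu,\sigma)$ are illustrative assumptions): the area between $\mu-k\sigma$ and $\mu+k\sigma$ comes out the same for every $\mu$ and $\sigma$, because it only depends on $k$.

```python
from statistics import NormalDist


def central_area(mu: float, sigma: float, k: float) -> float:
    """Area under f_{mu,sigma} between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The same k gives the same area, whatever mu and sigma are
for mu, sigma in [(0, 1), (1, 2), (60.31, 1.14), (-5, 0.3)]:
    areas = [round(central_area(mu, sigma, k), 3) for k in (1, 2, 3)]
    print(mu, sigma, areas)  # [0.683, 0.954, 0.997] in every row
```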

And probably even more useful are the following areas:

Theorem 2

The area under $f_{\mu,\sigma}$ between

  1. $\mu-1.64\sigma$ and $\mu+1.64\sigma$ is $0.9$
  2. $\mu-1.96\sigma$ and $\mu+1.96\sigma$ is $0.95$
  3. $\mu-2.58\sigma$ and $\mu+2.58\sigma$ is $0.99$

(See the figure below.) Note again that these areas are independent of the values of $\mu$ and $\sigma$!
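The factors $1.64$, $1.96$ and $2.58$ are rounded quantiles of the standard normal distribution. A small sketch, again assuming Python's `statistics.NormalDist`, recovers them with `inv_cdf`: a central area $A$ leaves $(1-A)/2$ in each tail, so the upper bound is the quantile at $(1+A)/2$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mu = 0, sigma = 1

for area in (0.90, 0.95, 0.99):
    # A central area `area` leaves (1 - area) / 2 in each tail,
    # so the upper bound is the quantile at (1 + area) / 2.
    k = Z.inv_cdf((1 + area) / 2)
    print(f"area {area:.2f}: k = {k:.4f}")
```

It prints $1.6449$, $1.9600$ and $2.5758$, which round to the factors used in Theorem 2.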

Exercise 2
  1. Use the calculator to verify the areas in Theorem 1 for $\mu=1$ and $\sigma=2$ (just pick one or two areas; you do not have to verify all of them).

  2. Consider a normally distributed random variable $X$ with mean $\mu$ and standard deviation $\sigma$. Determine the following probabilities:

    1. $p(X\in [\mu-\sigma, \mu+\sigma])$
    2. $p(X\leq \mu)$
    3. $p(\mu-1.64\sigma \leq X \leq \mu)$
    4. $p(X\leq \mu-1.64\sigma)$
    5. $p(X\leq \mu+1.64\sigma)$
    6. $p(X\geq \mu+1.96\sigma)$
    7. $p(\mu-2.58\sigma \leq X \leq \mu-1.64\sigma)$
  3. The normally distributed random variable $X$ has mean $3$ and standard deviation $0.6$. You perform the experiment $10\,000$ times. Roughly how many of the outputs of $X$ lie between $1.8$ and $4.2$?

Solution
  1. For example, let's verify that

    $$p(X\in [\mu-2\sigma,\mu+2\sigma])=0.954$$

    Because $\mu=1$ and $\sigma=2$, we have to show that

    $$p(X\in [-3,5])=0.954$$

    So let's calculate the integral

    $$p(X\in [-3,5])=\int_{-3}^5 f_{1,2}(x)\, dx$$

    Using the calculator, we indeed get $0.9545$.

  2. We express each probability using the six areas above, together with the symmetry of $f_{\mu,\sigma}$ about $\mu$ and the fact that the total area under the curve is $1$.

    1. $p(X\in [\mu-\sigma, \mu+\sigma])=\underline{0.683}$
    2. $p(X\leq \mu)=\underline{0.5}$ (half of the total area of $1$)
    3. $p(\mu-1.64\sigma \leq X \leq \mu)=\underline{0.45}$ (half of the area between $\mu-1.64\sigma$ and $\mu+1.64\sigma$, which is $0.9$)
    4. $p(X\leq \mu-1.64\sigma)=0.5-0.45=\underline{0.05}$
    5. $p(X\leq \mu+1.64\sigma)=0.5+0.45=\underline{0.95}$
    6. $p(X\geq \mu+1.96\sigma)=\frac{1-0.95}{2}=\underline{0.025}$
    7. $p(\mu-2.58\sigma \leq X \leq \mu-1.64\sigma)=\frac{0.99-0.9}{2}=\underline{0.045}$
  3. Since $\mu-2\sigma=3-1.2=1.8$ and $\mu+2\sigma=3+1.2=4.2$, we have $p(X\in[1.8,4.2])=p(X\in [\mu-2\sigma,\mu+2\sigma])=\underline{0.954}$, so about $0.954\cdot 10\,000 =9540$ outputs of $X$ lie in this interval.
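Both calculator tasks of Exercise 2 can also be checked in a few lines, sketched here with Python's `statistics.NormalDist` (our choice of tool, not the exercise's).

```python
from statistics import NormalDist

# Part 1: mu = 1, sigma = 2, interval [mu - 2*sigma, mu + 2*sigma] = [-3, 5]
X1 = NormalDist(mu=1, sigma=2)
p1 = X1.cdf(5) - X1.cdf(-3)
print(f"p(X in [-3, 5]) = {p1:.4f}")  # prints 0.9545

# Part 3: mu = 3, sigma = 0.6, experiment repeated 10 000 times
X3 = NormalDist(mu=3, sigma=0.6)
p3 = X3.cdf(4.2) - X3.cdf(1.8)
expected_count = p3 * 10_000
print(f"expected number in [1.8, 4.2]: about {expected_count:.0f}")  # about 9545
```

The exact expected count is about $9545$; the $9540$ above differs only because the area $0.954$ was rounded.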

Exercise 3
Q1

A machine produces screws with a nominal length of $60\,\text{mm}$. But production is not perfect, and the length may vary. To find out more, the lengths of $1000$ screws are measured. The frequency table shows the following:

$$\begin{array}{c|c} \text{length (mm)} & \text{frequency} \\\hline 56.5-57.5 & 18\\ 57.5-58.5 & 72\\ 58.5-59.5 & 196\\ 59.5-60.5 & 398\\ 60.5-61.5 & 217\\ 61.5-62.5 & 89\\ 62.5-63.5 & 10 \end{array}$$

Also calculated from the $1000$ screws are the mean length, $60.31\,\text{mm}$, and the standard deviation, $1.14\,\text{mm}$.

  1. Sketch the histogram of screw lengths based on the table above.
  2. Check whether the screw lengths are approximately normally distributed by drawing the graph of the probability density function $f_{60.31,1.14}$ into the same coordinate system as the histogram. Use the $5$ points that were discussed in the previous section. Do you think the screw lengths are normally distributed?
  3. Based on the model $f_{60.31,1.14}$, determine the probability that the screw length deviates by less than $2\sigma$ from the mean.
Q2

The body temperature of a healthy adult is approximately normally distributed with mean $37^\circ C$ and standard deviation $0.4^\circ C$.

  1. Describe the underlying experiment and the random variable $X$.
  2. Determine the probability that the temperature deviates by more than $0.8^\circ C$ from the mean.
  3. Determine the probability that the temperature is lower than $36.6^\circ C$.
  4. What is the minimum temperature of the warmest $5\%$ of the people?
Q3

The probability density function $f$ of a normally distributed random variable $X$ has inflection points at $x=7$ and $x=11$. Determine

  1. the equation of the function $f$
  2. the probability $p(6.1<X<10.3)$ (use the calculator and integrate numerically)
  3. the probability $p(X>7.5)$ (use the calculator and integrate numerically)
Q4

Below is the frequency table of a data set. The mean of the data is $m=29.6$, and the standard deviation is $s=7.6$.

  1. Show that the data are approximately normally distributed by drawing the histogram and plotting the normal distribution with parameters $\mu=29.6$ and $\sigma=7.6$ in the same coordinate system.
  2. Based on the normal distribution, determine the (approximate) probability that a randomly chosen data point lies between $22$ and $37.2$.
  3. Based on the normal distribution, determine an (approximate) interval $[a,b]$, symmetric about the mean, such that a randomly chosen data point lies in this interval with probability $0.95$.
$$\begin{array}{c|c} \text{class} & \text{frequency} \\\hline 0-2.5 & 0\\ 2.5-5 & 0\\ 5-7.5 & 2\\ 7.5-10 & 0\\ 10-12.5 & 4\\ 12.5-15 & 6\\ 15-17.5 & 8\\ 17.5-20 & 11\\ 20-22.5 & 17\\ 22.5-25 & 31\\ 25-27.5 & 31\\ 27.5-30 & 46\\ 30-32.5 & 44\\ 32.5-35 & 30\\ 35-37.5 & 27\\ 37.5-40 & 23\\ 40-42.5 & 6\\ 42.5-45 & 9\\ 45-47.5 & 2\\ 47.5-50 & 3\\ 50-52.5 & 1\\ 52.5-55 & 0\\ 55-57.5 & 0\\ 57.5-60 & 0 \end{array}$$
Q5

Measurements of the weight of $10\,000$ melons give a mean of $m=3.24\,\text{kg}$ and a standard deviation of $s=0.55\,\text{kg}$. The histogram of the weights shows that they are approximately normally distributed.

  1. Approximately how many melons weigh more than $3.79\,\text{kg}$?

  2. Between which weights $a$ and $b$ (symmetric about the mean) do about $95\%$ of the melons lie?

Solution
A1
  1. see below

  2. see below

  3. As the screw length $X$ is approximately normally distributed (see figure above, in (a)), the probability that the screw length lies between $\mu-2\sigma$ and $\mu+2\sigma$ is $\underline{0.954}$ (see one of the six areas shown at the top). Of course, we could also determine the integral with the calculator: because $\mu-2\sigma = 58.03$ and $\mu+2\sigma=62.59$, we would find the same number:

    $$\int_{58.03}^{62.59} f_{60.31,1.14}(x)\,dx=0.954$$
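Instead of drawing, one can also compare numbers: dividing each frequency by $n$ times the class width gives the histogram height, which can be set against the model density at the class midpoint. The sketch below does this in Python with `statistics.NormalDist`; comparing at midpoints is our simplification of overlaying the two graphs, and such a comparison only indicates, not proves, approximate normality.

```python
from statistics import NormalDist

# Frequency table from Q1: (lower bound, upper bound, frequency)
bins = [(56.5, 57.5, 18), (57.5, 58.5, 72), (58.5, 59.5, 196),
        (59.5, 60.5, 398), (60.5, 61.5, 217), (61.5, 62.5, 89),
        (62.5, 63.5, 10)]
n = sum(freq for _, _, freq in bins)       # 1000 screws
model = NormalDist(mu=60.31, sigma=1.14)   # the model f_{60.31,1.14}

for lo, hi, freq in bins:
    density = freq / (n * (hi - lo))       # histogram height
    mid = (lo + hi) / 2                    # class midpoint
    print(f"{lo}-{hi}: histogram {density:.3f}, model {model.pdf(mid):.3f}")

# Part 3: probability of deviating by less than 2 sigma from the mean
mu, sigma = model.mean, model.stdev
p = model.cdf(mu + 2 * sigma) - model.cdf(mu - 2 * sigma)
print(f"p(|X - mu| < 2 sigma) = {p:.3f}")  # prints 0.954
```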
A2
  1. The random experiment is "select a healthy adult at random", and the random variable $X$ is the measured body temperature.

  2. Because $2\sigma=0.8^\circ C$, we have

    $$p(\mu-2\sigma \leq X\leq \mu+2\sigma)=0.954$$

    and therefore the probability that $X$ lies outside this range is

    $$1-0.954=\underline{0.046}$$
  3. $p(X\leq 36.6)=p(X\leq \mu-\sigma) = 0.5-\frac{0.683}{2}=\underline{0.1585}$.

  4. The warmest $5\%$ of the people are in the right tail under the curve (see figure below). Because the area between $\mu-1.64\sigma$ and $\mu+1.64\sigma$ is $0.9$, each tail has area $0.05$, so the minimum temperature is $\mu+1.64\sigma=37+1.64\cdot 0.4=\underline{37.656^\circ C}$.
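The answers to Q2 can be cross-checked with a short sketch, assuming Python's `statistics.NormalDist` in place of the calculator; `inv_cdf(0.95)` returns the temperature below which $95\%$ of the area lies.

```python
from statistics import NormalDist

T = NormalDist(mu=37.0, sigma=0.4)  # body temperature in degrees C

# 2. deviation of more than 0.8 degrees (= 2 sigma) from the mean
p_outside = 1 - (T.cdf(37.8) - T.cdf(36.2))
# 3. temperature below 36.6 degrees (= mu - sigma)
p_below = T.cdf(36.6)
# 4. the warmest 5 % lie above the 95 % quantile
t_min = T.inv_cdf(0.95)

print(f"{p_outside:.4f} {p_below:.4f} {t_min:.2f}")
```

It prints `0.0455 0.1587 37.66`; the small differences from $0.046$, $0.1585$ and $37.656^\circ C$ come from the rounded areas and the rounded factor $1.64$.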

A3
  1. First find $\mu$ and $\sigma$. Because $\mu$ lies midway between the $x$-coordinates of the inflection points, we get $\mu=9$, and because the inflection points are a distance $\sigma$ away from $\mu$, we get $\sigma=2$. Thus,

    $$f_{9,2}(x)=\frac{1}{2\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-9}{2}\right)^2}$$
  2. $p(6.1<X<10.3)=\int_{6.1}^{10.3} f_{9,2}(x)\, dx$, and using the calculator we get $\underline{0.6686}$.

  3. Note that we cannot enter $\infty$ into the calculator, so we split the area as follows (see figure below), using that the area to the right of $\mu=9$ is $0.5$:

    $$\begin{array}{lll} p(X>7.5)&=&\int_{7.5}^\infty f_{9,2}(x)\,dx \\ &=&\int_{7.5}^{9} f_{9,2}(x)\,dx+0.5\\ &=& 0.2734+0.5\\ &=&\underline{0.7734} \end{array}$$
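A sketch of Q3 in Python, assuming `statistics.NormalDist` as the numerical tool. Since `cdf` already measures the area from $-\infty$, the upper tail $p(X>7.5)$ is simply $1-F(7.5)$ and no splitting of the area is needed.

```python
from statistics import NormalDist

# Inflection points at x = 7 and x = 11 give mu = 9, sigma = 2
left, right = 7, 11
mu = (left + right) / 2
sigma = (right - left) / 2
X = NormalDist(mu, sigma)

p_between = X.cdf(10.3) - X.cdf(6.1)
# The cdf already includes the area from minus infinity, so no splitting is needed
p_above = 1 - X.cdf(7.5)
print(f"{p_between:.4f} {p_above:.4f}")
```

It prints `0.6686 0.7734`.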
A4
  1. To get the densities for the histogram, we divide the frequencies by $n$ (giving the relative frequencies) and then by the class width $\Delta x$. Adding up the frequencies gives $n=301$, and the class width is $\Delta x=2.5$. So we get the densities

    $$\begin{array}{c|c|c} \text{class} & \text{frequency} & \text{density} \\\hline 0-2.5 & 0 & 0\\ 2.5-5 & 0 & 0\\ 5-7.5 & 2 & 0.00265781\\ 7.5-10 & 0 & 0\\ 10-12.5 & 4 & 0.00531561\\ 12.5-15 & 6 & 0.00797342\\ 15-17.5 & 8 & 0.0106312\\ 17.5-20 & 11 & 0.0146179\\ 20-22.5 & 17 & 0.0225914\\ 22.5-25 & 31 & 0.041196\\ 25-27.5 & 31 & 0.041196\\ 27.5-30 & 46 & 0.0611296\\ 30-32.5 & 44 & 0.0584718\\ 32.5-35 & 30 & 0.0398671\\ 35-37.5 & 27 & 0.0358804\\ 37.5-40 & 23 & 0.0305648\\ 40-42.5 & 6 & 0.00797342\\ 42.5-45 & 9 & 0.0119601\\ 45-47.5 & 2 & 0.00265781\\ 47.5-50 & 3 & 0.00398671\\ 50-52.5 & 1 & 0.0013289\\ 52.5-55 & 0 & 0\\ 55-57.5 & 0 & 0\\ 57.5-60 & 0 & 0 \end{array}$$

    The histogram and the normal distribution are shown below. For the normal distribution, calculate the $5$ points at $x=\mu \pm 2\sigma$, $x=\mu \pm\sigma$ and $x=\mu$ as usual. We get the points $C(14.4 | 0.007)$, $D(44.8 | 0.007)$, $A(22|0.032)$, $B(37.2 | 0.032)$, $P(29.6|0.052)$.

  2. We have $22=\mu-\sigma$ and $37.2=\mu+\sigma$. The probability that a data point lies in this interval is the area under the curve between $\mu-\sigma$ and $\mu+\sigma$, which is $p=\underline{0.683}$ (see previous chapter). Of course, we could also use the calculator to work out the integral $\int_{22}^{37.2} f_{29.6,7.6}(x)\, dx$.

  3. Using the area $0.95$ between $\mu-1.96\sigma$ and $\mu+1.96\sigma$, we get $a=\mu-1.96\sigma=29.6-1.96\cdot 7.6=\underline{14.7}$ and $b=\mu+1.96\sigma=29.6+1.96\cdot 7.6=\underline{44.5}$.
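The computations of A4 (densities, the five points of the curve, and the two answers) fit in one sketch, assuming Python's `statistics.NormalDist`; the list `freqs` is copied from the table in Q4. Part 3 uses `inv_cdf(0.025)` and `inv_cdf(0.975)`, which cut off $2.5\%$ in each tail.

```python
from statistics import NormalDist

# Frequencies of the 24 classes 0-2.5, 2.5-5, ..., 57.5-60 from Q4
freqs = [0, 0, 2, 0, 4, 6, 8, 11, 17, 31, 31, 46,
         44, 30, 27, 23, 6, 9, 2, 3, 1, 0, 0, 0]
width = 2.5
n = sum(freqs)                                # 301 data points
densities = [f / (n * width) for f in freqs]  # histogram heights

model = NormalDist(mu=29.6, sigma=7.6)
# Five points for sketching the curve: mu - 2 sigma, ..., mu + 2 sigma
for k in (-2, -1, 0, 1, 2):
    x = 29.6 + k * 7.6
    print(f"({x:.1f} | {model.pdf(x):.3f})")

# Part 2: area between 22 and 37.2 (= mu - sigma and mu + sigma)
p = model.cdf(37.2) - model.cdf(22)
# Part 3: interval with probability 0.95, symmetric about mu
a, b = model.inv_cdf(0.025), model.inv_cdf(0.975)
print(f"p = {p:.3f}, [a, b] = [{a:.1f}, {b:.1f}]")
```

The five printed points are $(14.4|0.007)$, $(22.0|0.032)$, $(29.6|0.052)$, $(37.2|0.032)$, $(44.8|0.007)$, and the final line reports $p = 0.683$ and $[a,b]=[14.7, 44.5]$.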

A5
  1. Since the weights are approximately normally distributed with mean $\mu=3.24$ and standard deviation $\sigma=0.55$, the normal distribution that approximates the histogram is given by $f_{3.24,0.55}$.

    The probability that a melon is heavier than $3.79\,\text{kg}$ is the area under the curve of $f_{3.24,0.55}$ from $3.79$ to $\infty$. Because $3.79=\mu+\sigma$, this area is just $(1-0.683)/2\approx 0.16$. So, since there are $10\,000$ melons, about $0.16\cdot 10\,000=\underline{1600}$ melons are heavier than $3.79\,\text{kg}$.

  2. $a=\mu-1.96\sigma=\underline{2.162\,\text{kg}}$, $b=\mu+1.96\sigma = \underline{4.318\,\text{kg}}$
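Finally, a sketch of Q5 with Python's `statistics.NormalDist` (again our choice of tool, not the exercise's).

```python
from statistics import NormalDist

W = NormalDist(mu=3.24, sigma=0.55)  # melon weight in kg
N = 10_000                           # number of melons

# 1. share of melons heavier than 3.79 kg (= mu + sigma), scaled to N melons
heavier = (1 - W.cdf(3.79)) * N
# 2. central 95 %: cut off 2.5 % in each tail
a, b = W.inv_cdf(0.025), W.inv_cdf(0.975)
print(f"about {heavier:.0f} melons; a = {a:.3f} kg, b = {b:.3f} kg")
```

It reports about $1587$ melons and $a\approx 2.162\,\text{kg}$, $b\approx 4.318\,\text{kg}$; the $1600$ above comes from rounding the tail area to $0.16$.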