The probability density function

Recall that for a discrete random variable $X$ with the values $x_1,...,x_u$ , the list of probabilities

p(X=x_1), ..., p(X=x_u)

is called the probability function of $X$ . It tells you how the probabilities are distributed over the different values of $X$ .

We want to define an analogous function for a continuous random variable. Remember that a continuous random variable $X$ can take on any value in a given interval $I$ (or even on the whole real line, $I=]-\infty,\infty[$ ). What exactly this interval is depends on the problem.

Example 1

Going back to the M&M-Example, the weight could be any value between $40g$ and $70g$ , such as $X=40.331$ or even $X=69.8213503104531390786574$ , but (because of the production process, perhaps) will never go lower than $40g$ and never exceed $70g$ . In this case we could choose the interval to be $I=[40,70]$ .

However, this poses two problems. First, how do we list the probabilities of a continuous random variable? We cannot form a list as in the discrete case above, because we cannot list all real numbers in an interval (the real numbers in any interval are not countable).

Example 2

Back to M&Ms where the interval is $[40,70]$ . What follows after $40$ , is it $40.1$ or $40.01$ , or $40.001$ , or $40.000001$ ? I hope you see the problem.

A second problem is that $p(X=x)\approx 0$ for every value that $X$ can take on (that is, for every value $x \in I$ ). So for the M&Ms, we have $p(X=40)\approx 0$ , $p(X=40.00001)\approx 0$ , and so on. Why? It is simply very unlikely that several M&Ms have exactly the same weight $40$ or $40.00001$ , so the relative frequency of any single value is always very close to zero. So even if we could somehow list all the probabilities, there would be very little useful information in such a list.

To circumvent these problems, we have to take another approach to capture the probability distribution of a continuous random variable. The basic idea is to replace the probability function of a discrete random variable:

p(X=x_1), p(X=x_2), ..., p(X=x_u)

with the probabilities of small intervals for continuous variables:

p(X\in I_1), p(X\in I_2), ..., p(X\in I_v)

where the intervals $I_1, I_2, ..., I_v$ divide the interval $I$ . This is just the basic idea. It is not really workable in this version because we do not have a natural way to find the intervals $I_1, I_2, ..., I_v$ . How big or small do they have to be? How many of them should we choose? In fact we need to approach this problem from a different angle.

We start by defining a new type of function, the so-called probability density function of $X$ .

Definition 1

Consider a continuous random variable $X$ , with values in the interval $I=[a,b]$ . The probability density function of $X$ , written $f_X$ , is a function with the following properties:

$f_X(x)\geq 0$ for all $x \in I$
$p(X\in [c,d])=\int_c^d f_X(x)\, dx\quad$ for every interval $[c,d]\subset I$ .

Note 1

$a=-\infty$ or $b=\infty$ is also possible, that means, intervals like $I=]-\infty,b]$ , $I=[a,\infty[$ , or $I=]-\infty,\infty[$ .

In other words, the graph of $f_X$ is never below the $x$ -axis, and the probability that $X$ takes on a value in the interval $[c,d]$ is the area beneath the graph of $f_X$ (from $c$ to $d$ ). See the figure below.

As $p(X\in I)=1$ (as $X$ cannot take on any other values, but will always produce a value), the following is valid:

Theorem 1

For a probability density function $f_X$ we have

\int_a^b f_X(x)\, dx=1

That is, the total area beneath the graph of $f_X$ equals $1$ .
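Definition 1 and Theorem 1 can be checked numerically. The following is only a minimal sketch: the triangular density on $[40,70]$ with its peak at $55$ is a made-up example (the text does not say what the M&M density looks like), and the helper names `triangular_density` and `area_under` are chosen for illustration. The integral is approximated with the midpoint rule.

```python
def triangular_density(x, a=40.0, b=70.0, c=55.0):
    """Hypothetical density on [a, b] with its peak at c (an assumption)."""
    if x < a or x > b:
        return 0.0
    if x <= c:
        return 2.0 * (x - a) / ((b - a) * (c - a))
    return 2.0 * (b - x) / ((b - a) * (b - c))


def area_under(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width


# Probability of an interval [c, d] = area beneath the density from c to d.
prob_50_60 = area_under(triangular_density, 50.0, 60.0)

# Area over the whole interval I = [40, 70]; this should come out as 1.
total_area = area_under(triangular_density, 40.0, 70.0)

print(f"p(X in [50, 60]) ~ {prob_50_60:.4f}")
print(f"total area       ~ {total_area:.4f}")
```

For this particular density the exact value of $p(X\in [50,60])$ is $\frac{5}{9}$ , so the printed approximation should be close to that.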

Every continuous random variable $X$ has such a probability density function (apart from some very strange exceptions). Now, the big question is, of course, how do we find $f_X$ for a given random variable $X$ ? It turns out that the graph of $f_X$ approximately corresponds to the curve formed by the histogram of $X$ . To be more precise:

Theorem 2

To find the graph of the probability density function of a random variable $X$ :

create a huge number of data points drawn from $X$ (that is, we repeat the experiment a huge number of times and collect the values of $X$ , such as the weight of M&Ms)
create a histogram of the data points with a really small bin size $\Delta x$

The graph of $f_X$ at any point $x$ is then formed by the bar height $d_x$ at $x$ :

f_X(x) \approx d_x

The more data points are used in the histogram and the smaller $\Delta x$ is chosen, the better this approximation is.
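The recipe of Theorem 2 can be tried out in a few lines of code. This is a sketch under stated assumptions, not a general method: the data points are simulated from a triangular distribution on $[40,70]$ with peak $55$ (a hypothetical stand-in for the M&M weights), because for this distribution the exact density is known and can be compared with the bar heights. The values of `M` and `DX` are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so that the run is reproducible

A, B, C = 40.0, 70.0, 55.0   # hypothetical range and peak of the weights
M = 200_000                  # number of data points
DX = 0.5                     # bin size
NUM_BINS = int((B - A) / DX)


def density(x):
    """Exact density of the triangular distribution used for sampling."""
    if x <= C:
        return 2.0 * (x - A) / ((B - A) * (C - A))
    return 2.0 * (B - x) / ((B - A) * (B - C))


# Step 1: draw the data points and count how many land in each bin.
counts = [0] * NUM_BINS
for _ in range(M):
    x = random.triangular(A, B, C)
    i = min(int((x - A) / DX), NUM_BINS - 1)  # keep x == B in the last bin
    counts[i] += 1

# Step 2: bar height d_i = relative frequency / bin size,
# so that bar area = relative frequency.
heights = [n / (M * DX) for n in counts]

# Compare each bar height with the density at the middle of its bin.
midpoints = [A + (i + 0.5) * DX for i in range(NUM_BINS)]
max_error = max(abs(d - density(x)) for d, x in zip(heights, midpoints))
print(f"largest gap between bar height and density: {max_error:.5f}")
```

Rerunning the sketch with a smaller `M` typically makes the gap larger, which matches the remark that more data points give a better approximation.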

Proof

Please study the proof, as it helps to understand the issues a bit better.

First, consider $m$ data points from $X$ , obtained by repeating the experiment $m$ times, and create a histogram with bin size $\Delta x$ (figure above left, blue bars). As we have seen in the last chapter, the area of bar $i$ is $d_i\cdot \Delta x$ , which is the relative frequency and thus approximately the probability that a data point lands in the bin $[x_i,x_{i+1}]$ :

\text{blue bar area } i = d_i\cdot \Delta x \approx p(X\in [x_i,x_{i+1}])

(the more data points we have, the better this approximation is). But from the definition of the probability density function we also know that the area under the curve of $f_X$ between $x_i$ and $x_{i+1}$ is exactly $p(X\in [x_i,x_{i+1}])$ (see figure above left, red area):

\text{red bar area} i = p(X\in [x_i,x_{i+1}])

Thus we have

\text{blue bar area} i \approx \text{red bar area} i

But because the width of both areas is the same, $\Delta x$ , we also find that the height of the areas is about the same:

\text{height blue bar } i \approx \text{height red area } i

Note that the red area has many different heights, as it is curved on top. However, if we choose $\Delta x$ really small, all these heights will be approximately the same, namely $f_X(x_i)$ . Thus, we obtain the following result:

f_X(x_i) \approx d_i\quad (m\, \text{big}, \Delta x\, \text{small})

And this concludes the proof. Note that there is a more elegant but also a bit more abstract proof, which we quickly show below.

Alternative proof

Let $F_X$ be an antiderivative of $f_X$ , that is, $F^\prime_X=f_X$ . The area of bar $i$ in the histogram is, as above,

\begin{array}{lll} \Delta x \cdot d_i &\approx& p(X\in [x_i, x_{i+1}])\\ &=&\int_{x_i}^{x_{i+1}} f_X(x)\, dx\\ &=& F_X(x_{i+1})-F_X(x_i) \end{array}

where we used the fundamental theorem of calculus to bring the antiderivative into play. Now, because $x_{i+1}=x_i+\Delta x$ , we get

\Delta x\cdot d_i \approx F_X(x_i+\Delta x)-F_X(x_i)

and thus

d_i \approx \frac{F_X(x_i+\Delta x)-F_X(x_i)}{\Delta x} \approx F^\prime_X(x_i)=f_X(x_i)

Thus we see again that

f_X(x_i)\approx d_i
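The key step of the alternative proof is that the difference quotient of $F_X$ approaches $f_X(x_i)$ as $\Delta x$ shrinks. The small sketch below illustrates this for a made-up density, $f(x)=2x$ on $[0,1]$ with antiderivative $F(x)=x^2$ ; both the density and the point `x_i = 0.3` are assumptions chosen only for illustration.

```python
def F(x):
    """Antiderivative of the hypothetical density f(x) = 2x on [0, 1]."""
    return x * x


def f(x):
    """Hypothetical density on [0, 1] (an assumption for this illustration)."""
    return 2.0 * x


x_i = 0.3
bin_sizes = [0.1, 0.01, 0.001]

# Difference quotient (F(x_i + dx) - F(x_i)) / dx for shrinking bin sizes.
quotients = [(F(x_i + dx) - F(x_i)) / dx for dx in bin_sizes]

for dx, q in zip(bin_sizes, quotients):
    print(f"dx = {dx:<6} quotient = {q:.4f}   f(x_i) = {f(x_i):.4f}")
```

For this density the quotient equals $2x_i+\Delta x$ , so the gap to $f(x_i)$ is exactly the bin size and vanishes as $\Delta x$ gets small.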

Have a look at the proof of this theorem, but also play a bit with the sliders in the GeoGebra applet below. Observe how the histogram gets smoother and approximates the probability density function for a large number of points $n$ and small bin size $\Delta x$ . Actually, we would need a lot more points $n$ and a much smaller bin size $\Delta x$ to get a really smooth histogram that overlaps exactly with $f_X$ , but you should get the idea.

Open in GeoGebra

Knowing the graph of $f_X$ does not necessarily mean we can find its function equation easily. To find such a function equation we often make an educated guess about $f_X$ and then verify our assumption by comparing the histogram with the graph of $f_X$ . But we will not do this here.

Exercise 1

Explain why $\int_{a}^{b} f_X(x)\, dx=1$ for every probability density function $f_X$ , where $X$ takes on values in the interval $I=[a,b]$ .
Consider a random experiment with a continuous random variable $X$ whose probability density function is
$f_X(x)=\begin{cases}\frac{3}{4}-\frac{3}{4}x^2 & x\in [-1,1] \\ 0 & x\not\in [-1,1] \end{cases}$
1. Draw the probability density function.
2. Determine the probability that the observed value of $X$ is between $0.4$ and $0.7$ , that is, determine the probability $p(X\in [0.4,0.7])$ .

Solution

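$\int_{a}^{b} f_X(x)\, dx=p(X\in [a, b])=1$ , because $X$ always takes on a value in $I=[a,b]$ .
For the given probability density function:
1. The graph is a downward-opening parabola arc on $[-1,1]$ with its maximum $\frac{3}{4}$ at $x=0$ and zeros at $x=\pm 1$ ; outside $[-1,1]$ the function is $0$ .
2. We have $p(X\in [0.4,0.7])=\int_{0.4}^{0.7} f_X(x)\, dx$ . The antiderivative of $f_X$ on $[-1,1]$ is
  $\begin{array}{lll} F(x)&=&\frac{3}{4}x-\frac{3}{4}\cdot\frac{1}{3}x^3\\ &=&\frac{3}{4}x-\frac{1}{4}x^3 \end{array}$
  and therefore we have
  $\begin{array}{lll} p(X\in [0.4,0.7])&=&\int_{0.4}^{0.7} f_X(x)\, dx\\ &=&F(0.7)-F(0.4)\\ &=&\frac{3}{4}\cdot 0.7-\frac{1}{4}\cdot 0.7^3-(\frac{3}{4}\cdot 0.4-\frac{1}{4}\cdot 0.4^3)\\ &\approx& \underline{0.155} \end{array}$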
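The result can be double-checked numerically. The sketch below evaluates the antiderivative from the solution and, independently, approximates the integral with the midpoint rule; the helper name `midpoint_integral` and the number of steps are choices made for this illustration.

```python
def f_X(x):
    """Density from the exercise: 3/4 - 3/4 x^2 on [-1, 1], zero elsewhere."""
    if -1.0 <= x <= 1.0:
        return 0.75 - 0.75 * x * x
    return 0.0


def F(x):
    """Antiderivative of f_X on [-1, 1], as derived in the solution."""
    return 0.75 * x - 0.25 * x ** 3


def midpoint_integral(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width


exact = F(0.7) - F(0.4)                       # via the antiderivative
numeric = midpoint_integral(f_X, 0.4, 0.7)    # via the area under the graph
total = midpoint_integral(f_X, -1.0, 1.0)     # total area, should be 1

print(f"F(0.7) - F(0.4)   = {exact:.5f}")
print(f"midpoint integral = {numeric:.5f}")
print(f"total area        = {total:.5f}")
```

Both routes should agree with the rounded value $0.155$ from the solution, and the total area over $[-1,1]$ should come out as $1$ , in line with the first part of the exercise.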