The probability function of RVs

We start with the definition of the probability distribution.

Definition 1

Consider a random variable $X$ of a random experiment, where $X$ can take on the possible (output) values $x_1, \ldots, x_r$. The set of probabilities

$$p(X=x_1), \ldots, p(X=x_r)$$

is called the probability function of $X$. This is because we can think of $f_X(x_i) = p(X=x_i)$ as a function with input values $x_1, \ldots, x_r$ and output values $p(X=x_1), \ldots, p(X=x_r)$.

The function

$$F_X(x) = p(X \leq x)$$

where the input $x$ is a real number, is called the cumulative distribution function of $X$. $F_X(x)$ is the probability that the random variable $X$ takes on a value of $x$ or smaller. That is, it is the probability that an output of the experiment has a value $\leq x$.

Warning

Some books refer to the probability function as a probability distribution or probability density function, and to the cumulative distribution function as a distribution function. So stay flexible.

We often draw the probability function in a coordinate system, where the values $x_1, \ldots, x_r$ are indicated along the $x$-axis, and the probabilities $p(X=x_1), \ldots, p(X=x_r)$ along the $y$-axis. For the cumulative distribution function we draw, as always for functions, the input $x$ along the $x$-axis and the output $p(X \leq x)$ along the $y$-axis. Here is an example:

Exercise 1

A fair coin is flipped twice. The random variable is $N$ = "number of heads".

  1. Determine the probability function of $N$, and draw the function in a coordinate system.

  2. Draw the cumulative distribution function $F_N$.

Solution

The possible values of $N$ are $\{0,1,2\}$, where

$$\begin{array}{lll}
N=0 &=& \{TT\}\\
N=1 &=& \{TH,HT\}\\
N=2 &=& \{HH\}
\end{array}$$

As this is a Laplace experiment, we have the following probability function of $N$:

$$\begin{array}{lll}
p(N=0) &=& \frac{1}{4}\\
p(N=1) &=& \frac{2}{4}\\
p(N=2) &=& \frac{1}{4}
\end{array}$$

(see figure below, left).

The cumulative distribution function $F_N(x)$ is the probability that the number of heads is less than or equal to $x$. For example,

$$\begin{array}{lll}
F_N(0) &=& \frac{1}{4}\\
F_N(0.5) &=& \frac{1}{4}\\
F_N(1) &=& \frac{3}{4}\\
F_N(1.5) &=& \frac{3}{4}\\
F_N(2) &=& \frac{4}{4}\\
F_N(2.5) &=& \frac{4}{4}
\end{array}$$

and so on. It is a staircase function, where the jumps occur at the values $x = 0, 1$ and $2$. See the figure below, right.
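The values in this solution can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original exercise; the names `outcomes`, `pmf` and `cdf` are chosen here for illustration only. It lists the four equally likely outcomes, collects the probability function of $N$, and evaluates $F_N$ at a few inputs, including non-integer ones.

```python
from fractions import Fraction
from itertools import product

# Enumerate the sample space of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Probability function: every outcome has probability 1/4 (Laplace experiment).
pmf = {}
for outcome in outcomes:
    n = outcome.count("H")  # value of N for this outcome
    pmf[n] = pmf.get(n, Fraction(0)) + Fraction(1, len(outcomes))

def cdf(x):
    """F_N(x) = p(N <= x): add up p(N = k) over all values k <= x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

for x in (-1, 0, 0.5, 1, 1.5, 2, 2.5):
    print(x, cdf(x))
```

Exact fractions are used instead of floats so that the results can be compared directly with the hand calculation; note that `Fraction` prints reduced values, so $\frac{2}{4}$ appears as `1/2`.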

As the events $X=x_1, \ldots, X=x_r$ are pairwise mutually exclusive, and in fact form a partition of the sample space $S$, we have the following important properties:

Theorem 1

Consider the probability function $p(X=x_1), \ldots, p(X=x_r)$ of a random variable $X$. We have the following:

  1. For arbitrary values of $X$, e.g. $x_1, x_2$ and $x_3$, we have

    $$p(X=x_1 \cup X=x_2 \cup X=x_3) = p(X=x_1)+p(X=x_2)+p(X=x_3)$$
  2. The sum of all probabilities of the probability function is $1$:

    $$\sum_{k=1}^r p(X=x_k) = p(X=x_1)+\ldots+p(X=x_r) = 1$$
  3. $F_X(x)$ is the sum of all probabilities $p(X=x_k)$ with $x_k \leq x$. Thus, if for a given $x$ exactly the values $x_1, x_2, x_3$ are $\leq x$, then

    $$F_X(x) = p(X=x_1)+p(X=x_2)+p(X=x_3)$$
Proof

The proof is straightforward.

  1. This follows from the fact that the events are pairwise mutually exclusive.

  2. This follows from statement 1 and the fact that the union of all those events forms the sample space $S$, so we have

    $$\begin{array}{lll}
    1 &=& p(S)\\
    &=& p(X=x_1 \,\cup\, \ldots \,\cup\, X=x_r)\\
    &=& p(X=x_1)+\ldots+p(X=x_r)
    \end{array}$$
  3. This follows from statement 1.

Exercise 2

A fair die is rolled twice. Consider the random variable $S$ = "sum of the two numbers".

  1. Determine the possible values of $S$.

  2. Determine and draw the probability function of $S$.

  3. Determine $F_S(4)$.

  4. Draw the graph of $F_S$.

Solution

The sample space consists of the $6 \cdot 6 = 36$ equally likely pairs of rolls. The following table shows the sum for each pair:

$$\begin{array}{c|cccccc}
+ & 1 & 2 & 3 & 4 & 5 & 6 \\\hline
1 & 2 & 3 & 4 & 5 & 6 & 7 \\
2 & 3 & 4 & 5 & 6 & 7 & 8 \\
3 & 4 & 5 & 6 & 7 & 8 & 9 \\
4 & 5 & 6 & 7 & 8 & 9 & 10 \\
5 & 6 & 7 & 8 & 9 & 10 & 11 \\
6 & 7 & 8 & 9 & 10 & 11 & 12
\end{array}$$
  1. Possible outputs of $S$: $\{2, 3, 4, \ldots, 11, 12\}$

  2. The probability function is (for a figure see below)

    $$\begin{array}{lll}
    p(S=2) &=& \frac{1}{36}\\
    p(S=3) &=& \frac{2}{36}\\
    p(S=4) &=& \frac{3}{36}\\
    p(S=5) &=& \frac{4}{36}\\
    p(S=6) &=& \frac{5}{36}\\
    p(S=7) &=& \frac{6}{36}\\
    p(S=8) &=& \frac{5}{36}\\
    p(S=9) &=& \frac{4}{36}\\
    p(S=10) &=& \frac{3}{36}\\
    p(S=11) &=& \frac{2}{36}\\
    p(S=12) &=& \frac{1}{36}
    \end{array}$$
  3. We have

    $$\begin{array}{lll}
    F_S(4) &=& p(S \leq 4)\\
    &=& p(S=2)+p(S=3)+p(S=4)\\
    &=& \frac{6}{36} = \frac{1}{6}
    \end{array}$$
  4. The graph of $F_S$ is shown below. It helps to calculate the values of $F_S$ at the points where the graph jumps:

    $$\begin{array}{lll}
    F_S(2) &=& \frac{1}{36}\\
    F_S(3) &=& \frac{3}{36}\\
    F_S(4) &=& \frac{6}{36}\\
    F_S(5) &=& \frac{10}{36}\\
    F_S(6) &=& \frac{15}{36}\\
    F_S(7) &=& \frac{21}{36}\\
    F_S(8) &=& \frac{26}{36}\\
    F_S(9) &=& \frac{30}{36}\\
    F_S(10) &=& \frac{33}{36}\\
    F_S(11) &=& \frac{35}{36}\\
    F_S(12) &=& \frac{36}{36} = 1
    \end{array}$$
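The two tables of this solution can be reproduced by counting. The sketch below is one possible way to do this in Python and is not taken from the original text; the names `counts`, `values`, `pmf` and `cdf_at_jumps` are illustrative. It counts how often each sum occurs among the 36 equally likely rolls and then forms running totals, which is statement 3 of Theorem 1 applied at each jump point.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Count how often each sum occurs among the 36 equally likely pairs of rolls.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

values = sorted(counts)                          # 2, 3, ..., 12
pmf = [Fraction(counts[s], 36) for s in values]  # p(S = s)
cdf_at_jumps = list(accumulate(pmf))             # F_S(s) at each jump point

for s, p, F in zip(values, pmf, cdf_at_jumps):
    print(f"s={s:2d}  p(S=s)={p}  F_S(s)={F}")
```

`Fraction` reduces automatically, so for example $\frac{6}{36}$ is printed as `1/6`. Between two jump points, $F_S$ stays constant at the value of the lower one.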
Exercise 3

The probabilities $p(X=1)=x^2$, $p(X=2)=3x$, $p(X=3)=0.1$ form the probability function of a random variable $X$. Determine the value $x$ and the probabilities.

Solution

As $p(X=1)+p(X=2)+p(X=3)=1$, it follows that

$$x^2+3x+0.1=1$$

Solving for $x$ with the quadratic formula (midnight formula), we get $x_1 \approx -3.27$ and $x_2 \approx 0.275$. For $x_1$ the probability $p(X=2)=3x_1$ would be negative. As probabilities $<0$ are not possible, we have to exclude $x_1$ from the solutions. So $x \approx 0.275$, $p(X=1) \approx \underline{0.076}$, and $p(X=2) \approx \underline{0.824}$.
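The calculation can be checked numerically. The following Python sketch is only an illustration; the helper `probabilities` and the variable names are assumptions made here, not part of the exercise. It applies the quadratic formula to $x^2+3x-0.9=0$ and keeps only those roots for which all three probabilities lie between $0$ and $1$.

```python
import math

# Coefficients of x^2 + 3x - 0.9 = 0, obtained from x^2 + 3x + 0.1 = 1.
a, b, c = 1.0, 3.0, -0.9

disc = b * b - 4 * a * c  # discriminant, here 12.6
roots = [(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)]

def probabilities(x):
    """Return [p(X=1), p(X=2), p(X=3)] for a candidate value x."""
    return [x ** 2, 3 * x, 0.1]

# Keep only roots for which every probability lies in [0, 1].
valid = [x for x in roots if all(0 <= p <= 1 for p in probabilities(x))]
x = valid[0]
print(x, probabilities(x), sum(probabilities(x)))
```

Exactly one root passes the filter, and the three probabilities for it add up to $1$ up to floating-point rounding.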