12 Using R to compute probabilities
For most probability distributions, R has 4 built-in functions that tell you almost everything you will ever want to know about them.
For the Binomial distribution, these functions are the following:
- dbinom(x): probability mass function
- pbinom(x): cumulative distribution function
- qbinom(p): quantile function
- rbinom(n): function for random samples
For the Normal distribution, these functions are the following:
- dnorm(x): probability density function
- pnorm(x): cumulative distribution function
- qnorm(p): quantile function
- rnorm(n): function for random samples
You can see the naming convention adopted by R right away.
The d functions are mass or density functions.
- For discrete distributions, these functions return \(P(X=x)\). E.g., dbinom(x, n, p) returns the probability that a binomial random variable with parameters \(n\) and \(p\) will yield a value of \(x\).
- For continuous distributions, these functions return a probability density, not a probability. To get a probability, we must consider a range of outcomes \([a, b]\) and compute the area under the curve. Computing the exact area under the curve requires evaluating an integral, which is too hard for us and awkward to do using a d function. It will be best to use a p function in this case (see below).
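To make the difference between a mass and a density concrete, here is a minimal sketch. The parameter values (10 trials, success probability 0.5, a standard Normal, and the interval \([-1, 1]\)) are assumptions chosen only for illustration.

```r
# Discrete case: dbinom() returns a probability directly.
# Illustrative values (assumptions): 10 trials, success probability 0.5.
p_three <- dbinom(3, size = 10, prob = 0.5)   # P(X = 3) = 120/1024 = 0.1171875

# The mass function sums to 1 over all possible outcomes 0, 1, ..., 10.
total_mass <- sum(dbinom(0:10, size = 10, prob = 0.5))

# Continuous case: dnorm() returns the height of the density curve,
# which is NOT a probability.
density_at_zero <- dnorm(0, mean = 0, sd = 1)  # about 0.399

# P(a <= X <= b) is the area under the density between a and b.
a <- -1
b <- 1
area_numeric <- integrate(dnorm, lower = a, upper = b)$value  # numerical integral of the density
area_from_p  <- pnorm(b) - pnorm(a)                           # the same area from the cdf, about 0.683
```

The last two lines should agree up to numerical error, which is why the p functions are the more convenient tool for continuous distributions.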
The p functions are cumulative probability functions.
- By default these functions return \(P(X \leq x)\). If you specify lower.tail=FALSE then these functions return \(P(X > x)\). Be careful when using these functions: for continuous distributions \(P(X \leq x)=1-P(X \geq x)\), but for discrete (integer-valued) distributions \(P(X \leq x)=1-P(X \geq x+1)\). In other words, always check whether the question asks for "greater than" or "greater than or equal to" before you choose the value of \(x\) to pass in.
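The following sketch shows how the choice between "greater than" and "greater than or equal to" changes the call for a discrete distribution but not for a continuous one. The Binomial(10, 0.5) parameters and the cut-off values are assumptions chosen for illustration.

```r
# Illustrative values (assumptions): X ~ Binomial(10, 0.5).
size <- 10
prob <- 0.5

lower <- pbinom(3, size, prob)                      # P(X <= 3) = 176/1024
upper <- pbinom(3, size, prob, lower.tail = FALSE)  # P(X > 3), which equals P(X >= 4)

# P(X >= 3) includes the outcome 3, so shift the argument down by one.
at_least_3 <- pbinom(2, size, prob, lower.tail = FALSE)  # P(X > 2) = P(X >= 3)

# Continuous case: a single point has probability zero,
# so P(Z > 1) and P(Z >= 1) are the same number.
z_upper <- pnorm(1, lower.tail = FALSE)
```

Here at_least_3 exceeds upper by exactly dbinom(3, size, prob), the probability of the single outcome 3.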
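The q functions are quantile functions.
- They are the inverse of the cumulative probability functions. Here, you specify a cumulative probability \(q\), and the function returns the value of \(x\) such that \(P(X \leq x)=q\). For discrete distributions an exact match may not exist, so R returns the smallest \(x\) such that \(P(X \leq x) \geq q\).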
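As a quick check of the inverse relationship, the sketch below feeds the output of a q function back into the matching p function. The probabilities 0.975 and 0.5 and the Binomial(10, 0.5) parameters are assumptions chosen for illustration.

```r
# Continuous case: qnorm() undoes pnorm().
x <- qnorm(0.975)        # about 1.96
round_trip <- pnorm(x)   # recovers 0.975

# Discrete case: the cdf moves in jumps, so qbinom() returns the
# smallest k whose cumulative probability is at least the requested value.
k <- qbinom(0.5, size = 10, prob = 0.5)            # 5
cdf_at_k    <- pbinom(k, size = 10, prob = 0.5)      # 638/1024, at least 0.5
cdf_below_k <- pbinom(k - 1, size = 10, prob = 0.5)  # 386/1024, below 0.5
```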
The r functions generate random samples.
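A small sketch of drawing random samples and comparing them with the theoretical values follows. The seed, the sample size of 1000, and the distribution parameters are all assumptions chosen for illustration. Note that the first argument of an r function is the number of draws, not a parameter of the distribution.

```r
set.seed(1)  # arbitrary seed (an assumption) so the draws are reproducible

# 1000 draws from Binomial(10, 0.5) and from a standard Normal.
binom_draws <- rbinom(1000, size = 10, prob = 0.5)
norm_draws  <- rnorm(1000, mean = 0, sd = 1)

# Sample proportions should land near the theoretical probabilities.
prop_equal_3 <- mean(binom_draws == 3)   # compare with dbinom(3, 10, 0.5)
prop_below_1 <- mean(norm_draws <= 1)    # compare with pnorm(1)
```

The agreement is only approximate and improves as the number of draws grows.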
For a discrete distribution:
- pmf: probabilities are given by values on the y-axis.
- cdf: cumulative probabilities \(P(X \leq x)\) are given by reading values on the y-axis.
- qf: specify a probability on the x-axis and read the value of \(x\) that satisfies this probability from the y-axis.
For a continuous distribution:
- pdf: probabilities are given by the area under the curve.
- cdf: cumulative probabilities \(P(X \leq x)\) are given by reading values on the y-axis.
- qf: specify a probability on the x-axis and read the value of \(x\) that satisfies this probability from the y-axis.
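If you want to draw these six kinds of plots yourself, the following base R sketch is one way to do it. It is not necessarily how the original figures were made; the Binomial(10, 0.5) and standard Normal parameters, the grids, and the panel layout are assumptions chosen for illustration.

```r
# Illustrative parameters (assumptions): Binomial(10, 0.5) and standard Normal.
k <- 0:10                        # possible binomial outcomes
p <- seq(0.01, 0.99, by = 0.01)  # probabilities for the quantile plots
z <- seq(-4, 4, by = 0.01)       # grid of values for the Normal curves

op <- par(mfrow = c(2, 3))       # two rows of three panels

# Discrete row: pmf, cdf, quantile function.
plot(k, dbinom(k, 10, 0.5), type = "h", xlab = "x", ylab = "P(X = x)", main = "pmf")
plot(k, pbinom(k, 10, 0.5), type = "s", xlab = "x", ylab = "P(X <= x)", main = "cdf")
plot(p, qbinom(p, 10, 0.5), type = "s", xlab = "probability", ylab = "x", main = "qf")

# Continuous row: pdf, cdf, quantile function.
plot(z, dnorm(z), type = "l", xlab = "x", ylab = "density", main = "pdf")
plot(z, pnorm(z), type = "l", xlab = "x", ylab = "P(X <= x)", main = "cdf")
plot(p, qnorm(p), type = "l", xlab = "probability", ylab = "x", main = "qf")

par(op)                          # restore the previous plotting layout
```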
In general, I think it is important for you to be able to read each of the types of plots above, so please really try to internalize them and think about them deeply.