18 Probability in R

We will keep this chapter short. Instead of covering all the concepts of probability, we will see how to calculate probabilities, densities and quantiles for nearly any type of distribution. Base R's workhorse here is a family of four functions for each distribution, often called the p/q/d/r functions. The letters p, q, d and r are prefixes attached to R's abbreviated name for the distribution (for example, norm for the Normal distribution). Consider a probability function \(P(X = x) = p\), where \(x\) is a value of the random variable \(X\) and \(p\) is the associated probability.

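| Distribution | P | Q | D | R |
|---|---|---|---|---|
| Beta | pbeta | qbeta | dbeta | rbeta |
| Binomial | pbinom | qbinom | dbinom | rbinom |
| Cauchy | pcauchy | qcauchy | dcauchy | rcauchy |
| Chi-Square | pchisq | qchisq | dchisq | rchisq |
| Exponential | pexp | qexp | dexp | rexp |
| F | pf | qf | df | rf |
| Gamma | pgamma | qgamma | dgamma | rgamma |
| Geometric | pgeom | qgeom | dgeom | rgeom |
| Hypergeometric | phyper | qhyper | dhyper | rhyper |
| Logistic | plogis | qlogis | dlogis | rlogis |
| Log Normal | plnorm | qlnorm | dlnorm | rlnorm |
| Negative Binomial | pnbinom | qnbinom | dnbinom | rnbinom |
| Normal | pnorm | qnorm | dnorm | rnorm |
| Poisson | ppois | qpois | dpois | rpois |
| Student t | pt | qt | dt | rt |
| Studentized Range | ptukey | qtukey | (none) | (none) |
| Uniform | punif | qunif | dunif | runif |
| Weibull | pweibull | qweibull | dweibull | rweibull |
| Wilcoxon Rank Sum Statistic | pwilcox | qwilcox | dwilcox | rwilcox |
| Wilcoxon Signed Rank Statistic | psignrank | qsignrank | dsignrank | rsignrank |

Note that for the Studentized Range distribution base R provides only ptukey and qtukey; there is no dtukey or rtukey.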

All these functions are vectorised. Let us explore these one by one.
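To make "vectorised" concrete, here is a minimal sketch using pnorm. The variable names (x, p_by_x, p_by_mean) are only illustrative, and the values in the comments are rounded.

```r
# Vectorised over the first argument: one probability per value of x
x <- c(30, 50, 70)
p_by_x <- pnorm(x, mean = 50, sd = 10)
p_by_x      # roughly 0.0228, 0.5, 0.9772

# Vectorised over the parameters as well: arguments are recycled element-wise
p_by_mean <- pnorm(50, mean = c(40, 50, 60), sd = 10)
p_by_mean   # roughly 0.8413, 0.5, 0.1587
```

The same behaviour should hold for the q, d and r functions, so a whole column of values can be passed in a single call instead of writing a loop.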

18.1 p*() set of functions

This set of functions gives the cumulative distribution function (cdf) of a distribution, i.e. \(P(X \le x)\).

Example-1: What is the probability of a number being less than or equal to 25 in a Normal distribution with mean = 50 and sd = 10?

pnorm(25, mean = 50, sd = 10)
## [1] 0.006209665

Conversely, the probability of a number being greater than 25 in the above distribution (which, for a continuous distribution, is the same as greater than or equal to 25) is:

# Either subtract the probability from 1
1 - pnorm(25, mean = 50, sd = 10)
## [1] 0.9937903
# Or provide FALSE to lower.tail argument
pnorm(25, mean = 50, sd = 10, lower.tail = FALSE)
## [1] 0.9937903
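Because p*() returns a cumulative probability, the probability of falling in an interval is the difference of two cdf values. A small sketch for the same Normal distribution (the name p_between is only illustrative, and the value in the comment is rounded):

```r
# P(40 < X <= 60) for X ~ Normal(mean = 50, sd = 10)
p_between <- pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10)
p_between   # roughly 0.683, i.e. about 68% of values lie within one sd of the mean
```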

Example-2: What is the probability of at most one head in two tosses of a fair coin (binomial distribution with prob = 0.5)?

pbinom(1, size = 2, prob = 0.5)
## [1] 0.75
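For a discrete distribution it matters whether the boundary value is included. pbinom(x, ...) returns \(P(X \le x)\), and with lower.tail = FALSE it returns \(P(X > x)\), strictly greater. The sketch below (variable names are only illustrative) shows that "at most one head" and "one or more heads" are different events. They happen to have the same probability for a fair coin, by symmetry, but not for a biased one.

```r
# Fair coin, two tosses
p_at_most_one  <- pbinom(1, size = 2, prob = 0.5)                      # P(X <= 1)
p_at_least_one <- pbinom(0, size = 2, prob = 0.5, lower.tail = FALSE)  # P(X > 0) = P(X >= 1)
c(p_at_most_one, p_at_least_one)   # both 0.75 for a fair coin

# Biased coin (prob = 0.3, chosen only for illustration): the two now differ
biased_at_most_one  <- pbinom(1, size = 2, prob = 0.3)                      # 0.91
biased_at_least_one <- pbinom(0, size = 2, prob = 0.3, lower.tail = FALSE)  # 0.51
```

In general, "k or more" for a discrete variable is obtained by passing k - 1 together with lower.tail = FALSE.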

18.2 q*() set of functions

This set of functions gives quantiles. The quantile function is the inverse of the cumulative distribution function: if \(f\) is the cdf of a given probability distribution, then the quantile function \(F\) is the inverse of \(f\), i.e. \(F = f^{-1}\). These are related by

\[\begin{equation} p = f(x) \tag{18.1} \end{equation}\]

\[\begin{equation} x = F(p) = f^{-1}(p) \tag{18.2} \end{equation}\]

Example: In the same Normal distribution as above (mean = 50 and sd = 10), what is the number below which 90% of the population lies?

qnorm(0.9, mean = 50, sd = 10)
## [1] 62.81552

As with the cdf, we may use the lower.tail argument to find the number above which a given proportion of the population lies.

qnorm(0.9, mean = 50, sd = 10, lower.tail = FALSE)
## [1] 37.18448
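The inverse relationship between the p and q functions can be checked with a round trip. This is a minimal sketch for the same Normal distribution. The variable names are only illustrative, and all.equal() is used because the comparison is only exact up to floating-point tolerance.

```r
# Start from values x, map to probabilities with pnorm, and back with qnorm
x <- c(25, 50, 62.81552)
p <- pnorm(x, mean = 50, sd = 10)
x_back <- qnorm(p, mean = 50, sd = 10)
round_trip_ok <- isTRUE(all.equal(x, x_back))
round_trip_ok   # TRUE

# The upper-tail quantile for p equals the lower-tail quantile for 1 - p
upper <- qnorm(0.9, mean = 50, sd = 10, lower.tail = FALSE)
lower <- qnorm(0.1, mean = 50, sd = 10)
tails_agree <- isTRUE(all.equal(upper, lower))
tails_agree     # TRUE
```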

18.3 d*() set of functions

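We saw that the p group gives the cdf and the q group gives the inverse cdf. The d group gives the probability density function of a continuous distribution, or the probability mass function of a discrete one. Simply put, it returns the height of the density curve at a given x value for a continuous distribution, and the probability of exactly x, \(P(X = x)\), for a discrete one.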

So what is the probability of getting exactly 2 heads in two tosses of a fair coin (i.e. from a binomial distribution with size = 2 and prob = 0.5)?

dbinom(2, 2, prob = 0.5)
## [1] 0.25
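The following sketch illustrates how the d functions relate to the p functions. For a discrete distribution the d values are probabilities that sum to 1 and accumulate to the cdf. For a continuous distribution the d value is a density height, and probabilities come from the area under the curve. The variable names are only illustrative, and the numerical integration via integrate() is just one way to check the area.

```r
# Discrete case: probability mass function of the number of heads in two tosses
heads <- 0:2
pmf <- dbinom(heads, size = 2, prob = 0.5)   # 0.25, 0.50, 0.25
total <- sum(pmf)                            # 1
# Cumulative sums of the pmf reproduce the cdf returned by pbinom
cdf_matches <- isTRUE(all.equal(cumsum(pmf), pbinom(heads, size = 2, prob = 0.5)))

# Continuous case: dnorm returns a density height, not a probability
peak <- dnorm(50, mean = 50, sd = 10)        # 1 / (10 * sqrt(2 * pi)), roughly 0.0399
# The area under the density between 40 and 60 is a probability
area <- integrate(dnorm, lower = 40, upper = 60, mean = 50, sd = 10)$value
area_matches <- isTRUE(all.equal(
  area,
  pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10),
  tolerance = 1e-6
))
```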

18.4 r*() set of functions

This set of functions is used to generate random numbers from a statistical distribution. So to generate 10 random numbers from a Normal distribution with mean = 50 and sd = 10, we can use rnorm. Since no seed is set here, the numbers will differ from run to run.

rnorm(10, mean = 50, sd = 10)
##  [1] 33.10444 62.39496 48.91034 48.82758 51.83083 62.80555 32.72729 66.90184
##  [9] 55.03812 75.28337

We can check this visually with a histogram of a larger sample.

set.seed(1234)
hist(rnorm(10000, 50, 10), breaks = 50)

Figure 18.1: Histogram of random numbers generated from a Normal distribution
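Besides the visual check, the sample can be compared numerically with the theoretical values from the p and q functions. This is a rough sketch: the variable names are only illustrative, and the empirical figures will be close to, but not exactly equal to, the theoretical ones.

```r
set.seed(1234)                        # fix the seed so the draws are reproducible
draws <- rnorm(10000, mean = 50, sd = 10)

sample_mean <- mean(draws)            # should be close to 50
sample_sd   <- sd(draws)              # should be close to 10

# Empirical versus theoretical cumulative probability at 25
emp_p  <- mean(draws <= 25)
theo_p <- pnorm(25, mean = 50, sd = 10)

# Empirical versus theoretical 90% quantile
emp_q  <- unname(quantile(draws, 0.9))
theo_q <- qnorm(0.9, mean = 50, sd = 10)

# Re-setting the same seed reproduces exactly the same numbers
set.seed(1234); a <- rnorm(5)
set.seed(1234); b <- rnorm(5)
same_draws <- identical(a, b)         # TRUE
```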