18 Probability in R
We will keep this chapter short. Instead of covering all the concepts of probability, we will see how to calculate probabilities, densities and quantiles for nearly any type of distribution. R's workhorse here is a family of four functions for each distribution, collectively called the pqdr functions. The letters p, q, d and r are prefixes attached to an abbreviated distribution name. Consider a random variable \(X\) with \(P(X = x) = p\), where \(x\) is a value of the variable and \(p\) is the associated probability.
Distribution | P | Q | D | R |
---|---|---|---|---|
Beta | pbeta | qbeta | dbeta | rbeta |
Binomial | pbinom | qbinom | dbinom | rbinom |
Cauchy | pcauchy | qcauchy | dcauchy | rcauchy |
Chi-Square | pchisq | qchisq | dchisq | rchisq |
Exponential | pexp | qexp | dexp | rexp |
F | pf | qf | df | rf |
Gamma | pgamma | qgamma | dgamma | rgamma |
Geometric | pgeom | qgeom | dgeom | rgeom |
Hypergeometric | phyper | qhyper | dhyper | rhyper |
Logistic | plogis | qlogis | dlogis | rlogis |
Log Normal | plnorm | qlnorm | dlnorm | rlnorm |
Negative Binomial | pnbinom | qnbinom | dnbinom | rnbinom |
Normal | pnorm | qnorm | dnorm | rnorm |
Poisson | ppois | qpois | dpois | rpois |
Student t | pt | qt | dt | rt |
Studentized Range | ptukey | qtukey | - | - |
Uniform | punif | qunif | dunif | runif |
Weibull | pweibull | qweibull | dweibull | rweibull |
Wilcoxon Rank Sum Statistic | pwilcox | qwilcox | dwilcox | rwilcox |
Wilcoxon Signed Rank Statistic | psignrank | qsignrank | dsignrank | rsignrank |
All these functions are vectorised. Let us explore these one by one.
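As a small illustration of vectorisation (a sketch only; the input values below are arbitrary choices), a single call can evaluate several points, or several parameter values, at once:

```r
# Evaluate the normal CDF at three points in one call
x <- c(25, 50, 75)
p <- pnorm(x, mean = 50, sd = 10)
print(p)

# Parameters are vectorised too: one point, three different standard deviations
p_sd <- pnorm(25, mean = 50, sd = c(5, 10, 20))
print(p_sd)
```

Each call returns one result per element of the longest vector argument.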
18.1 p*() set of functions
This set of functions gives the cumulative distribution function (cdf) of a distribution, i.e. \(P(X \le x)\).
Example-1. What is the probability of a number being less than or equal to 25 in a Normal distribution with mean = 50 and sd = 10?
pnorm(25, mean = 50, sd = 10)
## [1] 0.006209665
Conversely, the probability of a number being greater than 25 in the above distribution is:
# Either subtract the probability from 1
1 - pnorm(25, mean = 50, sd = 10)
## [1] 0.9937903
# Or provide FALSE to lower.tail argument
pnorm(25, mean = 50, sd = 10, lower.tail = FALSE)
## [1] 0.9937903
Example-2: What is the probability of at most one head in two tosses of a fair coin (binomial distribution with prob = 0.5)? Here pbinom returns \(P(X \le 1)\), the probability of zero or one head.
pbinom(1, size = 2, prob = 0.5)
## [1] 0.75
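For a discrete distribution the difference between "greater than" and "greater than or equal to" matters, because lower.tail = FALSE returns the strict upper tail \(P(X > x)\). A minimal sketch for the same two-toss coin, computing the probability of at least one head in two equivalent ways:

```r
# P(X >= 1) = 1 - P(X <= 0)
p_at_least_one <- 1 - pbinom(0, size = 2, prob = 0.5)

# Same quantity via the strict upper tail: P(X > 0)
p_upper <- pbinom(0, size = 2, prob = 0.5, lower.tail = FALSE)

print(p_at_least_one)
print(p_upper)
```

Both expressions leave out only the outcome of zero heads, so they agree.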
18.2 q*() set of functions
This set of functions gives quantiles; the quantile function is the inverse of the cumulative distribution function. So if \(f\) is the cdf (cumulative distribution function) of a given probability distribution, then the quantile function \(F\) is the inverse of \(f\), i.e. \(F = f^{-1}\). These are related by
\[\begin{equation} p = f(x) \tag{18.1} \end{equation}\]
\[\begin{equation} x = F(p) = f^{-1}(p) \tag{18.2} \end{equation}\]
Example- In the same normal distribution as above (mean = 50 and sd = 10), what is the number below which 90% of the population lies?
qnorm(0.9, mean = 50, sd = 10)
## [1] 62.81552
As with the cdf, we may use the lower.tail argument here to find the number above which a given percentage of the population lies.
qnorm(0.9, mean = 50, sd = 10, lower.tail = FALSE)
## [1] 37.18448
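Because the q functions invert the p functions, applying one after the other should return the starting value. A quick sketch of this round trip for the same normal distribution (the value 62 is an arbitrary choice):

```r
x <- 62
p <- pnorm(x, mean = 50, sd = 10)       # cumulative probability at x
x_back <- qnorm(p, mean = 50, sd = 10)  # quantile at that probability
all.equal(x, x_back)

# The upper-tail quantile at 0.9 equals the lower-tail quantile at 0.1
upper_q <- qnorm(0.9, mean = 50, sd = 10, lower.tail = FALSE)
lower_q <- qnorm(0.1, mean = 50, sd = 10)
all.equal(upper_q, lower_q)
```

The second check shows that lower.tail = FALSE is equivalent to asking for the quantile at the complementary probability.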
18.3 d*() set of functions
We saw that the p group denotes the cdf and the q group denotes the inverse cdf. The d group denotes the probability density function of a given distribution (the probability mass function for discrete distributions). Simply stated, it returns the height of the density curve at a given x value; for a discrete distribution this height is the probability \(P(X = x)\) itself.
So what is the probability of getting exactly 2 heads in two tosses of a single fair coin (i.e. from a binomial distribution with probability prob = 0.5)?
dbinom(2, 2, prob = 0.5)
## [1] 0.25
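The d and p functions are linked: for a discrete distribution, summing the d values up to a point gives the cumulative probability from p, while for a continuous distribution the density has to be integrated. A sketch of both checks, using base R's integrate() for the continuous case (the interval from 40 to 60 is an arbitrary choice):

```r
# Discrete: P(X <= 1) is the sum of P(X = 0) and P(X = 1)
sum_d <- sum(dbinom(0:1, size = 2, prob = 0.5))
all.equal(sum_d, pbinom(1, size = 2, prob = 0.5))

# Continuous: the area under dnorm between 40 and 60
# matches the difference of the two cdf values
area <- integrate(dnorm, lower = 40, upper = 60, mean = 50, sd = 10)$value
cdf_diff <- pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10)
all.equal(area, cdf_diff, tolerance = 1e-6)
```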
18.4 r*() set of functions
This set of functions is used to generate random numbers from a statistical distribution. So to generate 10 random numbers from a Normal distribution with mean = 50 and sd = 10, we can use rnorm.
rnorm(10, mean = 50, sd = 10)
## [1] 33.10444 62.39496 48.91034 48.82758 51.83083 62.80555 32.72729 66.90184
## [9] 55.03812 75.28337
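Since the draws are random, the numbers above will differ on every run. If reproducible output is needed, base R's set.seed() can be called first; the seed value 123 below is an arbitrary choice:

```r
set.seed(123)                        # fix the random number generator state
a <- rnorm(10, mean = 50, sd = 10)

set.seed(123)                        # same seed again
b <- rnorm(10, mean = 50, sd = 10)

identical(a, b)                      # the two draws match exactly
```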
We can check that the generated numbers follow the intended distribution using a histogram.

Figure 18.1: Histogram of Random numbers generated out of Normal distribution
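The code behind the figure is not shown here; a minimal sketch that produces a comparable histogram might look like the following. The sample size of 10000, the seed, the number of breaks and the plot labels are all assumptions chosen for illustration:

```r
set.seed(42)                             # arbitrary seed for reproducibility
x <- rnorm(10000, mean = 50, sd = 10)    # assumed sample size

# freq = FALSE puts the y-axis on the density scale
hist(x, breaks = 30, freq = FALSE,
     main = "Random numbers from a Normal distribution", xlab = "x")

# Overlay the theoretical density for comparison
curve(dnorm(x, mean = 50, sd = 10), add = TRUE, lwd = 2)
```

With a large sample the bars should follow the overlaid bell curve closely; with only 10 draws the shape would be much rougher.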