4 Existing and useful functions in base R

R ships with a large number of built-in functions, and knowing them saves a lot of effort. Let us discuss a few of them that are useful for data analytics and related work.

First, let us learn the logical operators, which are used to check various conditions. If you are not familiar with operators, you may simply think of them as a special kind of function; most of them take two arguments, one written on each side of the operator.
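To see that an operator really is a function, it can be called in ordinary prefix form by quoting its name in backticks. A minimal sketch:

```r
# An infix operator is a function whose name must be quoted in backticks
`==`(5, 2)    # same as 5 == 2, returns FALSE
`+`(2, 3)     # same as 2 + 3, returns 5
`!`(TRUE)     # the unary NOT takes a single argument, returns FALSE
```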

4.1 Conditions and logical operators/operands

Table 4.1: Conditions and logical operators/operands
Operator/function	Meaning	Example
`==`	Is LHS equal to RHS?	`5 == 2` will return FALSE; `'Anil' == 'anil'` is FALSE
`!=`	Is LHS not equal to RHS?	`'ABCD' != 'abcd'` is TRUE
`>=`	Is LHS greater than or equal to RHS?	`5 >= 2` will return TRUE
`<=`	Is LHS less than or equal to RHS?	`15 <= 2` will return FALSE
`>`	Is LHS strictly greater than RHS?	`2 > 2` will return FALSE
`<`	Is LHS strictly less than RHS?	`12 < 12` will return FALSE
`is.na()`	Whether the argument passed is NA	`is.na(NA)` is TRUE
`is.null()`	Whether the argument passed is NULL	`is.null(NA)` is FALSE
`|`	Element-wise logical OR	`TRUE | FALSE` will return `TRUE`
`&`	Element-wise logical AND	`TRUE & FALSE` will return `FALSE`
`!`	Logical NOT	`!TRUE` will return `FALSE`
`||`	Scalar logical OR (short-circuit)	Operands must be of length one (longer operands give an error from R 4.3.0 onwards; older versions used only the first element); the RHS is evaluated only if the LHS is FALSE
`&&`	Scalar logical AND (short-circuit)	Operands must be of length one, as for `||`; the RHS is evaluated only if the LHS is TRUE
`%in%`	LHS IN RHS	Checks whether each LHS element is present in the RHS vector
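The difference between `&`/`|` and `&&`/`||` can be seen in a short sketch. The doubled forms expect single TRUE/FALSE values and stop as soon as the answer is known, which is why they are the usual choice inside `if ()` conditions. The `stop()` calls and the variable `v` are illustrative only; the `stop()` calls prove that the right-hand side is never evaluated.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

x & y                      # element-wise: TRUE FALSE FALSE
x | y                      # element-wise: TRUE TRUE TRUE

# Short-circuit: the right-hand side is skipped once the result is known
TRUE || stop("never evaluated")    # TRUE
FALSE && stop("never evaluated")   # FALSE

# Typical use: the second test runs only when it is safe to do so
v <- NULL
if (!is.null(v) && v > 0) "positive" else "missing or not positive"
```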

Vectorisation of operations and functions

All the operators above, except `||` and `&&`, are vectorised: they work element by element and return a logical vector rather than a single value. Check:

LETTERS[1:4] == letters[1:4]

## [1] FALSE FALSE FALSE FALSE

10:1 >= 1:10

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

# TRUE will act as 1 and FALSE as 0
x <- c(TRUE, FALSE, FALSE, TRUE)
y <- c(1, 0, 1, 10)
x == y

## [1]  TRUE  TRUE FALSE FALSE

# Element-wise logical operations: any non-zero number counts as TRUE
x & y

## [1]  TRUE FALSE FALSE  TRUE

x | y

## [1]  TRUE FALSE  TRUE  TRUE

# character strings are compared in alphabetical (locale-dependent) order
'ABCD' >= 'AACD'

## [1] TRUE

4.2 Recycling

When two vectors are not of equal length, R recycles (repeats) the elements of the shorter one until it matches the length of the longer one. See these examples.

# 'A' is recycled to length 4, so the result is displayed silently
LETTERS[1:4] == 'A'

## [1]  TRUE FALSE FALSE FALSE

# Notice that the result comes with a warning, because 5 is not a multiple of 3
LETTERS[1:5] == LETTERS[1:3]

## Warning in LETTERS[1:5] == LETTERS[1:3]: longer object length is not a multiple
## of shorter object length

## [1]  TRUE  TRUE  TRUE FALSE FALSE

The operator %in% behaves differently from the above: it searches for each element of the LHS in the RHS and returns a logical vector of the same length as the LHS, so no recycling is involved. See these examples carefully.

'A' %in% LETTERS

## [1] TRUE

LETTERS %in% LETTERS[1:4]

##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE
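A small sketch with a made-up vector `codes` shows how %in% is typically used for filtering and for a "not in" test. The R documentation describes %in% in terms of match(), which returns positions instead of TRUE/FALSE.

```r
codes <- c("B", "Z", "D")
codes %in% LETTERS[1:4]          # TRUE FALSE TRUE: one result per element of `codes`
!(codes %in% LETTERS[1:4])       # "not in": FALSE TRUE FALSE
codes[codes %in% LETTERS[1:4]]   # keeps "B" "D"
match(codes, LETTERS[1:4])       # 2 NA 4: positions in the RHS, NA when absent
```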

4.3 Handling Missing values `NA` in these operations

When checking whether a condition is TRUE or FALSE, missing values (NA and/or NaN) should be handled carefully, or a bug may be introduced. See these examples:

FALSE != NA

## [1] NA

TRUE != NA

## [1] NA

Thus, if a condition is evaluated on a vector containing missing values, the output can contain NA along with TRUE and FALSE. See this example:

x <- c(1, 5, 15, NA, 2, 3)
x <= 5

## [1]  TRUE  TRUE FALSE    NA  TRUE  TRUE

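With the logical operators & and |, however, missing values behave slightly differently: if the result would be the same whatever value the NA stands for, R returns that result instead of NA. See these examples.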

TRUE | NA

## [1] TRUE

FALSE & NA

## [1] FALSE
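A short sketch of all four combinations makes the rule visible: TRUE or FALSE is returned only when the unknown value cannot change the answer. It also shows is.na(), the safe way to test for missing values.

```r
TRUE | NA      # TRUE: TRUE OR anything is TRUE
FALSE | NA     # NA: the answer depends on the unknown value
TRUE & NA      # NA: the answer depends on the unknown value
FALSE & NA     # FALSE: FALSE AND anything is FALSE
NA > 1         # NA: comparisons with NA are unknown
is.na(c(1, NA, 3))   # FALSE TRUE FALSE
```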

4.4 Use of above logical operators for subsetting

Since logical operations on vectors give a logical vector as output, they can be used for subsetting as well. See these examples.

my_ages <- c(40, 45, 31, 51, 25, 27, 59, 45)
# filter ages greater than or equal to 30
my_ages[my_ages >= 30]

## [1] 40 45 31 51 59 45

my_names <- c("Andrew", "Bob", "Carl", "Daven", "Earl")
# keep names that sort at or before "D", i.e. those starting with A, B or C
my_names[my_names <= "D"]

## [1] "Andrew" "Bob"    "Carl"
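Subsetting with a condition that contains NA returns NA elements, which is a common source of bugs. A minimal sketch, reusing the vector from section 4.3, compares plain subsetting with which() and with an explicit is.na() check:

```r
x <- c(1, 5, 15, NA, 2, 3)
x[x <= 5]                # 1 5 NA 2 3: the NA condition yields an NA element
x[which(x <= 5)]         # 1 5 2 3: which() keeps only the TRUE positions
x[!is.na(x) & x <= 5]    # 1 5 2 3: FALSE & NA is FALSE, so NA is dropped
```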

4.5 Conditions with `ifelse`

The call ifelse(test, yes, no) returns a value of the same shape as test, filled with elements from yes where test is TRUE and from no where it is FALSE. Where test is NA, the result is NA. See this example:

x <- c(1:5, NA, 16:20)
ifelse(x>5, 'Greater than 5', 'Upto 5')

##  [1] "Upto 5"         "Upto 5"         "Upto 5"         "Upto 5"        
##  [5] "Upto 5"         NA               "Greater than 5" "Greater than 5"
##  [9] "Greater than 5" "Greater than 5" "Greater than 5"
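One way to give missing values their own label is to test for NA first and nest a second ifelse() inside. This is a sketch; the name `category` and the labels are illustrative.

```r
x <- c(1:5, NA, 16:20)
# Test for NA first, so missing values get a label instead of NA
category <- ifelse(is.na(x), "Missing",
                   ifelse(x > 5, "Greater than 5", "Up to 5"))
category
```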

4.6 Functions `all()` and `any()`

These are shortcut functions that tell us whether all, or at least one, of the elements of a logical vector are TRUE. See this example:

x <- 11:20
all(x > 5)

## [1] TRUE

any(x > 20)

## [1] FALSE
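all() and any() follow the same NA logic as & and |, and both accept na.rm = TRUE. A minimal sketch:

```r
x <- c(1, NA, 3)
all(x > 0)                 # NA: the missing value might not be positive
all(x > 0, na.rm = TRUE)   # TRUE: NA is dropped first
any(x > 2)                 # TRUE: one TRUE is enough, the NA does not matter
any(x > 5)                 # NA: no TRUE found, but the NA might be one
```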

Like the operators above, the arithmetic operators listed in section 1.1 are vectorised. Check these examples.

x <- 1:5
y <- 6:10

x + y

## [1]  7  9 11 13 15

x - y

## [1] -5 -5 -5 -5 -5

x * y

## [1]  6 14 24 36 50

x / y

## [1] 0.1666667 0.2857143 0.3750000 0.4444444 0.5000000

x ^ y

## [1]       1     128    6561  262144 9765625

# The RHS here is a single number, i.e. a vector of length one, recycled to length 5
y %% 3

## [1] 0 1 2 0 1

y %/% 3

## [1] 2 2 2 3 3

Recycling also applies to arithmetic operators. See these examples and notice when R gives the result silently and when it adds a warning.

10:15 + 4

## [1] 14 15 16 17 18 19

100:110 - 50

##  [1] 50 51 52 53 54 55 56 57 58 59 60

# the longer length is a multiple of the shorter length: no warning
x <- c(5, 2, 7, 9)
y <- c(7, 8)
x + y

## [1] 12 10 14 17

# the longer length is not a multiple of the shorter length: a warning
x + c(1, 2, 3)

## Warning in x + c(1, 2, 3): longer object length is not a multiple of shorter
## object length

## [1]  6  4 10 10

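All the operators and functions above may also be used on matrices and higher-dimensional arrays since, as we have already seen, matrices and arrays are vectors at the core. The result keeps the dimensions of the input:

mat1 <- matrix(1:6, nrow = 2)
mat1 * 10

##      [,1] [,2] [,3]
## [1,]   10   30   50
## [2,]   20   40   60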
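A small sketch with a hypothetical matrix `m` shows that comparisons also keep the matrix shape, and that a shorter vector is recycled down the columns, because a matrix is stored column by column.

```r
m <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)
m > 3                        # a 2 x 3 logical matrix
sum(m > 3)                   # 3: each TRUE counts as 1
m + c(100, 200)              # row 1 gets +100, row 2 gets +200
```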

4.7 Common arithmetical Functions

Table 4.2: Common Arithmetical Functions
Function	Meaning	Input	Output
`sum()`	Adds all elements	One or more Vector, matrix, array	Vector having 1 element only
`prod()`	Returns product of all elements	One or more Vector, matrix, array	Vector having 1 element only
`mean()`	Returns the arithmetic mean	One Vector, matrix, array	Vector having 1 element only
`max()`	Returns maximum value	One or more Vector, matrix, array	Vector having 1 element only
`min()`	Returns minimum value	One or more Vector, matrix, array	Vector having 1 element only
`ceiling()`	Returns the smallest integers not less than the given values	One Vector, matrix, array	Vector, matrix, array having same `dim`
`floor()`	Returns largest integers not greater than given values	One Vector, matrix, array	Vector, matrix, array having same `dim`
`trunc()`	Returns integers formed by truncating the values towards 0	One Vector, matrix, array	Vector, matrix, array having same `dim`
`round(x, digits = 0)`	Rounds the given value(s) to number of decimal places provided	One Vector, matrix, array	Vector, matrix, array having same `dim`
`signif(x, digits = 6)`	Rounds to the given number of significant digits	One Vector, matrix, array	Vector, matrix, array having same `dim`
`factorial()`	Returns the factorial of each element	One Vector, matrix, array (numeric)	Vector, matrix, array having same `dim`
`sqrt()`	Returns square root	One Vector, matrix, array	Vector, matrix, array having same `dim`
`log10()` or `log2()`	Logarithm with base 10 or 2 respectively	One Vector, matrix, array	Vector, matrix, array having same `dim`
`exp(x)`	Returns the exponential	One Vector, matrix, array	Vector, matrix, array having same `dim`

See these examples.

sum(1:100, 1:10)

## [1] 5105

Mat1 <- matrix(1:10, nrow = 2)
Mat2 <- matrix(1:4, nrow = 2)

prod(Mat1, Mat2)

## [1] 87091200

sqrt(Mat2)

##          [,1]     [,2]
## [1,] 1.000000 1.732051
## [2,] 1.414214 2.000000

log10(Mat1)

##         [,1]      [,2]      [,3]     [,4]      [,5]
## [1,] 0.00000 0.4771213 0.6989700 0.845098 0.9542425
## [2,] 0.30103 0.6020600 0.7781513 0.903090 1.0000000

factorial(10:1)

##  [1] 3628800  362880   40320    5040     720     120      24       6       2
## [10]       1
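The rounding functions differ most clearly on negative numbers. A minimal sketch with illustrative values; the last line shows that R rounds an exact half to the even digit, following the IEC 60559 standard, so round(2.5) is 2, not 3.

```r
x <- c(-2.567, 2.567)
ceiling(x)      # -2  3
floor(x)        # -3  2
trunc(x)        # -2  2 (towards zero)
round(x, 1)     # -2.6  2.6
signif(x, 2)    # -2.6  2.6
round(c(0.5, 1.5, 2.5))   # 0 2 2: exact halves go to the even digit
```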

4.7.1 Missing values

If the vector on which we compute sum() etc. has missing values, the result is NA unless we add the argument na.rm = TRUE to these functions (check the documentation of each function individually). See these examples:

x <- c(1:50, NA)
sum(x)

## [1] NA

sum(x, na.rm = TRUE)

## [1] 1275

mean(x, na.rm = TRUE)

## [1] 25.5
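Because TRUE counts as 1, sum(is.na(x)) is a quick way to count missing values before deciding how to handle them. A sketch with made-up data:

```r
x <- c(4, NA, 7, NA, 1)
sum(is.na(x))              # 2 missing values
max(x)                     # NA
max(x, na.rm = TRUE)       # 7
mean(x, na.rm = TRUE)      # 4, the mean of the 3 non-missing values
```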

4.8 Some Statistical functions

Table 4.3: Some commonly used Statistical Functions
Function	Meaning	Input	Output
`sd()`	Returns standard deviation	One Vector, matrix, array	Vector having 1 element only
`var()`	Returns variance	One Vector (or a matrix/data frame)	Vector having 1 element for a vector; variance-covariance matrix of the columns for a matrix/data frame
`median()`	Returns median value	One Vector, matrix, array	Vector having 1 element only
`range()`	Returns the minimum and maximum	One or more Vector, matrix, array	Vector having 2 elements
`IQR()`	Computes interquartile range of the x values	One Vector, matrix, array	Vector having 1 element only
`quantile()`	Computes sample quantiles (percentiles) for the probabilities given in the `probs` argument	One Vector, matrix, array	Named Vector having 5 elements by default, OR equal to the length of `probs` vector given

Examples-

median(1:100)

## [1] 50.5

range(1:100, 45, 789)

## [1]   1 789

quantile(1:100)

##     0%    25%    50%    75%   100% 
##   1.00  25.75  50.50  75.25 100.00

quantile(0:100, probs = 1:10 / 10)

##  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
##   10   20   30   40   50   60   70   80   90  100
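A sketch with a small made-up vector ties the functions in the table together: var() divides by n - 1, sd() is its square root, and IQR() is the distance between the 75% and 25% quantiles.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
var(x)                       # 32 / 7, about 4.571: squared deviations summed, divided by n - 1
sd(x)                        # about 2.138, the square root of var(x)
quantile(x, c(0.25, 0.75))   # 25%: 4, 75%: 5.5
IQR(x)                       # 1.5 = 5.5 - 4
```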

4.9 Functions related to sampling and probability distributions

4.9.1 Set the random seed with `set.seed()`

set.seed() sets the seed of R's random number generator (RNG), whose state is stored as an integer vector. It prints nothing, but once the same seed is set, the same sequence of random numbers is produced, which makes your code reproducible.

4.9.2 Generate random numbers with `rnorm()` / `runif()` / `rpois()` etc.

These functions generate random numbers from the normal, uniform and Poisson distributions respectively. There are many other functions, not only to generate random numbers but also to compute probabilities and densities of these and other probability distributions (such as the binomial and t), but those are beyond the scope of this book. For example:

rnorm(n=10) #default mean is 0 and SD is 1

##  [1]  1.8062714  0.4990112 -0.6961670 -0.4514996 -1.0704310  0.2688084
##  [7]  0.8687072  1.1770034 -0.3030270 -0.1455080

rnorm(n=10) # notice these will produce different results each time.

##  [1] -0.61196717 -0.14021692 -1.39770173  0.09571398  1.26362876  1.10585254
##  [7]  0.06406968  2.04684899 -1.02892626  0.30159577

# If, however, the seed is fixed first, the results are reproducible.
set.seed(123)
runif(10) # default min and max are 0 and 1 respectively

##  [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673 0.0455565 0.5281055
##  [8] 0.8924190 0.5514350 0.4566147

set.seed(123)
runif(10)

##  [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673 0.0455565 0.5281055
##  [8] 0.8924190 0.5514350 0.4566147
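rpois() is mentioned above but not shown, and the generators accept arguments for their parameters. A sketch under illustrative parameter values (the seed 42 and the names `counts`, `heights`, `uniform` are arbitrary); the exact numbers drawn are not shown because they depend on the seed.

```r
set.seed(42)
counts  <- rpois(5, lambda = 3)            # 5 counts from a Poisson distribution with mean 3
heights <- rnorm(5, mean = 170, sd = 10)   # normal with a chosen mean and standard deviation
uniform <- runif(5, min = 10, max = 20)    # uniform between 10 and 20
counts

# Re-setting the same seed reproduces exactly the same draws
set.seed(42)
identical(counts, rpois(5, lambda = 3))    # TRUE
```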

4.9.3 Random Sample with `sample()`

sample() takes a sample of the specified size from the elements of x, either with or without replacement. For example:

set.seed(123)
sample(LETTERS, 5, replace = FALSE)

## [1] "O" "S" "N" "C" "J"

set.seed(111)
sample(LETTERS, 15, replace = TRUE)

##  [1] "N" "T" "S" "O" "Y" "E" "C" "H" "Z" "Q" "M" "J" "D" "O" "H"

If the sampling should be proportional to given weights, provide them in the prob argument.

set.seed(12)
sample(LETTERS, 5, replace = FALSE, prob = 1:26)

## [1] "Z" "K" "F" "V" "X"
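Two further uses are worth knowing, sketched below: sample() without a size returns a random permutation, and a single positive number n is treated as 1:n, which can surprise when x happens to have length one. Sampling positions with length(x) avoids this. The seed and values are arbitrary.

```r
set.seed(1)
sample(5)            # a random permutation of 1:5
sample(c(3, 8), 1)   # picks 3 or 8
sample(8, 1)         # caution: a single number n samples from 1:n, not just 8

# Safer when x might have length one: sample positions, then index
x <- c(8)
x[sample(length(x), 1)]   # always 8
```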

4.10 Other Mathematical functions

4.10.1 Progressive calculations with `cumsum()` /`cumprod()`

These calculate a running total or a running product. The output vector has the same length as the input vector.

cumsum(1:10)

##  [1]  1  3  6 10 15 21 28 36 45 55

cumprod(-5:5)

##  [1]   -5   20  -60  120 -120    0    0    0    0    0    0

Other similar functions such as cummax() (cumulative maximum) and cummin() (cumulative minimum) may also be useful.

set.seed(1)
x <- sample(1:100, 10)
cummin(x)

##  [1] 68 39  1  1  1  1  1  1  1  1

cummax(x)

##  [1] 68 68 68 68 87 87 87 87 87 87
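Dividing a running total by the running count gives a running (cumulative) mean. A sketch with illustrative values:

```r
x <- c(2, 4, 6, 8, 10)
cumsum(x)                   # 2 6 12 20 30
cumsum(x) / seq_along(x)    # running mean: 2 3 4 5 6 (seq_along(x) is 1, 2, ..., 5)
```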

4.10.2 Progressive difference `diff()`

diff() calculates successive differences, i.e. the difference between each pair of consecutive elements, of a numeric vector. The output is one element shorter than the input. For example:

set.seed(123)
x <- rnorm(10)
x

##  [1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499
##  [7]  0.46091621 -1.26506123 -0.68685285 -0.44566197

diff(x)

## [1]  0.33029816  1.78888580 -1.48819992  0.05877934  1.58577725 -1.25414878
## [7] -1.72597744  0.57820838  0.24119088

length(diff(x))

## [1] 9
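diff() also takes lag and differences arguments, and cumsum() undoes it once the first value is added back. A sketch on the squares 1 to 25:

```r
x <- c(1, 4, 9, 16, 25)
diff(x)                      # 3 5 7 9
diff(x, lag = 2)             # 8 12 16: differences two positions apart
diff(x, differences = 2)     # 2 2 2: the differences of the differences
c(x[1], x[1] + cumsum(diff(x)))   # 1 4 9 16 25: x is recovered
```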

4.11 String Manipulation functions

4.11.1 Concatenate strings with `paste()` and `paste0()`

R’s built-in function paste() concatenates the elements of one or more vectors, element by element, after converting them to character. The argument sep gives the separator, which by default is a single space, i.e. " ". paste0() has no sep argument and concatenates the elements without any separator.

paste(LETTERS, letters)

##  [1] "A a" "B b" "C c" "D d" "E e" "F f" "G g" "H h" "I i" "J j" "K k" "L l"
## [13] "M m" "N n" "O o" "P p" "Q q" "R r" "S s" "T t" "U u" "V v" "W w" "X x"
## [25] "Y y" "Z z"

paste0(letters, '_', 1:26) # note that '_' is recycled here

##  [1] "a_1"  "b_2"  "c_3"  "d_4"  "e_5"  "f_6"  "g_7"  "h_8"  "i_9"  "j_10"
## [11] "k_11" "l_12" "m_13" "n_14" "o_15" "p_16" "q_17" "r_18" "s_19" "t_20"
## [21] "u_21" "v_22" "w_23" "x_24" "y_25" "z_26"

Note that both paste() and paste0() return a vector whose length equals that of the longest input vector. If all the results should instead be joined into a single string, use the argument collapse, which gives the separator placed between them. See this example.

paste0(letters, 1:26, collapse = '+')

## [1] "a1+b2+c3+d4+e5+f6+g7+h8+i9+j10+k11+l12+m13+n14+o15+p16+q17+r18+s19+t20+u21+v22+w23+x24+y25+z26"
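sep and collapse can be used together: sep joins the pieces within each element, and collapse then joins the elements. A sketch with illustrative strings:

```r
paste("a", 1:3, sep = "-")                    # "a-1" "a-2" "a-3"
paste("a", 1:3, sep = "-", collapse = ", ")   # "a-1, a-2, a-3"
paste0("file", 1:2, ".csv")                   # "file1.csv" "file2.csv"
```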

4.11.2 Functions `startsWith()` / `endsWith()`

To check whether each string in a vector x starts with a given prefix or ends with a given suffix, we can use startsWith(x, prefix) or endsWith(x, suffix) respectively. For example:

x <- c('apples', 'oranges', 'apples and oranges', 'oranges and apples', 'apricots')
startsWith(x, 'apples')

## [1]  TRUE FALSE  TRUE FALSE FALSE

startsWith(x, 'ap')

## [1]  TRUE FALSE  TRUE FALSE  TRUE

endsWith(x, 'oranges')

## [1] FALSE  TRUE  TRUE FALSE FALSE

Note that both these functions return a logical vector of the same length as x.
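Because the result is a logical vector, it can be used directly for subsetting, as in section 4.4. A sketch on the same strings, stored here under the illustrative name `fruits`:

```r
fruits <- c('apples', 'oranges', 'apples and oranges', 'oranges and apples', 'apricots')
fruits[startsWith(fruits, 'ap')]      # "apples" "apples and oranges" "apricots"
fruits[endsWith(fruits, 'apples')]    # "oranges and apples"
```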

4.11.3 Check number of characters in string vector using `nchar()`

To count the number of characters in each element of a string vector x, we can use nchar(x), which returns an integer vector; a missing value gives NA. For example:

nchar(x)

## [1]  6  7 18 18  8

y <- c('', ' ', '   ', NA)
nchar(y)

## [1]  0  1  3 NA

4.11.4 Change case using `toupper()` / `tolower()`

These functions change the given vector to all UPPER case or all lower case respectively. Example:

x <- c('Andrew', 'Bob')
tolower(x)

## [1] "andrew" "bob"

toupper(x)

## [1] "ANDREW" "BOB"
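Since `==` and `%in%` are case-sensitive, converting both sides to one case is a simple way to compare strings case-insensitively. A sketch; the vector `people` is made-up data:

```r
'Anil' == 'anil'                     # FALSE: comparison is case-sensitive
tolower('Anil') == tolower('anil')   # TRUE once both are lower case
people <- c("ANDREW", "bob", "Carl")
people[toupper(people) %in% c("BOB", "CARL")]   # "bob" "Carl"
```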

Extract a portion of string using `substr()`

To extract the characters of each element of a vector x from position start to position stop (both integers), we use substr(x, start, stop). If stop is beyond the end of a string, the extraction simply ends at its last character. For example:

substr(x, 2, 8)

## [1] "ndrew" "ob"
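Combining substr() with nchar() lets the positions depend on each string's length. A sketch; the `dates` vector is hypothetical data:

```r
dates <- c("2024-03-15", "2023-11-02")
substr(dates, 1, 4)                  # "2024" "2023": the year part
x <- c('Andrew', 'Bob')
substr(x, nchar(x) - 2, nchar(x))    # "rew" "Bob": the last three characters
```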

4.11.5 Split a character vector using `strsplit()`

strsplit(x, split) splits each element of a character vector x into substrings wherever split matches within it. For example:

strsplit(x, split = ' ')

## [[1]]
## [1] "Andrew"
## 
## [[2]]
## [1] "Bob"

Notice that the output is a list, with one element for each element of x.
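Since the names above contain no spaces, nothing is split there. A sketch with hypothetical date strings shows how the list output is usually handled, and why fixed = TRUE is needed when the separator is a regular-expression character such as ".":

```r
parts <- strsplit(c("2024-03-15", "1999-12-31"), split = "-")
parts[[1]]                                # "2024" "03" "15"
sapply(parts, `[`, 1)                     # "2024" "1999": the first piece of each element
unlist(strsplit("a b c", split = " "))    # "a" "b" "c" as a plain vector
# split is a regular expression by default; fixed = TRUE treats "." literally
strsplit("a.b.c", split = ".", fixed = TRUE)[[1]]   # "a" "b" "c"
```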

4.11.6 Replace portions of string vectors `sub()` / `gsub()`

These two functions replace the first match and all matches of a pattern, respectively. For example:

# Replace only the first match
sub(pattern = 'B', replacement = '12', x, ignore.case = TRUE)

## [1] "Andrew" "12ob"

# Replace all matches
gsub(pattern = 'B', replacement = '12', x, ignore.case = TRUE)

## [1] "Andrew" "12o12"
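The pattern is a regular expression by default, which makes gsub() and sub() useful for cleaning text. A sketch with illustrative strings:

```r
gsub(" ", "_", "apples and oranges")   # "apples_and_oranges"
gsub("[0-9]", "", "a1b22c3")           # "abc": every digit removed
sub("^ +", "", "   padded")            # "padded": leading spaces stripped
```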

4.11.7 Match patterns using `grep()` / `grepl()` / `regexpr()` / `gregexpr()`

These functions search for the string given in the argument pattern within the elements of a string vector. The four, however, differ in their output. For example:

# grep() gives the indices of the matching elements: an integer vector,
# possibly shorter than `x` (here empty, as matching is case-sensitive)
grep(pattern = 'an', x)

## integer(0)

grepl(pattern = 'an', x) # will give a logical vector of same length as `x`

## [1] FALSE FALSE

regexpr(pattern = 'an', x) # output will have multiple attributes

## [1] -1 -1
## attr(,"match.length")
## [1] -1 -1
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE

Note that regexpr() returns, for each element of the given vector, the character position of the first match of the pattern, or -1 if there is no match (as here, since neither name contains a lower-case 'an'). gregexpr() is the same as regexpr() but finds all matches of the pattern; its output is a list. For example:

gregexpr(pattern = 'an', x)

## [[1]]
## [1] -1
## attr(,"match.length")
## [1] -1
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[2]]
## [1] -1
## attr(,"match.length")
## [1] -1
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
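Because no element of x matches 'an', the outputs above are all empty or -1. A sketch on made-up words that do contain matches makes the differences between the four functions visible:

```r
fruits <- c("banana", "apple", "mango")
grep("an", fruits)                   # 1 3
grep("an", fruits, value = TRUE)     # "banana" "mango"
grepl("an", fruits)                  # TRUE FALSE TRUE
as.vector(regexpr("an", fruits))     # 2 -1 2: first match position, -1 = no match
gregexpr("an", fruits)[[1]]          # 2 and 4: both matches in "banana" (with attributes)
grepl("AN", fruits, ignore.case = TRUE)   # TRUE FALSE TRUE
```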

4.12 Other functions

4.12.1 Transpose a matrix using `t()`

Used to return transpose of given matrix. E.g.

# \(x, y) is shorthand for function(x, y), available from R 4.1.0
mat <- outer(1:5, 1:5, FUN = \(x, y) paste0('A', x, y))
mat

##      [,1]  [,2]  [,3]  [,4]  [,5] 
## [1,] "A11" "A12" "A13" "A14" "A15"
## [2,] "A21" "A22" "A23" "A24" "A25"
## [3,] "A31" "A32" "A33" "A34" "A35"
## [4,] "A41" "A42" "A43" "A44" "A45"
## [5,] "A51" "A52" "A53" "A54" "A55"

t(mat)

##      [,1]  [,2]  [,3]  [,4]  [,5] 
## [1,] "A11" "A21" "A31" "A41" "A51"
## [2,] "A12" "A22" "A32" "A42" "A52"
## [3,] "A13" "A23" "A33" "A43" "A53"
## [4,] "A14" "A24" "A34" "A44" "A54"
## [5,] "A15" "A25" "A35" "A45" "A55"
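A short sketch on a hypothetical numeric matrix `m` shows what t() does to the dimensions:

```r
m <- matrix(1:6, nrow = 2)
dim(m)                    # 2 3
dim(t(m))                 # 3 2
identical(t(t(m)), m)     # TRUE: transposing twice gives back the original
t(1:3)                    # a plain vector becomes a 1 x 3 row matrix
```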

4.12.2 Generate a frequency table using `table()`

Returns a frequency/contingency table of the counts at each combination of factor levels. E.g.

set.seed(123)
x <- sample(LETTERS[1:5], 100, replace = TRUE)
table(x)

## x
##  A  B  C  D  E 
## 21 20 23 17 19

If more than one argument is passed, table() cross-tabulates them:

set.seed(1234)
df <- data.frame(State_code = x,
                 Code2 = sample(LETTERS[11:15], 100, replace = TRUE))
my_table <- table(df$State_code, df$Code2)
my_table

##    
##     K L M N O
##   A 5 5 4 4 3
##   B 4 3 6 2 5
##   C 6 3 3 6 5
##   D 2 2 4 6 3
##   E 2 6 4 4 3
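By default table() silently leaves out missing values; the useNA argument adds a count for them. A sketch with made-up survey answers:

```r
answers <- c("yes", "no", "yes", NA)
table(answers)                     # no: 1, yes: 2 (the NA is not counted)
table(answers, useNA = "ifany")    # adds an <NA> count when missing values exist
```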

4.12.3 Generate proportion of frequencies using `prop.table()`

This function takes a table object as input and calculates the proportions of the frequencies. By default, each count is divided by the grand total (here 100).

prop.table(my_table)

##    
##        K    L    M    N    O
##   A 0.05 0.05 0.04 0.04 0.03
##   B 0.04 0.03 0.06 0.02 0.05
##   C 0.06 0.03 0.03 0.06 0.05
##   D 0.02 0.02 0.04 0.06 0.03
##   E 0.02 0.06 0.04 0.04 0.03
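prop.table() also takes a margin argument: margin = 1 gives proportions within each row and margin = 2 within each column. A sketch on a tiny made-up table:

```r
tab <- table(c("x", "x", "y"), c("p", "q", "q"))
prop.table(tab)                # each cell divided by the grand total, 3
prop.table(tab, margin = 1)    # row proportions: each row sums to 1
prop.table(tab, margin = 2)    # column proportions: each column sums to 1
```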

4.12.4 Column-wise or Row-wise sums using `colSums()` / `rowSums()`

Used to compute the sum of each row or column of a matrix/data.frame. E.g.

# Row sums
rowSums(my_table)

##  A  B  C  D  E 
## 21 20 23 17 19

# Col sums
colSums(my_table)

##  K  L  M  N  O 
## 19 19 21 22 19

Note: similar to colSums() / rowSums(), we also have colMeans() and rowMeans().

rowMeans(my_table)

##   A   B   C   D   E 
## 4.2 4.0 4.6 3.4 3.8
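Like sum() and mean(), these functions accept na.rm = TRUE. A sketch on a hypothetical matrix with one missing value:

```r
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)   # row 1 is 1, NA, 5
rowSums(m)                    # NA 12: row 1 contains a missing value
rowSums(m, na.rm = TRUE)      # 6 12
colMeans(m, na.rm = TRUE)     # 1.5 4.0 5.5
```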

4.12.5 Extract unique values using `unique()`

Used to extract only the unique values/elements of the given vector, in the order in which they first appear. E.g.

unique(x) # note: the values are in order of first appearance, not sorted

## [1] "C" "B" "E" "D" "A"

4.12.6 Check if two vectors are identical using `identical()`

Used to check whether two given vectors/objects are identical.

identical(unique(x), LETTERS)

## [1] FALSE
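identical() is stricter than `==`: storage type and attributes such as names must also match, and floating-point numbers must agree exactly. A minimal sketch, with all.equal() shown as the tolerant alternative for numbers:

```r
1L == 1                              # TRUE: values are compared after coercion
identical(1L, 1)                     # FALSE: integer vs double storage
identical(c(a = 1), c(1))            # FALSE: the names differ
0.1 + 0.2 == 0.3                     # FALSE: floating-point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: equal within a small tolerance
```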

4.12.7 Retrieve duplicate items in a vector using `duplicated()`

Used to check which elements have already appeared earlier in the vector and are thus duplicates.

set.seed(123)
x <- sample(LETTERS[1:5], 8, replace = TRUE)
x

## [1] "C" "C" "B" "B" "C" "E" "D" "A"

duplicated(x)

## [1] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE
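Negating the result keeps the first occurrence of each value, which is exactly what unique() returns; fromLast = TRUE marks the earlier copies instead. A sketch on the same values:

```r
x <- c("C", "C", "B", "B", "C", "E", "D", "A")
x[!duplicated(x)]                # "C" "B" "E" "D" "A", same as unique(x)
x[duplicated(x)]                 # "C" "B" "C": the repeated entries
duplicated(x, fromLast = TRUE)   # TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
```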

4.12.8 Generate sequences using other objects with `seq_len()` / `seq_along()`

Used to generate an integer sequence starting from 1, either of a given length or of the same length as a given vector, respectively. E.g.

seq_len(5)

## [1] 1 2 3 4 5

x <- c('Andrew', 'Bob')
seq_along(x)

## [1] 1 2
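The main reason to prefer these over 1:length(x) is the empty case, where 1:0 counts down and produces two indices. A sketch:

```r
x <- character(0)                      # an empty vector
1:length(x)                            # 1 0: wrongly gives two indices
seq_along(x)                           # integer(0): correctly empty
for (i in seq_along(x)) print(x[i])    # the loop body never runs
seq_len(0)                             # integer(0)
```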

4.12.9 Divide a vector into categories (factor) using `cut()`

The function divides the range of x into intervals and codes the values in x according to which interval they fall into. The leftmost interval corresponds to level one, the next leftmost to level two and so on. The output vector will be of type factor.

Example-1:

x <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(x, 3)

##  [1] (0.994,3] (0.994,3] (0.994,3] (3,5]     (3,5]     (0.994,3] (0.994,3]
##  [8] (3,5]     (3,5]     (5,7.01]  (5,7.01] 
## Levels: (0.994,3] (3,5] (5,7.01]

Example-2:

cut(x, 3, dig.lab = 1, ordered_result = TRUE)

##  [1] (1,3] (1,3] (1,3] (3,5] (3,5] (1,3] (1,3] (3,5] (3,5] (5,7] (5,7]
## Levels: (1,3] < (3,5] < (5,7]

Note that the output factor above is ordered.
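Instead of a number of intervals, explicit break points and labels can be given, which is the usual way to create age groups and similar categories. A sketch; the ages, breaks and labels are illustrative:

```r
ages <- c(5, 17, 30, 65)
age_group <- cut(ages,
                 breaks = c(0, 12, 19, 64, Inf),
                 labels = c("child", "teen", "adult", "senior"))
age_group   # intervals (0,12], (12,19], (19,64], (64,Inf]: closed on the right by default
```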

4.12.10 Scale the columns of a matrix using `scale()`

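Used to scale the columns of a numeric matrix: by default, each column is centred by subtracting its mean and then divided by its standard deviation.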

x <- matrix(1:10, ncol = 2)
x

##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

scale(x)

##            [,1]       [,2]
## [1,] -1.2649111 -1.2649111
## [2,] -0.6324555 -0.6324555
## [3,]  0.0000000  0.0000000
## [4,]  0.6324555  0.6324555
## [5,]  1.2649111  1.2649111
## attr(,"scaled:center")
## [1] 3 8
## attr(,"scaled:scale")
## [1] 1.581139 1.581139

Note: the output is always a matrix with two extra attributes, scaled:center and scaled:scale, holding the values used, even when the input is a vector. See this example

scale(1:5)

##            [,1]
## [1,] -1.2649111
## [2,] -0.6324555
## [3,]  0.0000000
## [4,]  0.6324555
## [5,]  1.2649111
## attr(,"scaled:center")
## [1] 3
## attr(,"scaled:scale")
## [1] 1.581139
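The default scaling can be checked against the formula (x - mean) / sd, and scale = FALSE only centres the values. A sketch on the same numbers:

```r
v <- c(1, 2, 3, 4, 5)
z <- scale(v)
all.equal(as.vector(z), (v - mean(v)) / sd(v))   # TRUE
attr(z, "scaled:center")                         # 3
scale(v, scale = FALSE)    # only centred: -2 -1 0 1 2, still a one-column matrix
```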

4.12.11 Output the results using `cat()`

Outputs the objects, concatenating the representations. cat performs much less conversion than print.

cat('ABCD')

## ABCD

Note that the index ([1]) is not printed now. cat may print vectors too, separating their elements with a space. Example-2:

cat(month.name)

## January February March April May June July August September October November December

cat is also useful for printing special characters, such as those written as Unicode escapes. Example-3:

cat('Budget Allocation is \u20b91.5 crore')

## Budget Allocation is ₹1.5 crore
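Unlike print(), cat() does not add a newline at the end, so "\n" is usually added explicitly; the sep argument changes what goes between the pieces. A sketch with illustrative text:

```r
cat("Total:", 42, "\n")             # Total: 42, then a newline
cat("a", "b", "c", sep = "-")       # a-b-c (no newline is added automatically)
cat("\n")
cat("Line 1", "Line 2", sep = "\n")
```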

4.12.12 Sort a vector using `sort()`

Used to sort the given vector. Example-1:

vec <- c(5, 8, 4, 1, 6)
sort(vec)

## [1] 1 4 5 6 8

Argument decreasing = TRUE is used to sort the vector in descending order instead of the default ascending order. Example-2:

sort(vec, decreasing =  TRUE)

## [1] 8 6 5 4 1
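sort() drops missing values unless told where to put them, and it works on character vectors as well. A sketch with illustrative values:

```r
v <- c(3, NA, 1, 2)
sort(v)                      # 1 2 3: NA is dropped by default
sort(v, na.last = TRUE)      # 1 2 3 NA
sort(c("pear", "apple", "fig"))   # "apple" "fig" "pear"
```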

4.12.13 Arrange the elements of a vector using `order()`

In contrast to sort() explained above, order() returns the indices that would put the given vector in ascending order. Example

order(vec)

## [1] 4 3 1 5 2

Thus, sort(vec) gives the same result as vec[order(vec)]. We may check:

identical(vec[order(vec)], sort(vec))

## [1] TRUE
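Because order() returns indices, it can rearrange the rows of a data frame, and it accepts several keys for breaking ties. A sketch on a made-up data frame `people`; the minus sign reverses the numeric key:

```r
people <- data.frame(name = c("Bob", "Andrew", "Carl"),
                     age  = c(45, 40, 45))
# Sort by age (descending), breaking ties by name (ascending)
sorted <- people[order(-people$age, people$name), ]
sorted
```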

4.12.14 Check structure using `str()`

The name str is not to be confused with strings; it is short for structure. Thus, str displays the structure of the given object. Example

str(vec)

##  num [1:5] 5 8 4 1 6

It is especially useful when we need to inspect data frames.

str(iris)

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
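str() works on any object, including lists, where it shows one line per component with its type and first values. A sketch on a made-up list:

```r
# Prints "List of 2" followed by one line for each component, id and label
str(list(id = 1:3, label = "x"))
```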

Generate a summary using `summary()`

In addition to str explained above, summary() is useful for getting result summaries of given objects. Example-1: When the given object is a vector

summary(vec)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     4.0     5.0     4.8     6.0     8.0

We observe that when a numeric vector is passed, it produces the minimum, quartiles, mean and maximum. Example-2: When the input object is a data frame.

summary(iris)

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
##
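The kind of summary depends on the type of the input, as the Species column above already suggests. A short sketch on a factor, a logical vector and a numeric vector with a missing value:

```r
summary(iris$Species)            # counts per level: 50 each
summary(c(TRUE, FALSE, TRUE))    # counts of FALSE and TRUE
summary(c(1, NA, 3))             # numeric summary plus a count of NA's
```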
