4 Existing and useful functions in base R

Base R ships with a large number of built-in functions, and it pays to know them. Let us discuss a few of these functions which are useful for data analytics and other allied jobs.

Firstly, let us learn the logical operators that are useful for checking various conditions. Readers who do not know what operators are may simply think of them as a special kind of function, most of which take exactly two arguments.

4.1 Conditions and logical operators/operands

Table 4.1: Conditions and logical operators/operands
Operator/function   Meaning                                 Example
==                  Is LHS equal to RHS?                    5 == 2 will return FALSE; 'Anil' == 'anil' is FALSE
!=                  Is LHS not equal to RHS?                'ABCD' != 'abcd' is TRUE
>=                  Is LHS greater than or equal to RHS?    5 >= 2 will return TRUE
<=                  Is LHS less than or equal to RHS?       15 <= 2 will return FALSE
>                   Is LHS strictly greater than RHS?       2 > 2 will return FALSE
<                   Is LHS strictly less than RHS?          12 < 12 will return FALSE
is.na()             Is the argument NA?                     is.na(NA) is TRUE
is.null()           Is the argument NULL?                   is.null(NA) is FALSE
|                   Logical OR (element-wise)               TRUE | FALSE will return TRUE
&                   Logical AND (element-wise)              TRUE & FALSE will return FALSE
!                   Logical NOT                             !TRUE will return FALSE
||                  Logical OR for single values            TRUE || FALSE will return TRUE; operands must be of length one (an error for longer operands from R 4.3.0; earlier versions examined only the first element)
&&                  Logical AND for single values           TRUE && FALSE will return FALSE; operands must be of length one (same rule as ||)
%in%                Is each LHS element present in RHS?     'A' %in% LETTERS will return TRUE

Vectorisation of operations and functions

All the above-mentioned operators, except || and &&, are vectorised: they return a vector of the same length as the vectors being compared. Check these examples.

LETTERS[1:4] == letters[1:4]
## [1] FALSE FALSE FALSE FALSE
10:1 >= 1:10
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
# TRUE will act as 1 and FALSE as 0
x <- c(TRUE, FALSE, FALSE, TRUE)
y <- c(1, 0, 1, 10)
x == y
## [1]  TRUE  TRUE FALSE FALSE
# Examples of element wise operations
x & y
## [1]  TRUE FALSE FALSE  TRUE
x | y
## [1]  TRUE FALSE  TRUE  TRUE
# character strings may be checked for alphabetic order
'ABCD' >= 'AACD'
## [1] TRUE
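
The difference between the vectorised & and the single-value && can be sketched as follows. This is only an illustration; the names a, b and val are made up for the example. The double form evaluates from left to right and stops as soon as the result is known, which is why it is commonly used inside if() conditions.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# Vectorised AND: one result per pair of elements
elementwise <- a & b

# && works on single values and short-circuits:
# the right-hand side is not evaluated when the left-hand side is FALSE
val <- NULL   # hypothetical value that may be missing altogether
single <- !is.null(val) && val > 5

elementwise
single
```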

4.2 Recycling

Recycling rules apply when two vectors are not of equal length. See these examples.

# Notice that results are displayed silently
LETTERS[1:4] == 'A'
## [1]  TRUE FALSE FALSE FALSE
# Notice that the result is returned with a warning
LETTERS[1:5] == LETTERS[1:3]
## Warning in LETTERS[1:5] == LETTERS[1:3]: longer object length is not a multiple
## of shorter object length
## [1]  TRUE  TRUE  TRUE FALSE FALSE

The operator %in% behaves slightly differently from the above. It searches for each element of the LHS in the RHS and gives the result as a logical vector whose length equals that of the LHS vector. See these examples carefully.

'A' %in% LETTERS
## [1] TRUE
LETTERS %in% LETTERS[1:4]
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE
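
A common slip is to use == where %in% is meant. The sketch below, with made-up vectors codes and wanted, shows how the two differ: == recycles the shorter vector and compares position by position, whereas %in% tests membership.

```r
codes <- c('A', 'B', 'C', 'D')
wanted <- c('B', 'A')

# == recycles `wanted` (B, A, B, A) and compares position by position
by_position <- codes == wanted

# %in% asks, for each element of `codes`, whether it occurs anywhere in `wanted`
by_membership <- codes %in% wanted

by_position
by_membership
```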

4.3 Handling Missing values NA in these operations

While checking whether a condition is TRUE or FALSE, missing values (NA and/or NaN) should be handled carefully, or a bug may be introduced. See these examples.

FALSE != NA
## [1] NA
TRUE != NA
## [1] NA

Thus, if a condition is evaluated on a vector containing missing values, we can have NA in our output along with TRUE and FALSE. See this example.

x <- c(1, 5, 15, NA, 2, 3)
x <= 5
## [1]  TRUE  TRUE FALSE    NA  TRUE  TRUE

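Missing values, however, behave slightly differently with the logical operators & and |. When the other operand already decides the result (TRUE for |, FALSE for &), the result is not NA. See these examples.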

TRUE | NA
## [1] TRUE
FALSE & NA
## [1] FALSE
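
Because any comparison with NA gives NA, missing values cannot be located with ==. A minimal sketch, using a made-up vector marks, of the usual approach with is.na():

```r
marks <- c(45, NA, 82, NA, 67)

# Comparing with NA using == never gives TRUE or FALSE
compared <- marks == NA

# is.na() is the reliable way to locate missing values
missing <- is.na(marks)

# Number of missing values (TRUE counts as 1)
n_missing <- sum(missing)

compared
missing
n_missing
```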

4.4 Use of above logical operators for subsetting

Since logical operations on vectors give a logical vector as output, they can be used for subsetting as well. See these examples.

my_ages <- c(40, 45, 31, 51, 25, 27, 59, 45)
# filter ages greater than or equal to 30
my_ages[my_ages >= 30]
## [1] 40 45 31 51 59 45
my_names <- c("Andrew", "Bob", "Carl", "Daven", "Earl")
# filter names which start with the letter A, B or C
my_names[my_names <= "D"]
## [1] "Andrew" "Bob"    "Carl"
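
When the vector being filtered contains missing values, the logical index contains NA and so does the result. The sketch below (the vector ages is made up for illustration) shows two common ways of dropping those entries.

```r
ages <- c(40, NA, 31, 25, NA, 59)

# NA in the logical index gives NA in the result
with_na <- ages[ages >= 30]

# which() keeps only the positions where the condition is TRUE
without_na <- ages[which(ages >= 30)]

# Equivalent: exclude missing values explicitly
explicit <- ages[!is.na(ages) & ages >= 30]

with_na
without_na
```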

4.5 Conditions with ifelse

The syntax ifelse(test, yes, no) returns a value of the same shape as test, filled with elements selected from either yes or no depending on whether the corresponding element of test is TRUE or FALSE. Where test is NA, the result is NA. See this example.

x <- c(1:5, NA, 16:20)
ifelse(x>5, 'Greater than 5', 'Upto 5')
##  [1] "Upto 5"         "Upto 5"         "Upto 5"         "Upto 5"        
##  [5] "Upto 5"         NA               "Greater than 5" "Greater than 5"
##  [9] "Greater than 5" "Greater than 5" "Greater than 5"
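
Calls to ifelse() can be nested when there are more than two outcomes. As a sketch, assuming we want missing values labelled explicitly (the label 'Missing' is an arbitrary choice), the NA case can be tested first:

```r
x <- c(1:5, NA, 16:20)

# Handle NA first, then the remaining two cases
labels <- ifelse(is.na(x), 'Missing',
                 ifelse(x > 5, 'Greater than 5', 'Upto 5'))
labels
```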

4.6 Functions all() and any()

These are shortcut functions that tell us whether all or any of the elements of a given logical object are TRUE. See this example.

x <- 11:20
all(x > 5)
## [1] TRUE
any(x > 20)
## [1] FALSE
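
Missing values affect all() and any() in the same way as they affect & and |. A short sketch with a made-up vector; both functions accept na.rm = TRUE to drop missing values first.

```r
x <- c(11, 15, NA, 20)

res_any <- any(x > 12)                  # TRUE: one known element already satisfies it
res_all <- all(x > 5)                   # NA: the missing element could be anything
res_all_rm <- all(x > 5, na.rm = TRUE)  # TRUE: missing values are dropped first

res_any
res_all
res_all_rm
```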

Like the operators discussed above, the arithmetic operators listed in section 1.1 are also vectorised. Check these examples.

x <- 1:5
y <- 6:10

x + y
## [1]  7  9 11 13 15
x - y
## [1] -5 -5 -5 -5 -5
x * y
## [1]  6 14 24 36 50
x / y
## [1] 0.1666667 0.2857143 0.3750000 0.4444444 0.5000000
x ^ y
## [1]       1     128    6561  262144 9765625
# Caution: here the RHS is a single value, which is recycled for every element of y
y %% 3
## [1] 0 1 2 0 1
y %/% 3
## [1] 2 2 2 3 3

Recycling also applies to mathematical operators. See these examples and notice when R gives results silently and when with a warning.

10:15 + 4
## [1] 14 15 16 17 18 19
100:110 - 50
##  [1] 50 51 52 53 54 55 56 57 58 59 60
# when the length of the longer vector is a multiple of the length of the shorter vector
x <- c(5, 2, 7, 9)
y <- c(7, 8)
x + y
## [1] 12 10 14 17
# when the length of the longer vector is not a multiple of the length of the shorter vector
x + c(1, 2, 3)
## Warning in x + c(1, 2, 3): longer object length is not a multiple of shorter
## object length
## [1]  6  4 10 10

All the above-mentioned operators/functions may also be used on matrices and arrays of higher dimension, since we have already seen that matrices/arrays are actually vectors at the core.

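mat1 <- matrix(1:6, nrow = 2)
mat1 * 10 > 25 # applied element by element; the result is a logical matrix of the same dim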

4.7 Common arithmetical Functions

Table 4.2: Common Arithmetical Functions
Function                Meaning                                                       Input (vector, matrix or array)   Output
sum()                   Adds all elements                                             One or more                       Vector having 1 element only
prod()                  Returns product of all elements                               One or more                       Vector having 1 element only
mean()                  Returns the arithmetic mean                                   One                               Vector having 1 element only
max()                   Returns maximum value                                         One or more                       Vector having 1 element only
min()                   Returns minimum value                                         One or more                       Vector having 1 element only
ceiling()               Returns smallest integer(s) not less than given values        One                               Vector, matrix or array having same dim
floor()                 Returns largest integer(s) not greater than given values      One                               Vector, matrix or array having same dim
trunc()                 Returns integers formed by truncating values towards 0        One                               Vector, matrix or array having same dim
round(x, digits = 0)    Rounds value(s) to the given number of decimal places         One                               Vector, matrix or array having same dim
signif(x, digits = 6)   Rounds value(s) to the given number of significant digits     One                               Vector, matrix or array having same dim
factorial()             Returns factorial                                             One, of integer type              Vector, matrix or array having same dim
sqrt()                  Returns square root                                           One                               Vector, matrix or array having same dim
log10() or log2()       Logarithm with base 10 or 2 respectively                      One                               Vector, matrix or array having same dim
exp(x)                  Returns exponential                                           One                               Vector, matrix or array having same dim

See these examples.

sum(1:100, 1:10)
## [1] 5105
Mat1 <- matrix(1:10, nrow = 2)
Mat2 <- matrix(1:4, nrow = 2)

prod(Mat1, Mat2)
## [1] 87091200
sqrt(Mat2)
##          [,1]     [,2]
## [1,] 1.000000 1.732051
## [2,] 1.414214 2.000000
log10(Mat1)
##         [,1]      [,2]      [,3]     [,4]      [,5]
## [1,] 0.00000 0.4771213 0.6989700 0.845098 0.9542425
## [2,] 0.30103 0.6020600 0.7781513 0.903090 1.0000000
factorial(10:1)
##  [1] 3628800  362880   40320    5040     720     120      24       6       2
## [10]       1
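
The rounding functions from Table 4.2 are easy to confuse, so a short side-by-side sketch may help. The input vector below is made up for illustration. Note that round() sends both 1.5 and 2.5 to 2, since R follows the "round half to even" convention for values that are exactly halfway.

```r
x <- c(-2.7, -0.5, 1.5, 2.5, 3.14159)

ceil_x  <- ceiling(x)            # smallest integers not less than x
floor_x <- floor(x)              # largest integers not greater than x
trunc_x <- trunc(x)              # drops the fractional part (towards 0)
round_x <- round(x)              # nearest integer; exact halves go to the even digit
round_2 <- round(x, digits = 2)  # two decimal places
sig_2   <- signif(123456, digits = 2)  # two significant digits

rbind(x, ceil_x, floor_x, trunc_x, round_x)
sig_2
```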

4.7.1 Missing values

If the vector on which we are calculating the sum etc. has missing values, the result will itself be NA unless we use the argument na.rm = TRUE in these functions (check the documentation of each function once). See these examples.

x <- c(1:50, NA)
sum(x)
## [1] NA
sum(x, na.rm = TRUE)
## [1] 1275
mean(x, na.rm = TRUE)
## [1] 25.5

4.8 Some Statistical functions

Table 4.3: Some commonly used Statistical Functions
Function    Meaning                                             Input (vector, matrix or array)   Output
sd()        Returns standard deviation                          One                               Vector having 1 element only
var()       Returns variance                                    One or more                       Vector having 1 element only (a covariance matrix if a matrix is given)
median()    Returns median value                                One                               Vector having 1 element only
range()     Returns range (minimum and maximum)                 One or more                       Vector having 2 elements
IQR()       Computes interquartile range of the x values        One                               Vector having 1 element only
quantile()  Computes quantiles for the probabilities in probs   One                               Named vector having 5 elements by default, or as many as the length of the probs vector given

Examples:

median(1:100)
## [1] 50.5
range(1:100, 45, 789)
## [1]   1 789
quantile(1:100)
##     0%    25%    50%    75%   100% 
##   1.00  25.75  50.50  75.25 100.00
quantile(0:100, probs = 1:10 / 10)
##  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
##   10   20   30   40   50   60   70   80   90  100
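
The remaining functions of Table 4.3 can be tried on a small made-up vector. One point worth knowing is that var() and sd() use the sample formula with n - 1 in the denominator; the sketch below checks this by hand.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

v <- var(x)   # sample variance
s <- sd(x)    # sample standard deviation, the square root of var(x)
q <- IQR(x)   # third quartile minus first quartile

# The same variance computed by hand, dividing by n - 1
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)

c(variance = v, sd = s, IQR = q)
```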

4.9 Functions related to sampling and probability distributions

4.9.1 Set the random seed with set.seed()

It is a way to specify the random seed, which determines the random number generator (RNG) state used for random number generation in R. It does not give any output, but it makes code that relies on random numbers reproducible.

4.9.2 Generate random numbers with rnorm() / runif() / rpois() etc.

Used to generate random numbers from the normal, uniform and Poisson distributions respectively. Of course there are numerous other functions, not only to generate random numbers but also to calculate probabilities and densities of these and other probability distributions (such as binomial and t), but those are beyond the scope of this book. E.g.

rnorm(n=10) #default mean is 0 and SD is 1
##  [1]  1.8062714  0.4990112 -0.6961670 -0.4514996 -1.0704310  0.2688084
##  [7]  0.8687072  1.1770034 -0.3030270 -0.1455080
rnorm(n=10) # notice these will produce different results each time.
##  [1] -0.61196717 -0.14021692 -1.39770173  0.09571398  1.26362876  1.10585254
##  [7]  0.06406968  2.04684899 -1.02892626  0.30159577
# If, however, the seed is fixed with set.seed() beforehand, the results will be reproducible.
set.seed(123)
runif(10) # default min and max are 0 and 1 respectively
##  [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673 0.0455565 0.5281055
##  [8] 0.8924190 0.5514350 0.4566147
set.seed(123)
runif(10)
##  [1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673 0.0455565 0.5281055
##  [8] 0.8924190 0.5514350 0.4566147
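
The generators also accept the parameters of the distribution. The sketch below uses arbitrary, made-up parameter values and an arbitrary seed; printed values are omitted because they depend on the RNG settings in use.

```r
set.seed(2024)
heights  <- rnorm(n = 5, mean = 170, sd = 10)  # normal with chosen mean and SD
arrivals <- rpois(n = 5, lambda = 3)           # Poisson needs the rate lambda
draws    <- runif(n = 5, min = 10, max = 20)   # uniform between 10 and 20

# Resetting the same seed reproduces the same draws
set.seed(2024)
heights_again <- rnorm(n = 5, mean = 170, sd = 10)
identical(heights, heights_again)
```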

4.9.3 Random Sample with sample()

Used to take a sample of the specified size from the elements of x, either with or without replacement. E.g.

set.seed(123)
sample(LETTERS, 5, replace = FALSE)
## [1] "O" "S" "N" "C" "J"
set.seed(111)
sample(LETTERS, 15, replace = TRUE)
##  [1] "N" "T" "S" "O" "Y" "E" "C" "H" "Z" "Q" "M" "J" "D" "O" "H"

If the elements are to be sampled with unequal probabilities, the weights can be provided in the prob argument.

set.seed(12)
sample(LETTERS, 5, replace = FALSE, prob = 1:26)
## [1] "Z" "K" "F" "V" "X"
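
When only a vector is supplied, sample() returns a random permutation of it, which is handy for shuffling. A minimal sketch, assuming ten hypothetical record ids are to be split at random into two groups:

```r
set.seed(10)

# With only a vector supplied, sample() returns a random permutation of it
ids <- 1:10
shuffled <- sample(ids)

# Use the permutation to split the ids into two random groups
group_a <- shuffled[1:7]
group_b <- shuffled[8:10]

group_a
group_b
```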

4.10 Other Mathematical functions

4.10.1 Progressive calculations with cumsum() / cumprod()

Used to calculate running total or product. Output vector length will be equal to that of input vector.

cumsum(1:10)
##  [1]  1  3  6 10 15 21 28 36 45 55
cumprod(-5:5)
##  [1]   -5   20  -60  120 -120    0    0    0    0    0    0

Other similar functions like cummax() (cumulative maximum) and cummin() may also be useful.

set.seed(1)
x <- sample(1:100, 10)
cummin(x)
##  [1] 68 39  1  1  1  1  1  1  1  1
cummax(x)
##  [1] 68 68 68 68 87 87 87 87 87 87

4.10.2 Progressive difference diff()

Used to calculate running difference (difference between two consecutive elements) in the given numeric vector. Output will be shorter by one element. E.g.

set.seed(123)
x <- rnorm(10)
x
##  [1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499
##  [7]  0.46091621 -1.26506123 -0.68685285 -0.44566197
diff(x)
## [1]  0.33029816  1.78888580 -1.48819992  0.05877934  1.58577725 -1.25414878
## [7] -1.72597744  0.57820838  0.24119088
length(diff(x)) # one element shorter than x
## [1] 9
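
diff() and cumsum() are inverse operations of a sort, and diff() also accepts a lag argument. The following sketch uses a made-up sales vector to show both points.

```r
sales <- c(100, 120, 90, 150, 160)

change   <- diff(sales)           # difference between consecutive elements
change_2 <- diff(sales, lag = 2)  # difference between elements two positions apart

# cumsum() undoes diff() once the first element is put back
rebuilt <- cumsum(c(sales[1], change))

change
change_2
rebuilt
```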

4.11 String Manipulation functions

4.11.1 Concatenate strings with paste() and paste0()

R’s inbuilt function paste() concatenates the corresponding elements of one or more vectors given as arguments. The argument sep is used to provide the separator, if any, which by default is a space, i.e. " ". On the other hand, the sep argument is not available in paste0(), which thus concatenates elements without any separator.

paste(LETTERS, letters)
##  [1] "A a" "B b" "C c" "D d" "E e" "F f" "G g" "H h" "I i" "J j" "K k" "L l"
## [13] "M m" "N n" "O o" "P p" "Q q" "R r" "S s" "T t" "U u" "V v" "W w" "X x"
## [25] "Y y" "Z z"
paste0(letters, '_', 1:26) # notice that '_' is recycled here
##  [1] "a_1"  "b_2"  "c_3"  "d_4"  "e_5"  "f_6"  "g_7"  "h_8"  "i_9"  "j_10"
## [11] "k_11" "l_12" "m_13" "n_14" "o_15" "p_16" "q_17" "r_18" "s_19" "t_20"
## [21] "u_21" "v_22" "w_23" "x_24" "y_25" "z_26"

Note that both paste() and paste0() return a vector whose length equals that of the longest vector given. Thus, if the requirement is to join all the resulting elements into a single string, use another argument, collapse. See this example.

paste0(letters, 1:26, collapse = '+')
## [1] "a1+b2+c3+d4+e5+f6+g7+h8+i9+j10+k11+l12+m13+n14+o15+p16+q17+r18+s19+t20+u21+v22+w23+x24+y25+z26"
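
The arguments sep and collapse can be combined: sep goes between the pieces of each element and collapse goes between the elements. A small sketch with made-up labels:

```r
quarters   <- paste('Q', 1:4, sep = '')                    # same as paste0('Q', 1:4)
one_string <- paste('Q', 1:4, sep = '', collapse = ', ')   # a single string
labelled   <- paste('Year', 2021:2023, sep = '-')

quarters
one_string
labelled
```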

4.11.2 Functions startsWith() / endsWith()

To check whether the elements of a given string vector, say x, start or end with the string(s) given in prefix or suffix, we can use startsWith(x, prefix) or endsWith(x, suffix) respectively. E.g.

x <- c('apples', 'oranges', 'apples and oranges', 'oranges and apples', 'apricots')
startsWith(x, 'apples')
## [1]  TRUE FALSE  TRUE FALSE FALSE
startsWith(x, 'ap')
## [1]  TRUE FALSE  TRUE FALSE  TRUE
endsWith(x, 'oranges')
## [1] FALSE  TRUE  TRUE FALSE FALSE

Note that both these functions return logical vectors having same length as x.

4.11.3 Check number of characters in string vector using nchar()

To count the number of characters in each element of a string vector, say x, we can use nchar(x), which returns an integer vector. E.g.

nchar(x)
## [1]  6  7 18 18  8
y <- c('', ' ', '   ', NA)
nchar(y)
## [1]  0  1  3 NA

4.11.4 Change case using toupper() / tolower()

Change the case of each element of the given vector to all UPPER or all lower case respectively. Example:

x <- c('Andrew', 'Bob')
tolower(x)
## [1] "andrew" "bob"
toupper(x)
## [1] "ANDREW" "BOB"

Extract a portion of string using substr()

To extract the characters of each element of a given vector, say x, from a given start position to a given stop position (both being integers), we use substr(x, start, stop). A stop position beyond the end of a string is allowed. E.g.

substr(x, 2, 8)
## [1] "ndrew" "ob"

4.11.5 Split a character vector using strsplit()

strsplit(x, split) splits the elements of a character vector x into sub-strings wherever the sub-string split matches within them. E.g.

strsplit(x, split = ' ')
## [[1]]
## [1] "Andrew"
## 
## [[2]]
## [1] "Bob"

Notice that output will be of list type.
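
Splitting becomes more visible when the separator actually occurs in the strings. The sketch below uses made-up strings; it also shows that split is treated as a regular expression unless fixed = TRUE is given, which matters for characters such as the dot.

```r
fruits <- c('apples and oranges', 'apricots')

parts <- strsplit(fruits, split = ' ')   # a list with one element per string
words_per_element <- lengths(parts)      # number of pieces in each element
all_words <- unlist(parts)               # flatten the list into one vector

# split is a regular expression unless fixed = TRUE, so '.' needs fixed = TRUE
version <- strsplit('4.3.1', split = '.', fixed = TRUE)[[1]]

parts
version
```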

4.11.6 Replace portions of string vectors sub() / gsub()

These two functions replace matches of pattern within each element of a string vector: sub() replaces only the first match, whereas gsub() replaces all matches. E.g.

# Replace only the first match
sub(pattern = 'B', replacement = '12', x, ignore.case = TRUE)
## [1] "Andrew" "12ob"
# Replace all matches
gsub(pattern = 'B', replacement = '12', x, ignore.case = TRUE)
## [1] "Andrew" "12o12"
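
The pattern in sub() and gsub() is a regular expression by default. As a sketch with made-up strings, the first call below squeezes repeated spaces, and the second uses fixed = TRUE so that the dot is taken literally.

```r
messy <- c('New   Delhi', 'Tamil    Nadu', 'Goa')

# \\s+ matches one or more whitespace characters
clean <- gsub(pattern = '\\s+', replacement = ' ', messy)

# fixed = TRUE treats the pattern as plain text, so '.' is a literal dot
dotted <- gsub(pattern = '.', replacement = '-', '01.04.2023', fixed = TRUE)

clean
dotted
```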

4.11.7 Match patterns using grep() / grepl() / regexpr() / gregexpr()

These functions are used to search for the string passed as argument pattern within the elements of a string vector. The four, however, differ in their output. E.g.

grep(pattern = 'an', x) # will give indices; the integer output may be shorter than `x`
## integer(0)
grepl(pattern = 'an', x) # will give a logical vector of same length as `x`
## [1] FALSE FALSE
regexpr(pattern = 'an', x) # output will have multiple attributes
## [1] -1 -1
## attr(,"match.length")
## [1] -1 -1
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE

Note that regexpr() outputs the character position of the first instance of the pattern match within each element of the given vector, with -1 meaning no match. gregexpr() is the same as regexpr() but finds all instances of the pattern. Its output will be in list format. E.g.

gregexpr(pattern = 'an', x)
## [[1]]
## [1] -1
## attr(,"match.length")
## [1] -1
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
## 
## [[2]]
## [1] -1
## attr(,"match.length")
## [1] -1
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
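
Since the vector used above contains no match for 'an', the differences between these functions are easier to see on a vector that does. A sketch with a made-up vector people; value and ignore.case are standard arguments of grep() and grepl().

```r
people <- c('Andrew', 'Bob', 'Anand', 'Dan')

idx     <- grep(pattern = 'an', people)                # indices of matching elements
matched <- grep(pattern = 'an', people, value = TRUE)  # the matching elements themselves
any_case <- grepl(pattern = 'an', people, ignore.case = TRUE)  # also matches 'An'

# Position of the first match in each element, -1 where there is none
first_pos <- as.integer(regexpr(pattern = 'an', people))

idx
matched
any_case
first_pos
```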

4.12 Other functions

4.12.1 Transpose a matrix using t()

Used to return transpose of given matrix. E.g.

mat <- outer(1:5, 1:5, FUN = \(x, y) paste0('A', x, y))
mat
##      [,1]  [,2]  [,3]  [,4]  [,5] 
## [1,] "A11" "A12" "A13" "A14" "A15"
## [2,] "A21" "A22" "A23" "A24" "A25"
## [3,] "A31" "A32" "A33" "A34" "A35"
## [4,] "A41" "A42" "A43" "A44" "A45"
## [5,] "A51" "A52" "A53" "A54" "A55"
t(mat)
##      [,1]  [,2]  [,3]  [,4]  [,5] 
## [1,] "A11" "A21" "A31" "A41" "A51"
## [2,] "A12" "A22" "A32" "A42" "A52"
## [3,] "A13" "A23" "A33" "A43" "A53"
## [4,] "A14" "A24" "A34" "A44" "A54"
## [5,] "A15" "A25" "A35" "A45" "A55"

4.12.2 Generate a frequency table using table()

Returns a frequency/contingency table of the counts at each combination of factor levels. E.g.

set.seed(123)
x <- sample(LETTERS[1:5], 100, replace = TRUE)
table(x)
## x
##  A  B  C  D  E 
## 21 20 23 17 19

If more than one argument is passed, a cross-tabulation (contingency table) is returned:

set.seed(1234)
df <- data.frame(State_code = x,
                 Code2 = sample(LETTERS[11:15], 100, replace = TRUE))
my_table <- table(df$State_code, df$Code2)
my_table
##    
##     K L M N O
##   A 5 5 4 4 3
##   B 4 3 6 2 5
##   C 6 3 3 6 5
##   D 2 2 4 6 3
##   E 2 6 4 4 3

4.12.3 Generate proportion of frequencies using prop.table()

This function takes a table object as input and calculates each frequency as a proportion of the grand total.

prop.table(my_table)
##    
##        K    L    M    N    O
##   A 0.05 0.05 0.04 0.04 0.03
##   B 0.04 0.03 0.06 0.02 0.05
##   C 0.06 0.03 0.03 0.06 0.05
##   D 0.02 0.02 0.04 0.06 0.03
##   E 0.02 0.06 0.04 0.04 0.03
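
Proportions within each row or each column, rather than of the grand total, can be obtained through the margin argument of prop.table(). A minimal sketch on made-up data (the vectors gender and passed are hypothetical):

```r
gender <- c('F', 'M', 'F', 'F', 'M', 'M', 'F', 'M')
passed <- c('Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No')
tab <- table(gender, passed)

row_share <- prop.table(tab, margin = 1)  # each row sums to 1
col_share <- prop.table(tab, margin = 2)  # each column sums to 1

row_share
col_share
```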

4.12.4 Column-wise or Row-wise sums using colSums() / rowSums()

Used to sum rows/columns in a matrix/data.frame. E.g.

# Row sums
rowSums(my_table)
##  A  B  C  D  E 
## 21 20 23 17 19
# Col sums
colSums(my_table)
##  K  L  M  N  O 
## 19 19 21 22 19

Note: Similar to colSums() / rowSums(), we also have colMeans() and rowMeans().

rowMeans(my_table)
##   A   B   C   D   E 
## 4.2 4.0 4.6 3.4 3.8

4.12.5 Extract unique values using unique()

Used to extract only unique values/elements from the given vector. E.g.

unique(x) # note the output
## [1] "C" "B" "E" "D" "A"

4.12.6 Check if two vectors are identical using identical()

Used to check whether two given vectors/objects are identical.

identical(unique(x), LETTERS)
## [1] FALSE
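
identical() is strict: it compares type as well as values. For numeric comparisons where small differences or a different storage type should be tolerated, all.equal() is the usual companion. A short sketch:

```r
same_type  <- identical(1:3, c(1, 2, 3))         # FALSE: integer versus double
same_value <- all.equal(1:3, c(1, 2, 3))         # TRUE: values agree within tolerance
exact      <- 0.1 + 0.2 == 0.3                   # FALSE: floating point representation
tolerant   <- isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE

c(same_type, isTRUE(same_value), exact, tolerant)
```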

4.12.7 Retrieve duplicate items in a vector using duplicated()

Used to check which elements have already appeared in the vector and are thus duplicate.

set.seed(123)
x <- sample(LETTERS[1:5], 8, replace = TRUE)
x
## [1] "C" "C" "B" "B" "C" "E" "D" "A"
duplicated(x)
## [1] FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE
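
The logical vector returned by duplicated() is mostly used for subsetting. The sketch below repeats the same values in a self-contained vector and shows a few typical uses, including the fromLast argument.

```r
x <- c('C', 'C', 'B', 'B', 'C', 'E', 'D', 'A')

first_only      <- x[!duplicated(x)]               # same result as unique(x)
repeated_values <- unique(x[duplicated(x)])        # values occurring more than once
last_flags      <- duplicated(x, fromLast = TRUE)  # scan from the end instead

first_only
repeated_values
last_flags
```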

4.12.8 Generate sequences using other objects with seq_len() / seq_along()

Used to generate an integer sequence starting at 1, either of a given length (seq_len()) or of the same length as a given vector (seq_along()). E.g.

seq_len(5)
## [1] 1 2 3 4 5
x <- c('Andrew', 'Bob')
seq_along(x)
## [1] 1 2
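
These two functions are safer than 1:length(x) when the vector may be empty, as the following sketch with an empty character vector shows.

```r
empty <- character(0)

colon_seq <- 1:length(empty)   # 1 0 : counts down, usually not what is intended
along_seq <- seq_along(empty)  # integer(0)
len_seq   <- seq_len(0)        # integer(0)

# A loop over seq_along() therefore runs zero times for an empty vector
count <- 0
for (i in seq_along(empty)) count <- count + 1

colon_seq
along_seq
count
```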

4.12.9 Divide a vector into categories (factor) using cut()

The function divides the range of x into intervals and codes the values in x according to which interval they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on. The output vector will be of type factor.

Example-1:

x <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(x, 3)
##  [1] (0.994,3] (0.994,3] (0.994,3] (3,5]     (3,5]     (0.994,3] (0.994,3]
##  [8] (3,5]     (3,5]     (5,7.01]  (5,7.01] 
## Levels: (0.994,3] (3,5] (5,7.01]

Example-2:

cut(x, 3, dig.lab = 1, ordered_result = TRUE)
##  [1] (1,3] (1,3] (1,3] (3,5] (3,5] (1,3] (1,3] (3,5] (3,5] (5,7] (5,7]
## Levels: (1,3] < (3,5] < (5,7]

Note that the output factor above is ordered.
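
Instead of a number of intervals, cut() also accepts explicit break points and labels. The following is a sketch with made-up ages and arbitrary category names; right = FALSE makes each interval closed on the left, so that 18 falls in the second category.

```r
ages <- c(5, 17, 18, 40, 65, 80)

# Intervals are [0, 18), [18, 65) and [65, Inf)
groups <- cut(ages,
              breaks = c(0, 18, 65, Inf),
              labels = c('Minor', 'Adult', 'Senior'),
              right = FALSE)

groups
table(groups)
```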

4.12.10 Scale the columns of a matrix using scale()

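Used to centre and scale the columns of a numeric matrix. By default, the column mean is subtracted from each column and the result is divided by the column standard deviation.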

x <- matrix(1:10, ncol = 2)
x
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
scale(x)
##            [,1]       [,2]
## [1,] -1.2649111 -1.2649111
## [2,] -0.6324555 -0.6324555
## [3,]  0.0000000  0.0000000
## [4,]  0.6324555  0.6324555
## [5,]  1.2649111  1.2649111
## attr(,"scaled:center")
## [1] 3 8
## attr(,"scaled:scale")
## [1] 1.581139 1.581139

Note: The output will always be of matrix type, with two additional attributes (scaled:center and scaled:scale) when the defaults are used. See this example.

scale(1:5)
##            [,1]
## [1,] -1.2649111
## [2,] -0.6324555
## [3,]  0.0000000
## [4,]  0.6324555
## [5,]  1.2649111
## attr(,"scaled:center")
## [1] 3
## attr(,"scaled:scale")
## [1] 1.581139
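
The default result of scale() can be reproduced by hand, which makes clear what the two attributes hold. A minimal sketch:

```r
v <- c(1, 2, 3, 4, 5)

manual <- (v - mean(v)) / sd(v)   # subtract the mean, divide by the SD
scaled <- as.vector(scale(v))     # drop the matrix shape and attributes

# Centring only, without dividing by the standard deviation
centred <- as.vector(scale(v, scale = FALSE))

manual
centred
```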

4.12.11 Output the results using cat()

Outputs the objects, concatenating the representations. cat performs much less conversion than print.

cat('ABCD')
## ABCD

Note that the index [1] is not printed now. cat may also print vectors, separating the elements with a space by default. Example-2:

cat(month.name)
## January February March April May June July August September October November December

cat is useful to print special characters. Example-3:

cat('Budget Allocation is \u20b91.5 crore')
## Budget Allocation is ₹1.5 crore

4.12.12 Sort a vector using sort()

Used to sort the given vector. Example-1:

vec <- c(5, 8, 4, 1, 6)
sort(vec)
## [1] 1 4 5 6 8

The argument decreasing = TRUE is used to sort the vector in descending order instead of the default ascending order. Example-2:

sort(vec, decreasing =  TRUE)
## [1] 8 6 5 4 1

4.12.13 Arrange the elements of a vector using order()

In contrast to sort() explained above, order() returns the indices of the elements in the order that would sort the given vector in ascending order. Example:

order(vec)
## [1] 4 3 1 5 2

Thus, sort(vec) will essentially perform the same operation as vec[order(vec)]. We may check:

identical(vec[order(vec)], sort(vec))
## [1] TRUE
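
Because order() returns indices, it can sort one object by another and can take several keys at once. The sketch below sorts the rows of a small made-up data frame (names and marks are hypothetical) by class, and by descending marks within each class.

```r
students <- data.frame(name  = c('Asha', 'Bala', 'Chitra', 'Dev'),
                       class = c(10, 9, 10, 9),
                       marks = c(88, 75, 92, 75))

# Sort by class ascending, then by marks descending within each class;
# ties keep their original relative order
idx <- order(students$class, -students$marks)
sorted <- students[idx, ]

idx
sorted
```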

4.12.14 Check structure using str()

The name str is not to be confused with strings; it is short for structure. Thus, str() displays the structure of the given object. Example:

str(vec)
##  num [1:5] 5 8 4 1 6

It is extremely useful when we need to inspect data frames.

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Generate a summary using summary()

In addition to str() explained above, summary() is also useful in getting result summaries of given objects. Example-1: When the given object is a vector

summary(vec)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     4.0     5.0     4.8     6.0     8.0

We observe that when numeric vector is passed, it produces quantile summary. Example-2: When input object is data frame.

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
##