Gearing up
0.1 Download and installation
R can be downloaded from the web portal of the Comprehensive R Archive Network (CRAN), a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R. The portal address is https://cran.r-project.org/.

Figure 0.3: CRAN Portal
Download the file specific to your operating system from the portal and install it following the instructions. The R programming interface looks like this:

Figure 0.4: R Workspace
0.2 Writing your first code
Writing code in R is pretty easy. Just type the command after the > prompt, as shown in figure 0.4, and press the Enter (Return) key. R will display the result on the next line.


Figure 0.5: Left - Writing first Code in R; Right - Indenting code not necessary but recommended
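For readers without the figure at hand, here is a minimal sketch of what a first session might look like. The numbers and the object name total are illustrative only. Lines starting with ## show what R prints.

```r
# Each command is typed after the > prompt; R prints the result on the next line
2 + 3
## [1] 5

# A statement may span several lines; the console shows + until it is complete
total <- (10 +
          20 +
          30)
total
## [1] 60
```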
0.3 Things to remember
- R is case sensitive. This has to be remembered while writing, storing or calling functions and other objects. So Anil, ANIL and anil are all different objects in R.
- White space between different pieces of code does not matter. See figure 0.5 above. Both 3+4 and 3 + 4 evaluate to the same result. However, it is always better to use spaces for readability.
- Parentheses () are generally used to change the natural order of precedence. They are also used for passing arguments to functions, which will be discussed in detail in chapter 3 onward.
- Multi-line code does not need to be indented in R; indents have no meaning. However, proper indentation is recommended as a best practice so that the code is understandable to readers. See figure 0.5 (right) above.
- If an incomplete statement is entered on a line (useful when a single line is not sufficient to write the complete code), R will automatically display the continuation prompt + at the beginning of the next line, instead of >. See figure 0.5 (right) above.
- Indices in R always start from 1 (and not from 0). This is discussed in detail in chapter 2.
- Code that starts with the hash symbol # is not executed. Even if # appears in the middle of a line, the code from that point to the end of the line is not executed. See the example that follows. Comments may be used for any of these purposes:
  - Code readability
  - Explanation of code
  - Inclusion of metadata, other references, etc.
  - Preventing execution of certain lines of code
# 1 + 3 (this won't be executed)
1 + 3 # +5
## [1] 4
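The other points in the list can be checked in the same way. The following is a small sketch; the object names (Anil, marks and so on) are made up for illustration.

```r
# Case sensitivity: these are three separate objects
Anil <- 1
ANIL <- 2
anil <- 3
c(Anil, ANIL, anil)
## [1] 1 2 3

# White space does not change the result
identical(3+4, 3 + 4)
## [1] TRUE

# Parentheses change the natural order of precedence
2 + 3 * 4
## [1] 14
(2 + 3) * 4
## [1] 20

# Indices start from 1, so the first element is at position 1
marks <- c(70, 85, 90)
marks[1]
## [1] 70
```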
Tip: to clear the console, just press Ctrl + L.
Normally R code files have the extension .R, but other R files may have other extensions, such as project files (.Rproj), markdown files (.Rmd), and many more.
All of the programming may be done in the R console itself. But you may have noticed that code, once executed, cannot be edited; it has to be written again (Tip: to recall the previously executed command, press the up arrow key). Thus, in order to use many other smart features, we will write our code as R scripts, i.e. in .R files, using the most popular IDE for R, which is RStudio.
RStudio IDE is so popular among R users that many people cannot distinguish between R and its IDE. Even Stack Overflow, a popular forum for seeking help online, explicitly asks users not to tag ‘R studio’ in general R code problems.
0.4 RStudio IDE
RStudio is a free and open source IDE (Integrated Development Environment) for R, available for Windows, macOS and Linux. It can be downloaded from its portal https://posit.co/download/rstudio-desktop/. For most of our data analytics needs, we require the RStudio Desktop version, which is free to download and install.
It includes a console, a syntax-highlighting editor that supports direct code execution, and a variety of robust tools for plotting, viewing history, debugging and managing your workspace. After downloading and installing it on the local machine, an interface similar to that shown in the following figure opens.

Figure 0.6: R Studio interface
There are four panels:
- Top-left:
  + Scripts and Files: The script files we work on are opened and displayed here. To open a new script, click the new script button just below the File menu, or use the keyboard shortcut Ctrl + Shift + N.
- Bottom-left:
  + R console: where R commands can be typed and their output seen. Output of commands run from a script is also shown in this panel.
  + Terminal: gives access to the system shell.
- Top-right:
  + Environment: shows the objects saved in the current environment. This panel is also used to import data into the current environment.
  + History: shows the history of commands run in the current session.
  + Connections: used to connect to external databases or data sources.
- Bottom-right:
  + Files: shows the folder tree of the current working directory.
  + Plots: the graph window; if the output of an R command is a plot/graph, it is generated here.
  + Packages: to download and load external packages with mouse clicks.
  + Help: window to get help on desired functions. Help sought through R commands is also displayed here.
  + Viewer: can be used to view local web content.

Readers may note that executing code from a .R file is slightly different from executing it in the console, where pressing the Enter/Return key simply runs it and gives the result on the next line. To run code from the Scripts and Files pane (top-left), we can do any of the following:
- Select the code and press Ctrl/Command + Enter/Return.
- Place the cursor anywhere in the line(s) containing the code or code block and press Ctrl/Command + Enter/Return.
- Alternatively, use the Run button at the top-right of the Scripts and Files pane.

To get a quick overview (and for later reference), readers may refer to the RStudio cheatsheet available from the Posit Cheatsheets page, where many other cheatsheets are also available.
0.5 Packages and libraries and conflicts
As already stated, one of the strengths of R is that numerous user-written packages (or libraries) are available on the Comprehensive R Archive Network (CRAN). Package installation is perhaps the easiest of jobs in R.
The command is fairly simple:
install.packages("library_name")
which downloads the named package (the name must be given in quotes and is case sensitive) and installs it into the specified or default library directory. This does not, however, load the package into the current R session. Packages need to be installed only once on a computer, but must be loaded in every new R session, using the command:
library(library_name)
Quotes are optional here, but the package name is still case sensitive. So to install and load tidyverse, we need to run the first command once (which downloads the package to your local computer) and the second command (which loads it into the current R session) in every new session.
install.packages('tidyverse')
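Because installation is needed only once, scripts often install a package only when it is missing. The following is a sketch of that pattern, not a prescribed method. The helper name ensure_installed is hypothetical and not part of R or any package, while requireNamespace() and install.packages() are standard R functions. The demonstration call uses stats, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is not already available,
# then report whether it can be loaded
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # runs only when the package is missing
  }
  requireNamespace(pkg, quietly = TRUE)
}

# stats is always present, so this call installs nothing
stats_ok <- ensure_installed("stats")
stats_ok
## [1] TRUE
```

The same call with "tidyverse" would download the package on first use and do nothing on later runs; library(tidyverse) is still needed in each session.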
The Packages pane in RStudio may also be used, as shown in the following image (taken from the cheatsheet).

library(tidyverse)
## ── Attaching core tidyverse packages ────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Also notice the output of the library command above. Besides successfully loading nine packages, which we will discuss in section 0.7, it has given a message about conflicts.
So what are these conflicts? When functions with exactly the same name reside in multiple packages, a conflict arises, and R by default uses the function from the package loaded last. Here, the package stats, which is part of base R, also contains a function filter, which has been masked by the function of the same name in dplyr, loaded as part of tidyverse. Thus, after loading dplyr, a call to filter refers to the dplyr version rather than the one from stats.
In case we want to use filter from the masked stats, we may either
- call it using the double colon operator (refer to section 0.5.1), i.e. using stats::filter(); or
- make use of another package, conflicted, which is also installed as part of tidyverse, as follows:
library(conflicted)
conflict_prefer("filter", "stats")
The conflicted package should be used with a bit of caution: once it is loaded, calling any conflicted function without first declaring an explicit preference results in an error.
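Masking itself can be observed without installing anything. The sketch below is only an analogy to the dplyr case: instead of loading a package, it defines a toy function named filter in the workspace, which R finds before the one in stats. It assumes a fresh session in which dplyr is not attached.

```r
# A toy function that happens to share its name with stats::filter()
filter <- function(x) "toy filter defined in the workspace"

# The workspace is searched before attached packages, so the toy version is found first
filter(1:5)
## [1] "toy filter defined in the workspace"

# The original is still reachable with an explicit package prefix
stats::filter(1:5, rep(1/3, 3))

# Removing the toy function ends the masking; filter is again the one from stats
rm(filter)
environmentName(environment(filter))
## [1] "stats"
```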
0.5.1 Double Colon operator ::
In R, we can use the double colon operator, i.e. ::, to access a function exported by a package by naming the package explicitly. It is useful in at least two cases:
- To call a function, say filter from package dplyr, we may use dplyr::filter() without actually attaching the package with library().
- In cases of conflicts, as discussed in the preceding section, e.g. stats::filter().
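Both uses can be tried with packages that ship with R. In this sketch, tools stands in for a package that has not been attached with library(), and the input values are made up for illustration.

```r
# tools ships with R but is normally not attached; check the search path
"package:tools" %in% search()

# Its exported functions can still be called with the package prefix
tools::file_ext("analysis.R")
## [1] "R"

# The prefix also selects a specific version of a function that may be masked:
# a three-point moving average using filter() from stats
moving_avg <- stats::filter(c(2, 4, 6, 8, 10), rep(1/3, 3))
as.numeric(moving_avg)
```

The first and last values of the moving average are NA because a centred three-point window is incomplete at the ends.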
0.6 Getting Help within R
Once R is installed, there is a comprehensive built-in help system. We can use any of the following commands-
help.start() # general help
help(foo) # help about function `foo`
?foo # same as above
apropos("foo") # show all functions containing word `foo`
example(foo) # show an example of function `foo`
Alternatively, features under the Help menu or the Help pane can also be used.
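Since foo is only a placeholder, here is a short sketch of the same commands applied to real functions. The choice of mean and of the pattern "^read\\." is illustrative.

```r
# Help page for a real function
help("mean")

# Help for a function from a specific package, useful when names conflict
help("filter", package = "stats")

# apropos() takes a regular expression: objects whose names start with "read."
read_functions <- apropos("^read\\.")
"read.csv" %in% read_functions
## [1] TRUE

# Run the examples from the help page of mean
example(mean)
```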
0.7 tidyverse
The tidyverse is a package of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load multiple tidyverse packages in a single step.
Though tidyverse is a collection of 20+ packages (in fact 80+ packages are installed once dependencies are included), all of which are installed by the install.packages("tidyverse") command, library(tidyverse) loads only nine of them. Others (like readxl) have to be loaded explicitly.
- ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
- dplyr provides a grammar of data manipulation, providing a consistent set of verbs that solve the most common data manipulation challenges.
- tidyr provides a set of functions useful for data transformation.
- readr is used to read and write rectangular/tabular data formats.
- purrr is a functional programming (FP) toolkit for working with functions and vectors.
- tibble provides functionalities related to displaying data frames.
- stringr provides a set of functions designed to work with strings. It is built on top of another package, stringi.
- forcats provides a suite of useful tools that solve common problems with factors.
- lubridate makes it easier to do the things R does with date-times.
Since tidyverse 2.0.0, lubridate is also loaded by default with library(tidyverse).
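The difference between installed and loaded packages can be inspected from R itself. The sketch below assumes tidyverse has already been installed; the guard with requireNamespace() simply skips the check when it is not. tidyverse_packages() is a function exported by the tidyverse package.

```r
# Assumption: tidyverse was installed earlier with install.packages("tidyverse")
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)

if (has_tidyverse) {
  # Every package that belongs to the tidyverse, whether or not
  # library(tidyverse) attaches it
  all_pkgs <- tidyverse::tidyverse_packages()
  print(length(all_pkgs))

  # readxl is installed with the tidyverse but is not one of the attached nine
  print("readxl" %in% all_pkgs)
}
```

To use a package outside the attached nine, load it explicitly, for example with library(readxl).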

Figure 0.7: tidyverse
There are several other tidyverse packages which we will be working with:
- hms
- readxl
- glue