24 Date and Time calculations
24.1 Base R classes to deal with date and time variables
There may hardly be any data analytics activity, wherein we do not have to deal with temporal information and thus need to manipulate dates and date-time objects/variables. To deal with these variable types, R has four core data types/classes,
-
Date
To deal with date objects (note:D
in UPPER case) -
POSIXct
To deal with times -
POSIXlt
To deal with times (for difference between these two refer section 24.1.2.) -
difftime
To deal with time-spans
24.1.1 Dates
Date
objects can be created from string/character
type objects (we will just see that date objects can be created from numeric objects too), using base R’s function as.Date()
which accepts a character vector and a format to parse the date from the given string of characters. E.g.
as.Date("2000-12-12") # default format
## [1] "2000-12-12"
as.Date("12-12-2000", format = "%d-%m-%Y") # custom format
## [1] "2000-12-12"
In the above example, we can see that a custom format has been given to parse the character string into the date type object. We can check the class
of the object created.
## [1] "Date"
The format used in the above function accepts special codes, which can be seen by running ?strptime
in your R console. A ready reference for most used codes is however, given in table 24.1 in section 24.1.9. The date objects in R, are actually based on numeric classes which actually store the number of days since an epoch. Check the following code-
## [1] "Date"
## [1] "double"
## [1] 20070
So, in R versions 4.3.0 and later, it also accepts a number to convert into a date using the number of days elapsed since epoch. So,
as.Date(1)
## [1] "1970-01-02"
Here date since one day of default origin date 1970-01-01 has been returned. However, default origin
can be changed using the argument origin
. Example,
# One day since launch of R version 1.0
as.Date(1, origin = "2000-02-29")
## [1] "2000-03-01"
24.1.2 Times
Now, to create date-time objects we can use either of POSIXct
and POSIXlt
classes, i.e. by using as.POSIXct()
and/or as.POSIXlt()
functions, as shown below.
as.POSIXct("1990/2/17 12:20:05") # default format
## [1] "1990-02-17 12:20:05 IST"
as.POSIXct("17-2-1990 15:30:00", format = "%d-%m-%Y %H:%M:%S")
## [1] "1990-02-17 15:30:00 IST"
as.POSIXlt("17-2-1990 15:30:00", format = "%d-%m-%Y %H:%M:%S")
## [1] "1990-02-17 15:30:00 IST"
As regards, the difference between these two classes, POSIXct
stores seconds since UNIX epoch (and other information), and POSIXlt
, which stores a list of day, month, year, hour, minute, second, etc., which can be understood from the following codes.
ct
inPOSIXct
stands for calendar time whereas,lt
inPOSIXlt
stands for local time, explaining the conceptual difference.
time_ct <- as.POSIXct("17-2-1990 15:30:00", format = "%d-%m-%Y %H:%M:%S")
class(time_ct)
## [1] "POSIXct" "POSIXt"
unclass(time_ct)
## [1] 635248800
## attr(,"tzone")
## [1] ""
## POSIXlt
# Convert above object into POSIXlt type
time_lt <- as.POSIXlt(time_ct)
# Let us print
time_lt
## [1] "1990-02-17 15:30:00 IST"
class(time_lt)
## [1] "POSIXlt" "POSIXt"
unclass(time_lt)
## $sec
## [1] 0
##
## $min
## [1] 30
##
## $hour
## [1] 15
##
## $mday
## [1] 17
##
## $mon
## [1] 1
##
## $year
## [1] 90
##
## $wday
## [1] 6
##
## $yday
## [1] 47
##
## $isdst
## [1] 0
##
## $zone
## [1] "IST"
##
## $gmtoff
## [1] 19800
##
## attr(,"tzone")
## [1] "" "IST" "+0630"
## attr(,"balanced")
## [1] TRUE
Similar to as.Date()
, as.POSIXct()
also accepts a number and converts it into a date-time object using that much number of seconds since epoch.
as.POSIXct(60)
## [1] "1970-01-01 05:31:00 IST"
24.1.3 Times (without dates)
In base R, we do not have specific class to deal with time objects. However, there is a package hms
part of tidyverse
which can be used to create time objects to perform required calculations.
hms::as_hms(10)
## 00:00:10
24.1.4 Timespan
In base R, besides above date/date-time object classes, there is a special class which creates time difference between two given temporal objects in specified units. The class is difftime
.
## Time difference of 28245 days
class(freedom_age)
## [1] "difftime"
unclass(freedom_age)
## [1] 28245
## attr(,"units")
## [1] "days"
Function difftime
can create difftime objects, as per specific requirement due to presence of an argument units
which can take values from one of “auto”
, “secs”
, “mins”
, “hours”
, “days”
, “weeks”
.
## Time difference of 52.28571 weeks
We can also create a difftime object by coercion.
a_difftime <- as.difftime(15, units = "days")
a_difftime
## Time difference of 15 days
class(a_difftime)
## [1] "difftime"
24.1.5 Time zones
Run the function Sys.timezone()
in your console to check the current timezone. Function OlsonNames()
will however, display the known location for time-zones.
## [1] "Asia/Calcutta"
OlsonNames()[c(253, 284)]
## [1] "Asia/Calcutta" "Asia/Kolkata"
So, we may use tz
argument of as.POSIXlt()
and as.POSIXct()
functions to coerce the numeric or other class variables to dates and/or times.
as.POSIXct(1, tz = "GMT")
## [1] "1970-01-01 00:00:01 GMT"
24.1.6 Coercion
In earlier sections we have already seen coercing functions useful to coerce the objects from one class to another.
24.1.7 Extracting parts from dates/times
Base R provides us some useful function to extract relevant part of objects of dates/times classes.
-
weekdays(x, abbreviate = FALSE)
to extract the weekday name. (Output is in character vector) -
months(x, abbreviate = FALSE)
to extract the month name. (Output is in character vector) -
quarters(x)
to extract the quarter. Output is in character vector. -
julian(x)
to extract the days elapsed since origin. Output is in numeric vector.
Examples-
## [1] "Friday" "Saturday" "Sunday" "Monday" "Tuesday" "Wednesday"
## [7] "Thursday"
months(dates_vec)
## [1] "January" "January" "January" "January" "February" "February" "February"
quarters(dates_vec)
## [1] "Q1" "Q1" "Q1" "Q1" "Q1" "Q1" "Q1"
julian(dates_vec)
## [1] 15002 15003 15004 15005 15006 15007 15008
## attr(,"origin")
## [1] "1970-01-01"
24.1.9 strptime
character codes
A few format codes useful to parse date/date-time objects in R, are listed below for ready reference. To know more about these codes, readers can see the output of ?strptime()
in their console. Some useful codes have been reproduced in the table 24.1 for ready reference.
Format | Meaning |
---|---|
%y (or %Y ) |
Year without (or with) Century |
%m |
Month as decimal number (01-12) |
%B (or %b ) |
Month name in full (or abbreviated) in current locale |
%d |
Day of the month (01-31) |
%H |
Hours as decimal number (00-23) |
%I |
Hours as decimal number, in 12 hour format (00-12) |
%M |
Minute as decimal number (00-59) |
%S |
Second as integer (00-61) allowing up to two leap-seconds |
%OS |
Second(s) as decimal number |
%T |
Equivalent to %H:%M:%S
|
%p |
AM/PM indicator in the locale. |
%A (or %a ) |
Full (or abbreviated) week-day name, in current locale |
%j |
Day of year as decimal number (001–366): For input, 366 is only valid in a leap year. |
%u |
Weekday as a decimal number (1–7, Monday is 1) |
%w |
Weekday as decimal number (0–6, Sunday is 0). |
%U |
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention. |
%V |
Week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. |
%W |
Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention. |
%z |
Offset from UTC |
%Z |
Time Zone name |
24.2 Using lubridate
package for parsing/creating dates
Package lubridate
which is now part of core tidyverse
is extremely helpful package while analysing temporal variables in data. Creating date/date-time objects and converting variables from numeric/character to these types is easier than in base R. Besides, it offers us some other temporal class objects to make our time-series analysis easier.
24.2.1 Date/date-time objects creation
Date and Date-time object classes in lubridate
are defined similarly as in base R. So we can create a date (or date-time) using a number of days (or seconds) elapsed since epoch.
## [1] "1976-07-20"
as_datetime(206668800)
## [1] "1976-07-20 UTC"
## [1] "2024-12-13"
# Check its class
class(date_today)
## [1] "Date"
# Or see what's all in there
unclass(date_today)
## [1] 20070
24.2.2 Parsing date/date-time objects from character
In lubridate
parsing date/date-time objects from character strings is pretty easy as we have a number of functions useful to parse dates/date-times written in any specific locale/order irrespective of the delimiter used to separate different components of date/time therein. In these functions, the orders of the year (y
), quarter (q
), month (m
) date (d
) hour (h
) minute (m
) and second (s
) are represented by their first characters. These are -
-
ymd_hms
,ymd_hm
,ymd_h
,ymd
-
dmy_hms
,dmy_hm
,dmy_h
,dmy
-
mdy_hms
,mdy_hm
,mdy_h
,mdy
-
ydm_hms
,ydm_hm
,ydm_h
,ydm
-
myd
,dym
-
yq
,ym
,my
A few of the examples are -
ymd("19760720")
## [1] "1976-07-20"
dmy("01.12.2004")
## [1] "2004-12-01"
dmy("15th of January, 2006")
## [1] "2006-01-15"
mdy_hm("August 15th, 1947 at 10:45 PM")
## [1] "1947-08-15 22:45:00 UTC"
my("04-2006")
## [1] "2006-04-01"
yq("2024: Quarter 4")
## [1] "2024-10-01"
To parse date-time from a fraction of year passed, we can use date_decimal()
. Example-
date_decimal(2024.162)
## [1] "2024-02-29 07:00:28 UTC"
24.2.3 Parsing dates/date-times from individual conponents
For the scenario, where we have to create date/date-times we can make use of two functions make_date
and make_datetime
. Syntax is
make_datetime(
year = 1970L,
month = 1L,
day = 1L,
hour = 0L,
min = 0L,
sec = 0,
tz = "UTC"
)
make_date(year = 1970L, month = 1L, day = 1L)
Example-
make_date(year = 1947, month = 8, day = 15)
## [1] "1947-08-15"
24.3 Extracting and setting date/date-time components
24.3.1 Extraction
We can extract the specific components from date/date-times using, intuitively named accessor functions, which are listed below. Note that in all of these functions the individual date components is in singular; as plural component functions will be used in different context, as we will discuss in Section 24.4.1.
-
year()
for Year;isoyear()
for ISO 8601 Year -
semester()
for Semester -
quarter()
for Quarter -
month()
for Month -
week
for Week of the year,isoweek()
for ISO 8601 Week -
day
for Day of the month,wday()
for Day of the week,qday()
for Day of the quarter -
hour()
,minute()
andsecond()
for Hour, Minute and Second respectively -
tz()
for Time Zone
See following examples.
# Make an example date
(a_time <- as_datetime(999999999))
## [1] "2001-09-09 01:46:39 UTC"
# Extract the DAY of the month
day(a_time)
## [1] 9
# Extract Weekday name in full
wday(a_time, label = TRUE, abbr = FALSE)
## [1] Sunday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
# Extract second component
second(a_time)
## [1] 39
# Extract Month name (abbreviated)
month(a_time, label = TRUE, abbr = TRUE)
## [1] Sep
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
# Extract week number of the year
week(a_time)
## [1] 36
Besides above functions, there are a few functions which are helpful in knowing certain characteristics of date/date-time variable. These functions return which logical vectors, are listed as -
-
am(x)
,pm(x)
to know whether thex
is in AM or PM, respectively. -
dst(x)
to know whetherx
is in Daylight savings. -
leap_year(x)
to know whetherx
is a leap year.
Examples-
# Is our a_time AM?
am(a_time)
## [1] TRUE
# Is a_time falling in leap year?
leap_year(a_time)
## [1] FALSE
# Is it during DST?
dst(a_time)
## [1] FALSE
24.3.2 Setting Components
Just like we used accessor functions to extract the date-time components, we can use those functions to set specific component in date/date-time object/variable. See Example-
# Initial variable
a_time
## [1] "2001-09-09 01:46:39 UTC"
# Setting component - Year
year(a_time) <- 2024
# Print modified variable
a_time
## [1] "2024-09-09 01:46:39 UTC"
# Modify Time zone
tz(a_time) <- "Asia/Kolkata"
# print modified variable
a_time
## [1] "2024-09-09 01:46:39 IST"
Finally, function update
can also be used to return a date with the specified elements updated. Example-
## [1] "2020-01-31"
# If values are too big, they will roll-over:
update(EOD, month = 2)
## [1] "2020-03-02"
24.4 Time spans
When we subtract a date-time object from another in base R, we get a difftime
object, as we have already seen in section 24.1.4.
We know that every time/date unit is not same in strict sense. Like leap years may have 366 days whereas others have 365 days. Now there is a concept of leap seconds too. Moreover, certain countries have daylight saving time and thus, every day is not of equal length there. In order to deal with specific and clear requirements on date calculations, there are three different classes in lubridate
, as discussed next.
Lubridate has three classes, which may sound similar at first, but are different in their working. These are -
period
duration
interval
Let us discuss about each one separately.
24.4.1 Periods
The period
objects track changes in clock times and ignore time line irregularities. So every time object is of standard length like our clocks.
Period objects can be created in lubridate
using pluralized date component functions. Example-
(one_year <- years(1))
## [1] "1y 0m 0d 0H 0M 0S"
class(one_year)
## [1] "Period"
## attr(,"package")
## [1] "lubridate"
## [1] "2020-03-01"
24.4.2 Durations
Durations, on the other hand track changes in physical time, i.e. taking into account all the timeline adjusting irregularities. These duration objects can be created in lubridate, by adding d
as prefix to all pluralized period
objects.
(one_year_duration <- dyears(1))
## [1] "31557600s (~1 years)"
class(one_year_duration)
## [1] "Duration"
## attr(,"package")
## [1] "lubridate"
# Add one year DURATION to get a leap year date?
a_date + dyears(1)
## [1] "2020-02-29 06:00:00 UTC"
24.4.3 Intervals
The objects of class interval
represent specific interval on the timeline. In other words, these have a specific start and end date/datetimes. These can be created in lubridate using function interval
, which has syntax like -
interval(start = NULL, end = NULL, tzone = tz(start))
This should be clear with the following examples.
## [1] 1947-08-15 UTC--2024-12-13 UTC
class(a_interval)
## [1] "Interval"
## attr(,"package")
## [1] "lubridate"
Intervals can also be created using special operator %--%
. E.g.
## [1] 1947-08-15 UTC--1950-01-26 UTC
24.5 Performing calculations on intervals/dates
There are a few functions to make our life easier while performing data analysis on temporal fields. These are -
24.5.1 %within%
operator
Operator a %within% b
checks whether interval/date-time a
falls with interval b
. Returns boolean value(s).
interval1 %within% a_interval
## [1] TRUE
24.5.2 Backward intervals
Intervals in lubridate can be backwards too. E.g.
## [1] 2024-01-26 UTC--2024-01-15 UTC
24.5.3 Flipping intervals
Function int_flip()
can flip the interval. E.g.
int_flip(back_interval)
## [1] 2024-01-15 UTC--2024-01-26 UTC
24.5.4 Checking alignment of two intervals
Function int_aligns
tests if two intervals share an endpoint. The direction of each interval is ignored. In other words, it actually tests whether the earliest or latest moments of each interval occur at the same time. E.g.
int1 <- interval(ymd("2001-01-01"), ymd("2002-01-01"))
int2 <- interval(ymd("2001-06-01"), ymd("2002-01-01"))
int3 <- interval(ymd("2003-01-01"), ymd("2004-01-01"))
int_aligns(int1, int2)
## [1] TRUE
int_aligns(int1, int3)
## [1] FALSE
24.5.5 Checking Overlap in two intervals
Function int_overlaps
can test if two intervals overlap each other.
int_overlaps(int1, int2)
## [1] TRUE
24.5.6 Length of the interval
Function int_length()
can calculate the length of interval and returns a numeric
variable equal to the seconds in that interval. E.g.
int_length(back_interval)
## [1] -950400
24.5.7 Adding Months without exceeding last day of the month.
Operators %m+%
and %m-%
will add (or subtract) months to a date without exceeding the last day of the new month. E.g.
(leap <- ymd("2012-02-29"))
## [1] "2012-02-29"
## [1] "2013-02-28"
## [1] "2011-02-28"
## [1] "2011-02-28"
Another Example-
## [1] NA "2010-03-31 03:04:05 UTC"
## [3] NA
## [1] "2010-02-28 03:04:05 UTC" "2010-03-31 03:04:05 UTC"
## [3] "2010-04-30 03:04:05 UTC"
24.5.8 Adding with Rollback
One more function add_with_rollback()
which performs similarly, but has more control due to specific syntax-
add_with_rollback(e1, e2, roll_to_first = FALSE, preserve_hms = TRUE)
Example-
x <- ymd_hms("2019-01-29 01:02:03")
add_with_rollback(x, months(1))
## [1] "2019-02-28 01:02:03 UTC"
add_with_rollback(x, months(1), preserve_hms = FALSE)
## [1] "2019-02-28 UTC"
add_with_rollback(x, months(1), roll_to_first = TRUE)
## [1] "2019-03-01 01:02:03 UTC"
add_with_rollback(x, months(1), roll_to_first = TRUE, preserve_hms = FALSE)
## [1] "2019-03-01 UTC"
24.5.9 Coercing one time span unit to another
Time-span objects in lubridate can be coerced from one to another using coercing functions,
as.period(x, unit)
as.duration(x)
as.interval(x, start)
make_difftime(x)
Examples-
# With Period - clock time
(per <- days(31))
## [1] "31d 0H 0M 0S"
(int1 <- as.interval(per, dmy("01022020")))
## [1] 2020-02-01 UTC--2020-03-03 UTC
# With Duration - physical time
(dur <- ddays(31))
## [1] "2678400s (~4.43 weeks)"
(int2 <- as.interval(dur, dmy("01022020")))
## [1] 2020-02-01 UTC--2020-03-03 UTC
24.6 Rounding date-time variables
There are dedicated functions to round the dates as per specific requirements.
-
floor_date()
takes a date-time object and rounds it down to the nearest boundary of the specified time unit. -
ceiling_date()
takes a date-time object and rounds it up to the nearest boundary of the specified time unit. -
round_date()
takes a date-time object and time unit, and rounds it to the nearest value of the specified time unit.
Examples-
x <- ymd_hms("2009-08-03 12:01:59.23")
round_date(x, "month")
## [1] "2009-08-01 UTC"
round_date(x, "week")
## [1] "2009-08-02 UTC"
floor_date(x, "day")
## [1] "2009-08-03 UTC"
ceiling_date(x, "month")
## [1] "2009-09-01 UTC"
Three other functions helps rolling a date forward or backwards, as
-
rollbackward()
changes a date to the last day of the previous month or to the first day of the month. -
rollforward()
rolls to the last day of the current month or to the first day of the next month. Optionally, the new date can retain the same hour, minute, and second information. -
rollback()
is a synonym forrollbackward()
.
See these examples-
date <- ymd_hms("2010-03-03 12:44:22")
rollbackward(date)
## [1] "2010-02-28 12:44:22 UTC"
rollbackward(date, roll_to_first = TRUE)
## [1] "2010-03-01 12:44:22 UTC"
rollbackward(date, preserve_hms = FALSE)
## [1] "2010-02-28 UTC"
rollbackward(date, roll_to_first = TRUE, preserve_hms = FALSE)
## [1] "2010-03-01 UTC"
rollforward(date)
## [1] "2010-03-31 12:44:22 UTC"
rollforward(date, roll_to_first = TRUE)
## [1] "2010-04-01 12:44:22 UTC"
24.7 Representing date and date-times in cutomised formats
In all of the above sections, we saw that once R recognises a temporal object, it depicts that objects in a uniform format. See
(date1 <- dmy("01-01-2020"))
## [1] "2020-01-01"
(date2 <- ymd("2020/01/01"))
## [1] "2020-01-01"
as.character(date1)
## [1] "2020-01-01"
In the output/console all dates print like a character/string. However, sometimes requirement is to have print dates in a specific customised format. In this scenario, function strftime()
comes to our rescue which converts objects of classes “POSIXlt” and “POSIXct” representing calendar dates and times to specific character representation. Character codes can be used from the table 24.1 in section 24.1.9.
See following examples-
## [1] "13 December 2024"
In this context, let’s also discuss about a special date stamping function in lubridate
which can format date/time outputs based on human friendly formats. Functions stamp()
, stamp_date
and stamp_time()
will create a function from the given template, which can be applied to date/time objects to re-format them. Example-
eclipse_dates <- dmy(c("11-7-2010", "13-11-2012", "3-11-2013"))
eclipse_stamp <- stamp_date("There was a solar eclipse on January 13th, 1999")
eclipse_stamp(eclipse_dates)
## [1] "There was a solar eclipse on July 11th, 2010"
## [2] "There was a solar eclipse on November 13th, 2012"
## [3] "There was a solar eclipse on November 03th, 2013"
24.8 Dates in GGPLOT2 visualisations
Before concluding the chapter on date and time variables, it is important to learn the related functions to deal with impact of date and time formats on visualisations. Since we have already seen that, date and date-time variables are actually continuous variables with labels depicted in a specific formats, we have two scale
functions in ggplot2 to deal with -
In these functions, we have date_breaks
and date_minor_breaks
arguments to position the breaks by date units. E.g. date_breaks = "6 months"
will place major tick mark every six months.
Using another argument date_labels
we can control the display of the labels in the plots. The values of the argument may be in strptime
formats as we have already seen in 24.1 above.
Example-
library(tidyverse, warn.conflicts = FALSE)
economics %>%
ggplot(aes(date, uempmed)) +
geom_line() +
scale_x_date(date_labels = "%Y",
date_breaks = "5 years",
date_minor_breaks = "1 year") +
labs(x = "", y = "")

Figure 24.1: Use of Date Time Scale in ggplot2