24 Date and Time calculations

24.1 Base R classes to deal with date and time variables

Almost every data analytics task involves temporal information, so we routinely need to manipulate date and date-time variables. To deal with these variable types, base R has four core classes:

  • Date To deal with date objects (note: D in UPPER case)
  • POSIXct To deal with times
  • POSIXlt To deal with times (for the difference between these two, refer to section 24.1.2)
  • difftime To deal with time-spans

24.1.1 Dates

Date objects can be created from character strings (and, as we will see shortly, from numbers too) using base R’s function as.Date(), which accepts a character vector and a format used to parse the date from the given strings. E.g.

as.Date("2000-12-12") # default format
## [1] "2000-12-12"
as.Date("12-12-2000", format = "%d-%m-%Y") # custom format
## [1] "2000-12-12"

In the above example, we can see that a custom format has been given to parse the character string into the date type object. We can check the class of the object created.

a_date <- as.Date("20-12-2022", format = "%d-%m-%Y")
class(a_date)
## [1] "Date"

The format argument accepts special conversion codes, which can be seen by running ?strptime in your R console. A ready reference for the most used codes is given in table 24.1 in section 24.1.9. Date objects in R are stored as numbers, namely the number of days elapsed since an epoch (1970-01-01). Check the following code-

## [1] "Date"
## [1] "double"
## [1] 20070
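
Output of this kind can be reproduced with a sketch like the following. It assumes a fixed date, 2024-12-13 (chosen here only because its day count is 20070), instead of the current date; the variable name fixed_date is illustrative.

```r
# A fixed date is used so that the result does not depend on when the code runs
fixed_date <- as.Date("2024-12-13")
class(fixed_date)    # the S3 class: "Date"
typeof(fixed_date)   # the underlying storage type: "double"
unclass(fixed_date)  # days elapsed since 1970-01-01: 20070
```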

In R versions 4.3.0 and later, as.Date() also accepts a bare number and converts it into a date by treating it as the number of days elapsed since the epoch. So,

## [1] "1970-01-02"
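
A result like the one above is what a call such as as.Date(1) returns in R 4.3.0 or later. As a hedged sketch, passing origin explicitly should give the same answer in older versions too; the variable names are illustrative.

```r
# Explicit origin: does not rely on the default introduced in R 4.3.0
one_day_after_epoch <- as.Date(1, origin = "1970-01-01")
one_day_after_epoch
# Negative numbers count backwards from the origin
one_day_before_epoch <- as.Date(-1, origin = "1970-01-01")
one_day_before_epoch
```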

Here the date one day after the default origin, 1970-01-01, has been returned. The origin can be changed using the argument origin. Example,

# One day since launch of R version 1.0
as.Date(1, origin = "2000-02-29")
## [1] "2000-03-01"

24.1.2 Times

Now, to create date-time objects we can use either of the POSIXct and POSIXlt classes, i.e. the as.POSIXct() and as.POSIXlt() functions, as shown below.

as.POSIXct("1990/2/17 12:20:05") # default format
## [1] "1990-02-17 12:20:05 IST"
as.POSIXct("17-2-1990 15:30:00", format = "%d-%m-%Y %H:%M:%S")
## [1] "1990-02-17 15:30:00 IST"
as.POSIXlt("17-2-1990 15:30:00", format = "%d-%m-%Y %H:%M:%S")
## [1] "1990-02-17 15:30:00 IST"

As regards the difference between these two classes, POSIXct stores the number of seconds since the UNIX epoch (plus a time zone attribute), whereas POSIXlt stores a list of components such as second, minute, hour, day, month and year. In the class names, ct stands for calendar time and lt for local time. The following code shows the difference.

time_ct <- as.POSIXct("17-2-1990 15:30:00", format = "%d-%m-%Y %H:%M:%S")
class(time_ct)
## [1] "POSIXct" "POSIXt"
unclass(time_ct)
## [1] 635248800
## attr(,"tzone")
## [1] ""
## POSIXlt
# Convert above object into POSIXlt type
time_lt <- as.POSIXlt(time_ct)
# Let us print
time_lt
## [1] "1990-02-17 15:30:00 IST"
class(time_lt)
## [1] "POSIXlt" "POSIXt"
unclass(time_lt)
## $sec
## [1] 0
## 
## $min
## [1] 30
## 
## $hour
## [1] 15
## 
## $mday
## [1] 17
## 
## $mon
## [1] 1
## 
## $year
## [1] 90
## 
## $wday
## [1] 6
## 
## $yday
## [1] 47
## 
## $isdst
## [1] 0
## 
## $zone
## [1] "IST"
## 
## $gmtoff
## [1] 19800
## 
## attr(,"tzone")
## [1] ""      "IST"   "+0630"
## attr(,"balanced")
## [1] TRUE
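
Because a POSIXlt object is a list, its components can be picked out by name. The sketch below assumes the UTC time zone for reproducibility (the variable name lt_utc is illustrative). Note that mon counts months from 0 and year counts years from 1900, so both need an offset to read naturally.

```r
lt_utc <- as.POSIXlt("1990-02-17 15:30:00", tz = "UTC")
lt_utc$hour          # hour of the day: 15
lt_utc$mon + 1       # months are stored as 0-11, so February is 1 + 1 = 2
lt_utc$year + 1900   # years are stored as years since 1900: 90 + 1900 = 1990
lt_utc$wday          # day of the week, 0 = Sunday ... 6 = Saturday
```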

Similar to as.Date(), as.POSIXct() also accepts a number and converts it into a date-time object by treating it as the number of seconds elapsed since the epoch.

## [1] "1970-01-01 05:31:00 IST"
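
The output above corresponds to 60 seconds after the epoch, displayed in the author's local time zone (IST). A minimal sketch of such a conversion follows, with origin and tz spelled out so that the result does not depend on the R version or on the machine's time zone; the variable name one_minute is illustrative.

```r
# 60 seconds after the epoch, stored with the UTC time zone
one_minute <- as.POSIXct(60, origin = "1970-01-01", tz = "UTC")
one_minute
# The same instant displayed on an Indian clock (UTC + 5:30)
format(one_minute, tz = "Asia/Kolkata", usetz = TRUE)
```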

24.1.3 Times (without dates)

Base R has no dedicated class for times of day without a date. However, the package hms, which is part of the tidyverse, can be used to create such time objects and perform the required calculations.

hms::as_hms(10)
## 00:00:10

24.1.4 Timespan

Besides the above date and date-time classes, base R has a special class, difftime, which represents the time difference between two temporal objects in specified units.

(freedom_age <- Sys.Date() - as.Date("1947-08-15"))
## Time difference of 28245 days
class(freedom_age)
## [1] "difftime"
unclass(freedom_age)
## [1] 28245
## attr(,"units")
## [1] "days"

Function difftime() can create difftime objects in the required unit through its argument units, which takes one of “auto”, “secs”, “mins”, “hours”, “days” or “weeks”.

difftime(as.Date("2020-02-29"), as.Date("2019-02-28"), units = "weeks")
## Time difference of 52.28571 weeks

We can also create a difftime object by coercion.

a_difftime <- as.difftime(15, units = "days")
a_difftime
## Time difference of 15 days
class(a_difftime)
## [1] "difftime"
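
Once created, a difftime object can be re-expressed in another unit. The following is a small sketch using only base R; the variable name gap is illustrative.

```r
gap <- difftime(as.Date("2020-03-01"), as.Date("2020-02-01"), units = "days")
gap                               # 29 days, since 2020 is a leap year
as.numeric(gap, units = "hours")  # the same span as a plain number of hours
units(gap) <- "weeks"             # change the unit of the object itself
gap
```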

24.1.5 Time zones

Run the function Sys.timezone() in your console to check the current time zone. Function OlsonNames(), on the other hand, lists the time zone names known to the system.

Sys.timezone()
## [1] "Asia/Calcutta"
OlsonNames()[c(253, 284)]
## [1] "Asia/Calcutta" "Asia/Kolkata"

So, we may use the tz argument of as.POSIXlt() and as.POSIXct() to specify the time zone when coercing numeric or other objects to date-times.

as.POSIXct(1, tz = "GMT")
## [1] "1970-01-01 00:00:01 GMT"
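
A date-time is a single instant, and the time zone only decides how that instant is displayed. The sketch below, in base R, shows one instant formatted for two other zones; the zone names are assumed to be available in the system's time zone database, and the variable name instant is illustrative.

```r
# One instant, stored with the UTC time zone
instant <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
# Display the same instant on clocks in two other zones
format(instant, tz = "Asia/Kolkata", usetz = TRUE)      # 05:30 on 1 January
format(instant, tz = "America/New_York", usetz = TRUE)  # 19:00 on 31 December
# The underlying number of seconds is unaffected by the display zone
as.numeric(instant)
```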

24.1.6 Coercion

In the earlier sections we have already seen the functions as.Date(), as.POSIXct(), as.POSIXlt() and as.difftime(), which coerce objects from one class to another.
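
As a small illustration of such coercion, the sketch below moves between the date-time, date, numeric and character types. Time zones are given explicitly to keep the results reproducible, and the variable names are illustrative.

```r
moment_utc <- as.POSIXct("2024-12-13 18:45:00", tz = "UTC")
# Date-time to date: the time-of-day part is dropped
only_date <- as.Date(moment_utc, tz = "UTC")
# Date back to date-time: the instant of midnight UTC on that date
back_to_time <- as.POSIXct(only_date)
# Date to number (days since the epoch) and to character
as.numeric(only_date)
as.character(only_date)
```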

24.1.7 Extracting parts from dates/times

Base R provides some useful functions to extract parts of date and date-time objects.

  • weekdays(x, abbreviate = FALSE) to extract the weekday name. (Output is in character vector)
  • months(x, abbreviate = FALSE) to extract the month name. (Output is in character vector)
  • quarters(x) to extract the quarter. Output is in character vector.
  • julian(x) to extract the days elapsed since origin. Output is in numeric vector.

Examples-

dates_vec <- as.Date(15001 + 1:7)
weekdays(dates_vec)
## [1] "Friday"    "Saturday"  "Sunday"    "Monday"    "Tuesday"   "Wednesday"
## [7] "Thursday"
months(dates_vec)
## [1] "January"  "January"  "January"  "January"  "February" "February" "February"
quarters(dates_vec)
## [1] "Q1" "Q1" "Q1" "Q1" "Q1" "Q1" "Q1"
julian(dates_vec)
## [1] 15002 15003 15004 15005 15006 15007 15008
## attr(,"origin")
## [1] "1970-01-01"
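
The functions above return names, which depend on the locale. For numeric parts such as the year or the day of the month, one common base R approach, sketched here, is format() with the conversion codes listed in table 24.1; the variable name some_dates is illustrative.

```r
some_dates <- as.Date(c("2011-01-28", "2011-02-03"))
as.integer(format(some_dates, "%Y"))  # year: 2011 2011
as.integer(format(some_dates, "%m"))  # month number: 1 2
as.integer(format(some_dates, "%d"))  # day of the month: 28 3
as.integer(format(some_dates, "%j"))  # day of the year: 28 34
```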

24.1.9 strptime character codes

A few format codes useful for parsing and formatting date/date-time objects in R are reproduced in table 24.1 for ready reference. To know more about these codes, readers can run ?strptime in their console.

Table 24.1: Conversion Specifications for Date/Date-time
Format Meaning
%y (or %Y) Year without (or with) Century
%m Month as decimal number (01-12)
%B (or %b) Month name in full (or abbreviated) in current locale
%d Day of the month (01-31)
%H Hours as decimal number (00-23)
%I Hours as decimal number, in 12 hour format (01-12)
%M Minute as decimal number (00-59)
%S Second as integer (00-61) allowing up to two leap-seconds
%OS Second(s) as decimal number
%T Equivalent to %H:%M:%S
%p AM/PM indicator in the locale.
%A (or %a) Full (or abbreviated) week-day name, in current locale
%j Day of year as decimal number (001–366): For input, 366 is only valid in a leap year.
%u Weekday as a decimal number (1–7, Monday is 1)
%w Weekday as decimal number (0–6, Sunday is 0).
%U Week of the year as decimal number (00–53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
%V Week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1.
%W Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
%z Offset from UTC
%Z Time Zone name
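
The same codes work in both directions: for parsing text into a date-time and for formatting a date-time as text. The following sketch uses only locale-independent codes from the table; the input string and the variable name parsed are illustrative.

```r
# Parsing: the format describes how the input string is laid out
parsed <- as.POSIXct("13/12/2024 18:05", format = "%d/%m/%Y %H:%M", tz = "UTC")
# Formatting: the codes select what to print
format(parsed, "%I:%M")   # hour on the 12-hour clock: "06:05"
format(parsed, "%Y-%j")   # year and day of the year: "2024-348"
format(parsed, "%u")      # weekday number, Monday = 1: "5" (a Friday)
```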

24.2 Using lubridate package for parsing/creating dates

Package lubridate, which is now part of the core tidyverse, is extremely helpful when analysing temporal variables in data. Creating date/date-time objects and converting numeric/character variables to these types is easier with it than in base R. Besides, it offers some additional time-span classes that make time-series analysis easier.

24.2.1 Date/date-time objects creation

lubridate works with the same date and date-time classes as base R. So we can create a date (or date-time) from the number of days (or seconds) elapsed since the epoch.

library(lubridate, warn.conflicts = FALSE)
# Creation using numbers
as_date(2392)
## [1] "1976-07-20"
as_datetime(206668800)
## [1] "1976-07-20 UTC"
# Converting from one type to another
(date_today <- as_date(now()))
## [1] "2024-12-13"
# Check its class
class(date_today)
## [1] "Date"
# Or see what's all in there
unclass(date_today)
## [1] 20070

24.2.2 Parsing date/date-time objects from character

In lubridate, parsing dates/date-times from character strings is pretty easy, as there are a number of functions that parse dates/date-times written in a given order of components, irrespective of the delimiter used to separate them. In the names of these functions, the order of year (y), quarter (q), month (m), day (d), hour (h), minute (m) and second (s) is represented by their first letters. These are -

  • ymd_hms, ymd_hm, ymd_h, ymd
  • dmy_hms, dmy_hm, dmy_h, dmy
  • mdy_hms, mdy_hm, mdy_h, mdy
  • ydm_hms, ydm_hm, ydm_h, ydm
  • myd, dym
  • yq, ym, my

A few of the examples are -

ymd("19760720")
## [1] "1976-07-20"
dmy("01.12.2004")
## [1] "2004-12-01"
dmy("15th of January, 2006")
## [1] "2006-01-15"
mdy_hm("August 15th, 1947 at 10:45 PM")
## [1] "1947-08-15 22:45:00 UTC"
my("04-2006")
## [1] "2006-04-01"
yq("2024: Quarter 4")
## [1] "2024-10-01"

To convert a decimal year (a year plus the fraction of it that has elapsed) into a date-time, we can use date_decimal(). Example-

date_decimal(2024.162)
## [1] "2024-02-29 07:00:28 UTC"

24.2.3 Parsing dates/date-times from individual components

When the components of a date/date-time are available separately, we can build the object using one of two functions, make_date() and make_datetime(). Syntax is

make_datetime(
  year = 1970L,
  month = 1L,
  day = 1L,
  hour = 0L,
  min = 0L,
  sec = 0,
  tz = "UTC"
)

make_date(year = 1970L, month = 1L, day = 1L)

Example-

make_date(year = 1947, month = 8, day = 15)
## [1] "1947-08-15"
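
These functions are vectorised, which is handy when the components sit in separate columns of a data frame. A minimal sketch follows, assuming a hypothetical data frame parts with illustrative column names.

```r
library(lubridate, warn.conflicts = FALSE)

# Hypothetical data with one column per component
parts <- data.frame(yr = c(1947, 1950), mo = c(8, 1), dy = c(15, 26), hr = c(0, 10))
# Each row is combined into one date-time (UTC by default)
parts$when <- make_datetime(parts$yr, parts$mo, parts$dy, parts$hr)
parts$when
```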

24.3 Extracting and setting date/date-time components

24.3.1 Extraction

We can extract specific components from dates/date-times using intuitively named accessor functions, as the examples below show. Note that in all of these function names the date component is in the singular; the plural forms are used in a different context, as we will discuss in Section 24.4.1.

See following examples.

# Make an example date
(a_time <- as_datetime(999999999))
## [1] "2001-09-09 01:46:39 UTC"
# Extract the DAY of the month
day(a_time)
## [1] 9
# Extract Weekday name in full
wday(a_time, label = TRUE, abbr = FALSE)
## [1] Sunday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
# Extract second component
second(a_time)
## [1] 39
# Extract Month name (abbreviated)
month(a_time, label = TRUE, abbr = TRUE)
## [1] Sep
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
# Extract week number of the year
week(a_time)
## [1] 36

Besides the above functions, a few functions help in checking certain characteristics of a date/date-time variable. These functions, which return logical vectors, are listed below -

  • am(x), pm(x) to know whether the x is in AM or PM, respectively.
  • dst(x) to know whether x is in daylight saving time.
  • leap_year(x) to know whether x is a leap year.

Examples-

# Is our a_time AM?
am(a_time)
## [1] TRUE
# Is a_time falling in leap year?
leap_year(a_time)
## [1] FALSE
# Is it during DST?
dst(a_time)
## [1] FALSE

24.3.2 Setting Components

Just like we used accessor functions to extract date-time components, we can use the same functions to set a specific component of a date/date-time object. See Example-

# Initial variable
a_time
## [1] "2001-09-09 01:46:39 UTC"
# Setting component - Year
year(a_time) <- 2024
# Print modified variable
a_time
## [1] "2024-09-09 01:46:39 UTC"
# Modify Time zone
tz(a_time) <- "Asia/Kolkata"
# print modified variable
a_time
## [1] "2024-09-09 01:46:39 IST"

Finally, function update() returns a copy of the date with the specified components updated. Example-

EOD <- dmy("31-01-2024")
EOD <- update(EOD, year = 2020)
EOD
## [1] "2020-01-31"
# If values are too big, they will roll-over:
update(EOD, month = 2)
## [1] "2020-03-02"

24.4 Time spans

When we subtract a date-time object from another in base R, we get a difftime object, as we have already seen in section 24.1.4.

Not every date/time unit has the same length in the strict sense. Leap years have 366 days whereas other years have 365. There is a concept of leap seconds too. Moreover, certain countries observe daylight saving time, so not every day is of equal length there. To meet specific and clear requirements on date calculations, lubridate has three different time-span classes, which may sound similar at first but work differently. These are -

  • period
  • duration
  • interval

Let us discuss each one separately.

24.4.1 Periods

Period objects track changes in clock time and ignore timeline irregularities. A period of one year, for instance, moves the calendar forward by one year irrespective of whether that year has 365 or 366 days.

Period objects can be created in lubridate using pluralized date component functions. Example-

(one_year <- years(1))
## [1] "1y 0m 0d 0H 0M 0S"
class(one_year)
## [1] "Period"
## attr(,"package")
## [1] "lubridate"
# Add one year PERIOD to get a leap year?
a_date <- dmy("01-03-2019")
a_date + years(1)
## [1] "2020-03-01"

24.4.2 Durations

Durations, on the other hand, track the passage of physical time: a duration is stored as an exact number of seconds, so timeline irregularities show up in the result. Duration objects can be created in lubridate by adding d as a prefix to the pluralized period functions.

(one_year_duration <- dyears(1))
## [1] "31557600s (~1 years)"
class(one_year_duration)
## [1] "Duration"
## attr(,"package")
## [1] "lubridate"
# Add one year DURATION to get a leap year date?
a_date + dyears(1)
## [1] "2020-02-29 06:00:00 UTC"
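
The difference between the two classes also shows up around daylight saving time. The sketch below assumes the America/New_York time zone, where clocks moved forward by one hour on 14 March 2021; the variable name before_dst is illustrative.

```r
library(lubridate, warn.conflicts = FALSE)

# Noon on the day before the clocks moved forward
before_dst <- ymd_hms("2021-03-13 12:00:00", tz = "America/New_York")
before_dst + days(1)   # period: the same clock time on the next day (12:00)
before_dst + ddays(1)  # duration: exactly 86400 seconds later (13:00 on the clock)
```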

24.4.3 Intervals

Objects of class interval represent a specific stretch of the timeline. In other words, they have a specific start and end date/date-time. These can be created in lubridate using function interval(), which has syntax like -

interval(start = NULL, end = NULL, tzone = tz(start))

This should be clear with the following examples.

(a_interval <- interval(dmy("15-08-1947"), today()))
## [1] 1947-08-15 UTC--2024-12-13 UTC
class(a_interval)
## [1] "Interval"
## attr(,"package")
## [1] "lubridate"

Intervals can also be created using special operator %--%. E.g.

(interval1 <- dmy("15-08-1947") %--% dmy("26-01-1950"))
## [1] 1947-08-15 UTC--1950-01-26 UTC
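
Since an interval knows its own start and end, it can be measured in whichever unit is convenient. A hedged sketch with a few lubridate helpers follows; the variable name span is illustrative.

```r
library(lubridate, warn.conflicts = FALSE)

span <- dmy("15-08-1947") %--% dmy("26-01-1950")
int_start(span)           # start of the interval
int_end(span)             # end of the interval
span %/% months(1)        # number of whole months in the interval: 29
as.period(span)           # the interval expressed in calendar units
time_length(span, "day")  # length in days: 895
```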

24.5 Performing calculations on intervals/dates

There are a few functions to make our life easier while performing data analysis on temporal fields. These are -

24.5.1 %within% operator

Operator a %within% b checks whether interval/date-time a falls within interval b. It returns logical value(s).

interval1 %within% a_interval
## [1] TRUE

24.5.2 Backward intervals

Intervals in lubridate can be backwards too. E.g.

(back_interval <- dmy("26-01-2024") %--% dmy("15-01-2024"))
## [1] 2024-01-26 UTC--2024-01-15 UTC

24.5.3 Flipping intervals

Function int_flip() can flip the interval. E.g.

int_flip(back_interval)
## [1] 2024-01-15 UTC--2024-01-26 UTC

24.5.4 Checking alignment of two intervals

Function int_aligns tests if two intervals share an endpoint. The direction of each interval is ignored. In other words, it actually tests whether the earliest or latest moments of each interval occur at the same time. E.g.

int1 <- interval(ymd("2001-01-01"), ymd("2002-01-01"))
int2 <- interval(ymd("2001-06-01"), ymd("2002-01-01"))
int3 <- interval(ymd("2003-01-01"), ymd("2004-01-01"))

int_aligns(int1, int2)
## [1] TRUE
int_aligns(int1, int3)
## [1] FALSE

24.5.5 Checking Overlap in two intervals

Function int_overlaps can test if two intervals overlap each other.

int_overlaps(int1, int2)
## [1] TRUE

24.5.6 Length of the interval

Function int_length() can calculate the length of interval and returns a numeric variable equal to the seconds in that interval. E.g.

int_length(back_interval)
## [1] -950400

24.5.7 Adding Months without exceeding last day of the month

Operators %m+% and %m-% will add (or subtract) months to a date without exceeding the last day of the new month. E.g.

(leap <- ymd("2012-02-29"))
## [1] "2012-02-29"
leap %m+% years(1)
## [1] "2013-02-28"
leap %m+% years(-1)
## [1] "2011-02-28"
leap %m-% years(1)
## [1] "2011-02-28"

Another Example-

jan <- ymd_hms("2010-01-31 03:04:05")
jan + months(1:3) # Feb 31 and April 31 returned as NA
## [1] NA                        "2010-03-31 03:04:05 UTC"
## [3] NA
jan %m+% months(1:3) # No rollover
## [1] "2010-02-28 03:04:05 UTC" "2010-03-31 03:04:05 UTC"
## [3] "2010-04-30 03:04:05 UTC"

24.5.8 Adding with Rollback

One more function, add_with_rollback(), performs similarly but offers more control through its arguments. Its syntax is -

add_with_rollback(e1, e2, roll_to_first = FALSE, preserve_hms = TRUE)

Example-

x <- ymd_hms("2019-01-29 01:02:03")
add_with_rollback(x, months(1))
## [1] "2019-02-28 01:02:03 UTC"
add_with_rollback(x, months(1), preserve_hms = FALSE)
## [1] "2019-02-28 UTC"
add_with_rollback(x, months(1), roll_to_first = TRUE)
## [1] "2019-03-01 01:02:03 UTC"
add_with_rollback(x, months(1), roll_to_first = TRUE, preserve_hms = FALSE)
## [1] "2019-03-01 UTC"

24.5.9 Coercing one time span unit to another

Time-span objects in lubridate can be coerced from one to another using coercing functions,

  • as.period(x, unit)
  • as.duration(x)
  • as.interval(x, start)
  • make_difftime(x)

Examples-

# With Period - clock time
(per <- days(31))
## [1] "31d 0H 0M 0S"
(int1 <- as.interval(per, dmy("01022020")))
## [1] 2020-02-01 UTC--2020-03-03 UTC
# With Duration - physical time
(dur <- ddays(31))
## [1] "2678400s (~4.43 weeks)"
(int2 <- as.interval(dur, dmy("01022020")))
## [1] 2020-02-01 UTC--2020-03-03 UTC

24.6 Rounding date-time variables

There are dedicated functions to round the dates as per specific requirements.

  • floor_date() takes a date-time object and rounds it down to the nearest boundary of the specified time unit.
  • ceiling_date() takes a date-time object and rounds it up to the nearest boundary of the specified time unit.
  • round_date() takes a date-time object and time unit, and rounds it to the nearest value of the specified time unit.

Examples-

x <- ymd_hms("2009-08-03 12:01:59.23")
round_date(x, "month")
## [1] "2009-08-01 UTC"
round_date(x, "week")
## [1] "2009-08-02 UTC"
floor_date(x, "day")
## [1] "2009-08-03 UTC"
ceiling_date(x, "month")
## [1] "2009-09-01 UTC"
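
The unit may also carry a multiplier, which is useful for binning timestamps. The following sketch assumes that multi-unit strings such as "15 minutes" are accepted by the installed lubridate version; the variable name reading is illustrative.

```r
library(lubridate, warn.conflicts = FALSE)

reading <- ymd_hms("2009-08-03 12:01:59")
floor_date(reading, "15 minutes")  # start of the 15-minute bin: 12:00:00
ceiling_date(reading, "hour")      # next full hour: 13:00:00
floor_date(reading, "quarter")     # first day of the quarter: 2009-07-01
```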

Three other functions help in rolling a date forward or backward:

  • rollbackward() changes a date to the last day of the previous month or to the first day of the month.
  • rollforward() rolls to the last day of the current month or to the first day of the next month. Optionally, the new date can retain the same hour, minute, and second information.
  • rollback() is a synonym for rollbackward().

See these examples-

date <- ymd_hms("2010-03-03 12:44:22")

rollbackward(date)
## [1] "2010-02-28 12:44:22 UTC"
rollbackward(date, roll_to_first = TRUE)
## [1] "2010-03-01 12:44:22 UTC"
rollbackward(date, preserve_hms = FALSE)
## [1] "2010-02-28 UTC"
rollbackward(date, roll_to_first = TRUE, preserve_hms = FALSE)
## [1] "2010-03-01 UTC"
rollforward(date)
## [1] "2010-03-31 12:44:22 UTC"
rollforward(date, roll_to_first = TRUE)
## [1] "2010-04-01 12:44:22 UTC"

24.7 Representing date and date-times in customised formats

In all of the above sections, we saw that once R recognises a temporal object, it prints that object in a uniform format. See

(date1 <- dmy("01-01-2020"))
## [1] "2020-01-01"
(date2 <- ymd("2020/01/01"))
## [1] "2020-01-01"
## [1] "2020-01-01"

In the console, all dates print like character strings in this standard format. However, sometimes the requirement is to print dates in a specific customised format. In this scenario, function strftime() comes to our rescue: it converts objects of classes “POSIXlt” and “POSIXct” (and objects coercible to them, such as Date) to a specified character representation. The conversion codes from table 24.1 in section 24.1.9 can be used.

See following examples-

strftime(Sys.Date(), format = "%d %B %Y")
## [1] "13 December 2024"
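
The generic function format() gives the same kind of result for Date objects. Here is a small sketch with a fixed date and locale-independent codes only, so that the output does not vary between machines; the variable name fixed_day is illustrative.

```r
fixed_day <- as.Date("2024-12-13")
format(fixed_day, "%d/%m/%Y")      # "13/12/2024"
format(fixed_day, "%Y%m%d")        # compact form: "20241213"
format(fixed_day, "Day %j of %Y")  # literal text may be mixed in: "Day 348 of 2024"
```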

In this context, let’s also discuss a special date-stamping facility in lubridate, which formats date/time output based on a human-friendly template. Functions stamp(), stamp_date() and stamp_time() create a function from the given template, which can then be applied to date/time objects to re-format them. Example-

eclipse_dates <- dmy(c("11-7-2010", "13-11-2012", "3-11-2013"))
eclipse_stamp <- stamp_date("There was a solar eclipse on January 13th, 1999")
eclipse_stamp(eclipse_dates)
## [1] "There was a solar eclipse on July 11th, 2010"    
## [2] "There was a solar eclipse on November 13th, 2012"
## [3] "There was a solar eclipse on November 03th, 2013"

24.8 Dates in ggplot2 visualisations

Before concluding the chapter on date and time variables, it is important to learn how date and time formats affect visualisations. Since date and date-time variables are actually continuous variables whose labels are displayed in a specific format, ggplot2 provides dedicated scale functions for them, such as scale_x_date() and scale_x_datetime() (with scale_y_*() counterparts).

In these functions, we have date_breaks and date_minor_breaks arguments to position the breaks by date units. E.g. date_breaks = "6 months" will place major tick mark every six months.

Using another argument, date_labels, we can control the display of the labels in the plots. The values of this argument are given in strptime formats, as we have already seen in table 24.1 above.

Example-

library(tidyverse, warn.conflicts = FALSE)

economics %>% 
  ggplot(aes(date, uempmed)) +
  geom_line() +
  scale_x_date(date_labels = "%Y", 
               date_breaks = "5 years", 
               date_minor_breaks = "1 year") +
  labs(x = "", y = "")

Figure 24.1: Use of Date Time Scale in ggplot2
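
For date-time (POSIXct) variables, scale_x_datetime() takes the same date_breaks and date_labels arguments. The sketch below uses a small made-up data frame, readings, whose names and values are purely illustrative; the month abbreviation in the labels depends on the locale.

```r
library(ggplot2)

# Hypothetical hourly readings over two days
readings <- data.frame(
  when  = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 3600 * 0:47,
  value = sin(0:47 / 4)
)

hourly_plot <- ggplot(readings, aes(when, value)) +
  geom_line() +
  # One labelled break every 12 hours, showing day, month and time
  scale_x_datetime(date_labels = "%d %b %H:%M", date_breaks = "12 hours")
hourly_plot
```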