Avoid the months() and years() functions from lubridate

lubridate is an R/tidyverse package for dealing with dates and times.

One of its nice features is date arithmetic: you can take a date, such as ymd("2020-08-30"), add a period (such as days(10)), and get a new date ("2020-09-09").
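
A minimal, reproducible version of that example, assuming only that lubridate is installed (the variable names are illustrative):

```r
library(lubridate)

start <- ymd("2020-08-30")   # parse an ISO date string into a Date
later <- start + days(10)    # add a period of 10 days
later                        # [1] "2020-09-09"
```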

In addition to days(), you can modify dates with months() and years(). However, I would argue you almost never want to use these two functions.

I am writing this on August 30, 2020, and I’d like to forecast 6 months ahead. So I write:

> today() + months(6)
[1] NA

The result is NA because, strictly speaking, 6 months from now would be February 30, 2021, which does not exist.
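
Because today() changes from day to day, here is a sketch of the same failure pinned to the fixed date August 30, 2020, so it can be reproduced at any time. Shifting by 5 months instead of 6 lands on a valid date and works fine:

```r
library(lubridate)

start <- ymd("2020-08-30")
start + months(6)   # [1] NA, because 2021-02-30 does not exist
start + months(5)   # [1] "2021-01-30", a valid date, so no NA
```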

Here’s the relevant section of the vignette:

If anyone drove a time machine, they would crash

The length of months and years change so often that doing arithmetic with them can be unintuitive. Consider a simple operation, January 31st + one month. Should the answer be

  1. February 31st (which doesn’t exist)
  2. March 4th (31 days after January 31), or
  3. February 28th (assuming it’s not a leap year)

A basic property of arithmetic is that a + b - b = a. Only solution 1 obeys this property, but it is an invalid date. I’ve tried to make lubridate as consistent as possible by invoking the following rule: if adding or subtracting a month or a year creates an invalid date, lubridate will return an NA. This is new with version 1.3.0, so if you’re an old hand with lubridate be sure to remember this!

If you thought solution 2 or 3 was more useful, no problem. You can still get those results with clever arithmetic, or by using the special %m+% and %m-% operators. %m+% and %m-% automatically roll dates back to the last day of the month, should that be necessary.

jan31 <- ymd("2013-01-31")
jan31 + months(0:11)
#>  [1] "2013-01-31" NA           "2013-03-31" NA           "2013-05-31"
#>  [6] NA           "2013-07-31" "2013-08-31" NA           "2013-10-31"
#> [11] NA           "2013-12-31"
floor_date(jan31, "month") + months(0:11) + days(31)
#>  [1] "2013-02-01" "2013-03-04" "2013-04-01" "2013-05-02" "2013-06-01"
#>  [6] "2013-07-02" "2013-08-01" "2013-09-01" "2013-10-02" "2013-11-01"
#> [11] "2013-12-02" "2014-01-01"
jan31 %m+% months(0:11)
#>  [1] "2013-01-31" "2013-02-28" "2013-03-31" "2013-04-30" "2013-05-31"
#>  [6] "2013-06-30" "2013-07-31" "2013-08-31" "2013-09-30" "2013-10-31"
#> [11] "2013-11-30" "2013-12-31"

Notice that this will only affect arithmetic with months (and arithmetic with years if your start date is Feb 29).
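
To illustrate the year case, here is a sketch starting from the leap day 2020-02-29. Adding years() hits the same NA rule, %m+% rolls back to the end of February, and plain day arithmetic never produces an invalid date:

```r
library(lubridate)

leap_day <- ymd("2020-02-29")
leap_day + years(1)      # NA: 2021-02-29 does not exist
leap_day %m+% years(1)   # "2021-02-28": rolled back to the last day of February
leap_day + days(365)     # "2021-02-28": adding days always yields a valid date
```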

Now, can you think of any application where this behavior, returning NA, is the result you actually want from today() + months(6)?

These functions are especially dangerous because they work most of the time. So when you are testing your code, you’ll likely see it working, and then, every couple of months, it will fail.

So I am trying to get into the habit of writing days(6*30) instead of months(6). Alternatively, you can try to remember to use %m+% instead of +, although I’m sure you will slip up sooner or later.
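
Here are the two workarounds side by side, as a sketch pinned to August 30, 2020. days(6*30) always gives a valid date but lands a couple of days short of the calendar date six months out, while %m+% rolls back to the last valid day of the month. The last line shows the trade-off the vignette mentions: with rollback, a + b - b no longer returns a.

```r
library(lubridate)

start <- ymd("2020-08-30")

start + days(6 * 30)    # "2021-02-26": exactly 180 days later
start %m+% months(6)    # "2021-02-28": rolled back to the end of February

# Rollback is not reversible: going forward and back lands on a different day
(start %m+% months(6)) %m-% months(6)   # "2020-08-28", not "2020-08-30"
```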

The date arithmetic in lubridate is very neat and syntactically simple, and I think it was a mistake to make the most natural-looking construct fail about 2% of the time. Even removing those functions from the API would be better, as it would make it impossible to write the subtly incorrect code.
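
The 2% figure can be checked directly. This sketch takes every start date in 2021 (so the Februaries reached are in 2022, not a leap year) and counts how many of them make x + months(6) return NA:

```r
library(lubridate)

# All 365 start dates in 2021
starts <- seq(ymd("2021-01-01"), ymd("2021-12-31"), by = "day")
shifted <- starts + months(6)

sum(is.na(shifted))    # 7: Aug 29-31, plus Mar 31, May 31, Oct 31, Dec 31
mean(is.na(shifted))   # about 0.019, i.e. roughly 2%
```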