Group data by month in R
Published on
I often analyze time series data in R — things like daily expenses or webserver statistics. And just as often I want to aggregate the data by month to see longer-term patterns.
Doing this in base R is a bit awkward, and the internet is littered
with working but terrible solutions using strftime
or
similar date-to-string conversions.
The real gem is the function floor_date
from the
lubridate
package. I found out about it from this old StackOverflow
answer by Hadley Wickham. As the name implies, it can be used to
round each date down to the month boundary, so that dates in the same
month are rounded down to the same date.
Let’s look at an example.
library(dplyr)
library(lubridate)
set.seed(2017)
options(digits=4)
Say these are your daily expenses for 2016:
<- data_frame(
(expenses date=seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by=1),
amount=rgamma(length(date), shape = 2, scale = 20)))
## # A tibble: 366 × 2
## date amount
## <date> <dbl>
## 1 2016-01-01 75.42
## 2 2016-01-02 28.14
## 3 2016-01-03 51.12
## 4 2016-01-04 26.12
## 5 2016-01-05 42.09
## 6 2016-01-06 40.99
## 7 2016-01-07 45.59
## 8 2016-01-08 57.55
## 9 2016-01-09 22.98
## 10 2016-01-10 14.50
## # ... with 356 more rows
Then you can summarize them by month like this:
%>% group_by(month=floor_date(date, "month")) %>%
expenses summarize(amount=sum(amount))
## # A tibble: 12 × 2
## month amount
## <date> <dbl>
## 1 2016-01-01 1200.9
## 2 2016-02-01 1002.9
## 3 2016-03-01 1237.6
## 4 2016-04-01 1120.8
## 5 2016-05-01 1276.6
## 6 2016-06-01 1404.6
## 7 2016-07-01 972.6
## 8 2016-08-01 1245.7
## 9 2016-09-01 1020.2
## 10 2016-10-01 986.2
## 11 2016-11-01 1106.3
## 12 2016-12-01 1235.2
floor_date
lets you round dates to various time periods
from seconds to years and also multiples of these periods, e.g.
%>% group_by(month=floor_date(date, "14 days")) %>%
expenses summarize(amount=sum(amount))
## # A tibble: 36 × 2
## month amount
## <date> <dbl>
## 1 2016-01-01 550.0
## 2 2016-01-15 462.1
## 3 2016-01-29 188.8
## 4 2016-02-01 568.2
## 5 2016-02-15 335.1
## 6 2016-02-29 99.6
## 7 2016-03-01 675.7
## 8 2016-03-15 459.8
## 9 2016-03-29 102.2
## 10 2016-04-01 458.3
## # ... with 26 more rows