Things are moving along in the COVID19 world in Melbourne, and the latest numbers we are discussing are the 14 day and 7 day averages. The aim is to get the 14 day average below 5 cases, but people are starting to report the current 7 day average, since this is also encouraging and interesting.

So let’s explore how to do sliding averages. We’ll use the covid scraping code from a previous blog post on scraping covid data (I don’t think I’ll put this into yet another R package, but I’m tempted. But…anyway).

This code checks if we can scrape the data (`bow()`

), scrapes the data (`scrape()`

), extracts the tables (`html_table()`

), picks (`pluck`

) the second one, then converts it to a `tibble`

for nice printing.

```
covidlive_url <- "https://covidlive.com.au/report/daily-cases/vic"
covidlive_raw <- covidlive_url %>%
bow() %>%
scrape() %>%
html_table() %>%
pluck(2) %>%
as_tibble()
```

Then we do a bit of data cleaning, parsing the dates and numbers properly, and just leaving us with a date and case column:

```
strp_date <- function(x) as.Date(strptime(x, format = "%d %b"))
daily_cases <- covidlive_raw %>%
mutate(DATE = strp_date(DATE),
CASES = parse_number(CASES),
NET = suppressWarnings(parse_number(NET))) %>%
clean_names() %>%
select(-var,
-cases) %>%
rename(cases = net) %>%
select(date, cases)
daily_cases
#> # A tibble: 272 x 2
#> date cases
#> <date> <dbl>
#> 1 2020-10-22 NA
#> 2 2020-10-21 3
#> 3 2020-10-20 1
#> 4 2020-10-19 4
#> 5 2020-10-18 -2
#> 6 2020-10-17 0
#> 7 2020-10-16 2
#> 8 2020-10-15 4
#> 9 2020-10-14 6
#> 10 2020-10-13 10
#> # … with 262 more rows
```

We can then convert this into a `tsibble`

, to make it easier to work with dates.

```
daily_ts <- as_tsibble(daily_cases,
index = date)
```

## Sliding windows?

No we want to plot a 7 and 14 day average of cases. Thinking about how I would do this, I probably would have identified the “week” of a year, and then grouped by that and calculated the average, and maybe through some `reduce`

/`aggregate`

functional programming magic.

But there is now a more straightforward way, using using the {`slider`

} R package by Davis Vaughn. This package allows for performing calculations on a specified window size. The idea is very powerful, and was in part inspired by the `slide`

family of functions in `tsibble`

.

Earo Wang has given some really nice explanations of what sliding is, in particular I like her JSM19 talk and rstudioconf::2019 talk - a visual representation is in this gif (lifted from Earo’s talk):

slider provides a more general interface, and draws upon the framework in `purrr`

and `vctrs`

R packages.

Let’s show an example by taking the last 14 days of covid cases

```
vec_cases <- daily_ts %>%
tail(15) %>%
pull(cases) %>%
na.omit() %>%
as.numeric()
vec_cases
#> [1] 10 10 12 12 14 10 6 4 2 0 -2 4 1 3
```

We can use `slide`

to calculate the mean of the last 7 days. We can demonstrate how this work by first just printing the data, and using the `.before = 6`

, to print the previous 6 values, plus the current one:

```
slide(.x = vec_cases,
.f = ~.x,
.before = 6)
#> [[1]]
#> [1] 10
#>
#> [[2]]
#> [1] 10 10
#>
#> [[3]]
#> [1] 10 10 12
#>
#> [[4]]
#> [1] 10 10 12 12
#>
#> [[5]]
#> [1] 10 10 12 12 14
#>
#> [[6]]
#> [1] 10 10 12 12 14 10
#>
#> [[7]]
#> [1] 10 10 12 12 14 10 6
#>
#> [[8]]
#> [1] 10 12 12 14 10 6 4
#>
#> [[9]]
#> [1] 12 12 14 10 6 4 2
#>
#> [[10]]
#> [1] 12 14 10 6 4 2 0
#>
#> [[11]]
#> [1] 14 10 6 4 2 0 -2
#>
#> [[12]]
#> [1] 10 6 4 2 0 -2 4
#>
#> [[13]]
#> [1] 6 4 2 0 -2 4 1
#>
#> [[14]]
#> [1] 4 2 0 -2 4 1 3
```

This shows us 14 lists, the first 6 containing 1-6 of the numbers, then 7 from thereout.

We can instead run a function, like `mean`

to calculate the mean on this output.

```
slide(.x = vec_cases,
.f = mean,
.before = 7)
#> [[1]]
#> [1] 10
#>
#> [[2]]
#> [1] 10
#>
#> [[3]]
#> [1] 10.66667
#>
#> [[4]]
#> [1] 11
#>
#> [[5]]
#> [1] 11.6
#>
#> [[6]]
#> [1] 11.33333
#>
#> [[7]]
#> [1] 10.57143
#>
#> [[8]]
#> [1] 9.75
#>
#> [[9]]
#> [1] 8.75
#>
#> [[10]]
#> [1] 7.5
#>
#> [[11]]
#> [1] 5.75
#>
#> [[12]]
#> [1] 4.75
#>
#> [[13]]
#> [1] 3.125
#>
#> [[14]]
#> [1] 2.25
```

We can even use the `slide_dbl`

function to return these as all numeric (the type stability feature borrowed from purrr):

```
slide_dbl(.x = vec_cases,
.f = mean,
.before = 7)
#> [1] 10.00000 10.00000 10.66667 11.00000 11.60000 11.33333 10.57143 9.75000
#> [9] 8.75000 7.50000 5.75000 4.75000 3.12500 2.25000
```

Now let’s use this inside our data, first we filter the data down to from the start of october with `filter_index("2020-10-01" ~ .)`

, then, we calculate the average, using `slide_index_dbl`

, where we specify the time index used in the data with `i`

:

```
covid_rolls <- daily_ts %>%
filter_index("2020-10-01" ~ .) %>%
mutate(`7 day avg` = slide_index_dbl(.i = date,
.x = cases,
.f = mean,
.before = 6),
`14 day avg` = slide_index_dbl(.i = date,
.x = cases,
.f = mean,
.before = 13))
covid_rolls
#> # A tsibble: 22 x 4 [1D]
#> date cases `7 day avg` `14 day avg`
#> <date> <dbl> <dbl> <dbl>
#> 1 2020-10-01 14 14 14
#> 2 2020-10-02 8 11 11
#> 3 2020-10-03 6 9.33 9.33
#> 4 2020-10-04 12 10 10
#> 5 2020-10-05 11 10.2 10.2
#> 6 2020-10-06 13 10.7 10.7
#> 7 2020-10-07 4 9.71 9.71
#> 8 2020-10-08 10 9.14 9.75
#> 9 2020-10-09 10 9.43 9.78
#> 10 2020-10-10 12 10.3 10
#> # … with 12 more rows
```

We convert this into long form for easier plotting

```
covid_rolls_long <- covid_rolls %>%
pivot_longer(cols = 3:4,
names_to = "roll_type",
values_to = "value")
covid_rolls_long
#> # A tsibble: 44 x 4 [1D]
#> # Key: roll_type [2]
#> date cases roll_type value
#> <date> <dbl> <chr> <dbl>
#> 1 2020-10-01 14 7 day avg 14
#> 2 2020-10-01 14 14 day avg 14
#> 3 2020-10-02 8 7 day avg 11
#> 4 2020-10-02 8 14 day avg 11
#> 5 2020-10-03 6 7 day avg 9.33
#> 6 2020-10-03 6 14 day avg 9.33
#> 7 2020-10-04 12 7 day avg 10
#> 8 2020-10-04 12 14 day avg 10
#> 9 2020-10-05 11 7 day avg 10.2
#> 10 2020-10-05 11 14 day avg 10.2
#> # … with 34 more rows
```

Now let’s plot it!

```
ggplot(covid_rolls_long,
aes(x = date,
y = value,
colour = roll_type)) +
geom_line() +
geom_hline(yintercept = 5, linetype = 2) +
lims(y = c(0, 15)) +
scale_colour_discrete_qualitative() +
labs(x = "Date",
y = "Rolling Average",
colour = "Average") +
# make the legend inset using code lifted from
# https://github.com/MilesMcBain/inlegend/blob/master/R/legends.R
theme(legend.justification = c(1, 1),
legend.position = c(1.0, 1),
legend.background = ggplot2::element_rect(
colour = "#d3d5d6",
fill = "#ffffff",
size = 0.6
))
#> Warning: Removed 2 row(s) containing missing values (geom_path).
```

# End

The `slider`

R package is really neat, and there is more to say about it! But I just thought I’d finish by saying that it is indeed possible to do the same “stretch” and “tile” manouevers as provided by `tsibble`

, and I would highly recommend checking out the slider website for more details on examples like rolling linear models and how to use it to perform row wise iteration.