library(tidyverse)
library(widyr)
Using the widyr Package: A Simple Step-by-Step Tutorial
Zahier Nasrudin
July 4, 2024
Purpose of the tutorial: To demonstrate a quick and straightforward implementation of time series clustering using the widyr package in R.
What is time series clustering? It is the grouping of time series into clusters so that series in the same cluster are more similar to each other than to series in other clusters. For example, with monthly sales data, time series clustering can help identify stores with similar sales patterns over time.
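To make the idea concrete, here is a minimal base R sketch (not the tutorial's own code). It arranges hypothetical toy sales data so that each store is one row and each month is one column, then runs k-means with stats::kmeans(). The store codes S1 to S4 and their sales values are made up purely for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: 4 made-up stores observed over 6 months.
# S1 and S2 grow over time; S3 and S4 decline.
month_starts <- seq(as.Date("2023-01-01"), by = "month", length.out = 6)
toy_sales <- tibble(
  storecode = rep(c("S1", "S2", "S3", "S4"), each = 6),
  year = rep(month_starts, times = 4),
  sales = c(
    seq(20000, 25000, length.out = 6),
    seq(20500, 25500, length.out = 6),
    seq(20000, 15000, length.out = 6),
    seq(19500, 14500, length.out = 6)
  )
)

# Reshape to wide form: one row per store, one column per month,
# so each row holds a store's entire sales series.
wide <- pivot_wider(toy_sales, names_from = year, values_from = sales)
sales_matrix <- as.matrix(wide[, -1])
rownames(sales_matrix) <- wide$storecode

# k-means groups stores whose monthly series are close in Euclidean distance
set.seed(1)
km <- kmeans(sales_matrix, centers = 2)
km$cluster
```

The cluster numbers are arbitrary labels; what matters is which stores share a number (here, the two growing stores end up together, as do the two declining ones).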
About the data:
- A synthetic (fake) dataset that can be downloaded from my GitHub.
- Contains 832 rows and 3 columns.
- Columns:
  - year (<date>): Date information for each observation.
  - storecode (<chr>): Unique identifier for each store.
  - sales (<dbl>): Sales figures for each store.
Importing data: Using read_csv(). The first six rows look like this:
year | storecode | sales |
---|---|---|
2022-12-01 | A4P1Q1 | 22432 |
2023-01-01 | A4P1Q1 | 22425 |
2023-02-01 | A4P1Q1 | 20710 |
2023-03-01 | A4P1Q1 | 23054 |
2023-04-01 | A4P1Q1 | 23912 |
2023-05-01 | A4P1Q1 | 22782 |
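The import step might look like the sketch below. The GitHub URL for the dataset is not reproduced here, so the sketch writes the six rows shown above to a temporary file (a stand-in for the real CSV) and reads it back with read_csv(), declaring the column types listed in the data description.

```r
library(readr)

# Stand-in for the real dataset: the six rows shown in the table above.
# Replace csv_path with the path or URL of the actual file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "year,storecode,sales",
  "2022-12-01,A4P1Q1,22432",
  "2023-01-01,A4P1Q1,22425",
  "2023-02-01,A4P1Q1,20710",
  "2023-03-01,A4P1Q1,23054",
  "2023-04-01,A4P1Q1,23912",
  "2023-05-01,A4P1Q1,22782"
), csv_path)

# Declare column types explicitly to match <date>, <chr> and <dbl>
sales_df <- read_csv(
  csv_path,
  col_types = cols(
    year = col_date(),
    storecode = col_character(),
    sales = col_double()
  )
)

head(sales_df)
```

Declaring col_types is optional (read_csv() guesses types otherwise), but it guarantees that year is parsed as a Date and storecode stays a character column.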
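widyr
Using widely_kmeans() for time series clustering:
- Define item: storecode (each store is one time series to be clustered).
- Define feature: the year column (each month is one dimension of a store's series).
- Define value: sales (the values compared across months).
- Define k: the number of clusters to create.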
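A minimal sketch of the clustering call, assuming the documented widyr signature widely_kmeans(tbl, item, feature, value, k) with bare column names. Because the real dataset is not reproduced here, the sketch builds hypothetical toy data for six made-up stores (S1 to S6) over twelve months, three trending upward and three downward. The value k = 2 suits this toy data only and is not the value used in the tutorial.

```r
library(dplyr)
library(tidyr)
library(widyr)

# Hypothetical toy data standing in for the tutorial's dataset
set.seed(42)
month_starts <- seq(as.Date("2023-01-01"), by = "month", length.out = 12)
toy_sales <- expand_grid(storecode = paste0("S", 1:6), month_index = 1:12) %>%
  mutate(
    year = month_starts[month_index],
    # Assumption for illustration: S1-S3 trend upward, S4-S6 trend downward
    direction = if_else(storecode %in% c("S1", "S2", "S3"), 1, -1),
    sales = 20000 + direction * 500 * month_index + rnorm(n(), sd = 200)
  ) %>%
  select(year, storecode, sales)

# item = storecode, feature = year, value = sales, k = number of clusters
clusters <- toy_sales %>%
  widely_kmeans(storecode, year, sales, k = 2)

clusters
```

The result has one row per store, with the store identifier and a cluster column; the cluster numbers themselves are arbitrary labels.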
Joining results: The cluster assignments are joined back to the original dataset by storecode.
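The join can be sketched with dplyr's left_join() on storecode. Both input tibbles below are hypothetical: a few rows of sales data and a clustering result shaped like widely_kmeans() output (one row per store plus a cluster column). The original post may have used a different join function.

```r
library(dplyr)

# Hypothetical monthly sales for two made-up stores
sales_df <- tibble(
  year = as.Date(c("2023-01-01", "2023-02-01", "2023-01-01", "2023-02-01")),
  storecode = c("S1", "S1", "S2", "S2"),
  sales = c(22425, 20710, 15000, 14800)
)

# Hypothetical clustering result: one row per store
clusters <- tibble(storecode = c("S1", "S2"), cluster = factor(c(1, 2)))

# Attach each store's cluster to every one of its monthly rows
sales_clustered <- sales_df %>%
  left_join(clusters, by = "storecode")

sales_clustered
```

A left join keeps every original row even if a store is missing from the clustering result (its cluster would then be NA).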
Visualizing results: Using ggplot2
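A ggplot2 visualization of the clustered series could be sketched as follows: one line per store, coloured by cluster and faceted so each cluster gets its own panel. The data is hypothetical and shaped like the output of the join step; the tutorial's own plot may differ.

```r
library(dplyr)
library(ggplot2)

# Hypothetical clustered data: two made-up stores, six months each
month_starts <- seq(as.Date("2023-01-01"), by = "month", length.out = 6)
sales_clustered <- tibble(
  year = rep(month_starts, times = 2),
  storecode = rep(c("S1", "S2"), each = 6),
  sales = c(seq(20000, 25000, length.out = 6), seq(18000, 15000, length.out = 6)),
  cluster = factor(rep(c(1, 2), each = 6))
)

# group = storecode draws one line per store; facet_wrap splits by cluster
p <- ggplot(sales_clustered,
            aes(x = year, y = sales, group = storecode, colour = cluster)) +
  geom_line() +
  facet_wrap(~ cluster) +
  labs(x = "Month", y = "Sales", colour = "Cluster")

p
```

Faceting by cluster makes it easy to check whether the stores within each cluster really share a sales pattern.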
This tutorial walked through a quick implementation of time series clustering using the widyr package in R. Of course, there is much more you can explore and refine in your clustering analysis. For comprehensive documentation and further exploration of the widyr package, see the package's official page: widyr Documentation.