Multiple Time Series Forecast & Demand Pattern Classification using R — Part 3

Photo by Towfiqu barbhuiya on Unsplash

Part 1: Data Cleaning & Demand categorization.

Part 2: Fitting statistical Time Series models (ARIMA, ETS, CROSTON etc.) using fpp3 (tidy forecasting) R Package.

Part 3: Time Series Feature Engineering using timetk R Package.

Part 4: Fitting Machine Learning models (XGBoost, Random Forest, etc.) & Hyperparameter tuning using modeltime & tidymodels R packages.

Part 5: Fitting Deeplearning models (NBeats & DeepAR) & Hyperparameter tuning using modeltime, modeltime.gluonts R packages.

01. Seasonal Features

Why do we need to decompose date?

master_data_tbl %>% # Transform date variable to multiple time features
timetk::tk_augment_timeseries_signature(.date_var = week_date) %>%

# Remove unwanted features
dplyr::select(
-dplyr::matches("(.iso$)|(.xts$)|(day)|(hour)|(minute)|(second)| (am.pm)|(diff)")
)
Seasonality features

02. Calendar events

For example, if an eCommerce business wants to sell something in China and if they want to forecast future sales, they should consider the following calendar events: Chinese New Year, National Day, Mid-Autumn Festival, etc.

Likewise, if the Air Line industry wants to forecast number of passengers, they should consider summer holidays, bank holidays as calendar events.

timetk::tk_make_timeseries(
start_date = as.Date("2021-12-24"),
end_date = as.Date("2022-01-02"),
by = "day"
) %>%
dplyr::as_tibble(.name_repair = ~"date") %>% timetk::tk_augment_holiday_signature(
.date_var = date,
.holiday_pattern = "world_christmas|world_new",
.locale_set = "none",
.exchange_set = "none"
)
Calendar Events

03 Lag Features (Trends)

What is Lag features & Why is it important?

master_data_tbl %>% 
dplyr::group_by(center_id, meal_id) %>%
timetk::tk_augment_lags(num_orders, .lags = 1:3)

When creating the lag features beware of data leakage. Hence, create the lag features on train data to avoid data leakage.

04 Pricing Features

full_tbl <- full_tbl %>% 
group_by(center_id, meal_id) %>%
mutate(cum_mean_base_price = cummean(base_price)) %>%
# Relative difference between the current price of an item and
# its historical average price
mutate(
promotion_impact = (base_price - cum_mean_base_price) / cum_mean_base_price
) %>%
select(-c(cum_mean_base_price, checkout_price)) %>%
ungroup()
store_impact_tbl <- full_tbl %>% 

# Creates list of DF for each meals
group_split(meal_id) %>%

# Map through each Meal DF
map_df(~{

# Calculate average price of a meal on each centres
data <- .x %>%
group_by(center_id, meal_id) %>%
summarise(base_price = mean(base_price, na.rm = TRUE),
.groups = "drop")

# Get list of each Centre id's
center <- unique(data$center_id)

# Map through each Centre Id's
map_df(center, .f = ~{
data %>%
# Dummy variable to make other Centres as a one Centre
mutate(
center_cat = ifelse(center_id == .x, "center", "other")
) %>%
group_by(meal_id, center_cat) %>%

# Calculate average price of Centre and Other Centres
summarise(
base_price = mean(base_price, na.rm = T),
.groups = "drop"
) %>%

# Calculate Relative price difference
pivot_wider(
id_cols = meal_id,
names_from = center_cat,
values_from = base_price
) %>%
mutate(
store_impact = (center - other_center)/other_center,
center_id = .x
) %>%
filter(!is.na(store_impact)) %>%
select(-c(center, other_center))
})
})
cannabilization_tbl <- full_tbl %>% 

# Creates list of DF for each centres
group_split(center_id) %>%

# Map through each Centre DF
map_df(~{

# Calculate average price of meals in centres
data <- .x %>%
group_by(meal_id) %>%
summarise(base_price = mean(base_price, na.rm = TRUE),
.groups = "drop")

# Get list of each meals
meal <- unique(data$meal_id)
# Map through each Meal Id's
map_df(meal, .f = ~{
data %>%
# Create a dummy variable to make other Meals as a one Meal
mutate(
meal_cat = ifelse(meal_id == .x, "meal", "other_meal")
) %>%
group_by(meal_cat) %>%
# Calculate average price of the Meal and Other Meals
summarise(
base_price = mean(base_price, na.rm = T),
meal_id = .x,
.groups = "drop"
) %>%
# Calculate Relative price difference
pivot_wider(
id_cols = meal_id,
names_from = meal_cat,
values_from = base_price
) %>%
mutate(
cannabilization = (meal - other_meal)/other_meal
) %>%
select(-c(meal, other_meal))
}) %>%
mutate(center_id = unique(.x$center_id))
})

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store