Store Sales - Time Series Forecasting¶

Overview¶

The program is divided into the following sections:

  1. Analyzing & Visualizing Data
  2. Other Impact Factors
  3. A Simple Model
  4. An Enhanced Model
  5. Hybrid Models
  6. Conclusion

Introduction¶

A time series is commonly described by four components of movement, each capturing a distinct aspect of its behavior:

  • Secular trend: the overall long-term direction of the series, such as sustained growth or decline.

  • Seasonality: patterns that repeat at a fixed frequency, such as weekly or monthly.

  • Cyclical fluctuations: rises and falls that recur over time but, unlike seasonal variation, not at a fixed period.

  • Irregular variations: other sources of variation outside the three components above, such as holidays and special events.
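The components above can be illustrated with a minimal sketch of a classical additive decomposition. The series below is synthetic (a made-up linear trend plus a weekly pattern plus noise), not the Favorita data, and all variable names are illustrative. This simple approach does not isolate cyclical fluctuations; any such movement ends up in the trend or the residual.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (hypothetical data, not the Favorita sales):
# a linear trend plus a weekly pattern plus random noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-02", periods=140, freq="D")
t = np.arange(len(dates))
sales = pd.Series(
    np.linspace(100.0, 170.0, len(dates))      # trend
    + 10.0 * np.sin(2 * np.pi * t / 7)         # weekly seasonality
    + rng.normal(0.0, 2.0, len(dates)),        # irregular noise
    index=dates,
    name="sales",
)

# Trend: a centered 7-day moving average smooths out the weekly pattern.
trend = sales.rolling(window=7, center=True).mean()

# Seasonality: mean detrended value per day of the week,
# re-centered so the seven effects sum to zero.
detrended = sales - trend
day_effect = detrended.groupby(detrended.index.dayofweek).mean()
day_effect = day_effect - day_effect.mean()
seasonal = pd.Series(
    day_effect.loc[sales.index.dayofweek].to_numpy(), index=sales.index
)

# Irregular component: whatever trend and seasonality do not explain.
residual = sales - trend - seasonal

print(trend.dropna().head())
print(seasonal.head(7))
print(residual.dropna().head())
```

The `seasonal_decompose` function in statsmodels (listed in the dependencies) provides a packaged version of the same idea.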

In this project, we use time-series forecasting to predict store sales using data from Corporación Favorita, a large grocery retailer based in Ecuador. The goal is to build a model that more accurately predicts the unit sales of thousands of items sold at different Favorita stores.

To improve accuracy, we also evaluate external factors such as:

  1. Holidays and Events.

  2. Dates to pay wages in the public sector.

  3. Major crises (a magnitude 7.8 earthquake struck Ecuador on April 16, 2016).

  4. Daily oil price (Ecuador is an oil-dependent country).
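As a rough sketch, the four factors above could be turned into model features with pandas along the following lines. Everything here is illustrative: the column names are hypothetical, the holiday date and oil prices are placeholder values, and paydays are assumed to fall on the 15th and the last day of each month.

```python
import pandas as pd

# Hypothetical daily frame; in the real project the dates come from the sales data.
dates = pd.date_range("2016-04-10", "2016-05-05", freq="D")
features = pd.DataFrame(index=dates)

# 1. Holidays and events: flag dates found in a holiday calendar
#    (a single placeholder date here).
holidays = pd.to_datetime(["2016-05-01"])
features["is_holiday"] = features.index.isin(holidays).astype(int)

# 2. Public-sector paydays: ASSUMED to be the 15th and the last day of the month.
features["is_payday"] = (
    (features.index.day == 15) | features.index.is_month_end
).astype(int)

# 3. Earthquake of April 16, 2016: a step indicator that is 1 from that day on.
quake = pd.Timestamp("2016-04-16")
features["after_quake"] = (features.index >= quake).astype(int)

# 4. Daily oil price: placeholder prices on business days only,
#    carried forward over weekends (and backward for a leading gap).
business_days = pd.bdate_range("2016-04-10", "2016-05-05")
oil = pd.Series(
    [40.0 + 0.1 * i for i in range(len(business_days))], index=business_days
)
features["oil_price"] = oil.reindex(features.index).ffill().bfill()

print(features.head(8))
```

A step indicator is only one way to encode the earthquake; a decaying effect over the following weeks may fit better, depending on what the data shows.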

Links¶

View interactive visualization at Kaggle:¶

https://www.kaggle.com/linhhlp/store-sales-time-series-forecasting-full-analysis/

Dataset on Kaggle

Full code hosted on GitHub

Benefits of having a predictive model¶

  1. Forecasts are especially relevant to brick-and-mortar grocery stores, which must dance delicately with how much inventory to buy. Predict a little over, and grocers are stuck with overstocked, perishable goods. Guess a little under, and popular items quickly sell out, leading to lost revenue and upset customers. More accurate forecasting, thanks to machine learning, could help ensure retailers please customers by having just enough of the right products at the right time.

  2. Current subjective forecasting methods for retail have little data to back them up and are unlikely to be automated. The problem becomes even more complex as retailers add new locations with unique needs, new products, ever-transitioning seasonal tastes, and unpredictable product marketing.

  3. More accurate forecasting can decrease food waste related to overstocking and improve customer satisfaction. The results of this ongoing competition, over time, might even ensure your local store has exactly what you need the next time you shop.

Setup¶

Clone or download the repo¶

First, get a local copy of the program:

$ git clone https://github.com/linhhlp/Store-Sales-Time-Series-Forecasting-Kaggle.git

Or download from: https://github.com/linhhlp/Store-Sales-Time-Series-Forecasting-Kaggle/archive/main.zip

Install the dependencies¶

This program has been developed and tested on:

  • Python 3.9.10
  • pandas 1.4.1
  • notebook 6.4.8
  • numpy 1.22.2
  • tensorflow 2.6.0
  • scikit-learn (sklearn) 1.0.2
  • matplotlib 3.5.1
  • seaborn 0.11.2
  • statsmodels 0.13.2
  • learntools by Kaggle (modified version)

The quickest, easiest way to install is to use Anaconda:

Installing with Anaconda¶

Install Anaconda

After installing Anaconda, create an environment from the command line and activate it:

$ conda env create
$ conda activate new_env

Install the remaining dependencies with:

$ conda install tensorflow scikit-learn seaborn
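After installation, a quick check along these lines can confirm which package versions are present in the active environment. This is a minimal sketch using only the standard library; the `TESTED` mapping and the `check_versions` helper are illustrative names, and the version numbers are copied from the list of tested versions above.

```python
from importlib import metadata

# Versions this README lists as tested, keyed by distribution name.
TESTED = {
    "pandas": "1.4.1",
    "notebook": "6.4.8",
    "numpy": "1.22.2",
    "tensorflow": "2.6.0",
    "scikit-learn": "1.0.2",
    "matplotlib": "3.5.1",
    "seaborn": "0.11.2",
    "statsmodels": "0.13.2",
}


def check_versions(expected):
    """Return {name: (installed_version_or_None, expected_version)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


for name, (installed, wanted) in check_versions(TESTED).items():
    if installed is None:
        status = "missing"
    elif installed == wanted:
        status = "ok"
    else:
        status = "differs"
    print(f"{name:15s} installed={installed} tested={wanted} [{status}]")
```

A different version is not necessarily a problem; the check only shows where the environment departs from the tested setup.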

References:¶

I would like to thank these people and the other sources I learned from while building this model.

  1. Kaggle: a very useful resource with many types of content, from lessons to free code shared by other people.

  2. https://www.kaggle.com/ekrembayar/store-sales-ts-forecasting-a-comprehensive-guide/notebook

  3. https://www.kaggle.com/code/kashishrastogi/store-sales-analysis-time-serie