SALES FORECASTING & DATA ENRICHMENT

This is a basic machine learning project that predicts the next three months of demand for each item in a store, using historical sales data from the past five years. The problem statement comes from the Kaggle competition "Store Item Demand Forecasting Challenge".

WHAT DOES THE DATASET CONTAIN?

The dataset contains five years of store item sales data. This is time-series data as the sales are dependent on time. It is structured in a tabular format with four columns:

1st column: date
2nd column: store id
3rd column: item id (referring to the id of each item within the store)
4th column: sales, the number of times the item was sold (item X was sold N times in store Y on date Z)
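
As a minimal sketch of this layout, the snippet below parses a few rows with pandas. The column names `date`, `store`, `item`, and `sales` are assumed to match the Kaggle file, and the sample values are invented purely for illustration. In practice you would pass the path of the downloaded CSV instead of the in-memory string.

```python
import io

import pandas as pd

# Invented rows in the assumed layout: date, store, item, sales.
SAMPLE_CSV = """date,store,item,sales
2013-01-01,1,1,13
2013-01-02,1,1,11
2013-01-01,2,1,12
2013-01-02,2,1,16
"""

# Replace io.StringIO(SAMPLE_CSV) with the path to the real CSV file.
# parse_dates turns the date column into proper timestamps.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["date"])

print(df.dtypes)
print(df.head())
```

Parsing the dates up front makes later steps, such as splitting by date, straightforward.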

WHAT ARE WE GOING TO DO WITH THIS DATASET?

Based on five years of sales data from 10 different stores, this project aims to predict the next three months of sales for each individual item in these stores. By analyzing historical trends, the model will forecast future demand to aid in inventory management and sales planning.

WHAT ARE WE USING TO SOLVE THIS PROBLEM?

CATBOOST: A library that implements gradient boosting on decision trees. It is a popular machine learning algorithm used in recommendation systems and forecasting.
UPGINI: Upgini is a Python library for data enrichment that helps build more accurate forecasting models. Our data is sparse, with only two main pieces of information: the date of the sales and the number of sales. That is not much for a machine learning model to learn how to predict the sales of various items. Upgini addresses this by automatically searching thousands of public data sources for the most relevant features. It then integrates these features with our existing dataset, improving the model's performance.
PANDAS: Used to handle dataframes: we download the CSV file, load it into a pandas dataframe, and feed that dataframe into our model.

IDE TO USE?

Google Colab
Jupyter Notebook
or any other IDE you like

HOW TO INSTALL THE LIBRARIES?

catboost - pip install catboost
upgini - pip install upgini (only works with Python versions >= 3.7 and < 3.10)
pandas - pip install pandas

TASK TYPE - REGRESSION

STEPS

1. Install the libraries.
2. Download the dataset and prepare the input data.
3. Split the dataset into train and test sets.
4. Split each set into features (input values) and labels (what we want to predict).
5. Enrich the features using the Upgini library to get relevant new features and their SHAP values (a SHAP value is a number that indicates how much a feature influences the prediction).
6. Define the CatBoost model.
7. Add the new features to the original dataset.
8. Evaluate the model's performance on:
- the original dataset without any enrichment
- the newly formed enriched dataset
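
The splitting and evaluation steps above can be sketched with pandas and NumPy alone. This is an illustration under stated assumptions, not the project's actual code: the helper names (`time_split`, `split_features_label`, `smape`) are hypothetical, the split is assumed to be by date, and SMAPE is used only as one reasonable metric for demand forecasts, since the steps do not say which metric is reported. A naive "mean of train" prediction stands in for the model.

```python
import numpy as np
import pandas as pd


def time_split(df, cutoff):
    """Rows before `cutoff` form the train set, the rest form the test set."""
    cutoff = pd.Timestamp(cutoff)
    return df[df["date"] < cutoff], df[df["date"] >= cutoff]


def split_features_label(df, label="sales"):
    """Separate the input columns from the column we want to predict."""
    return df.drop(columns=[label]), df[label]


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0/0 counts as 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    diff = np.abs(y_true - y_pred)
    safe_denom = np.where(denom == 0, 1.0, denom)
    return 100.0 * np.where(denom == 0, 0.0, diff / safe_denom).mean()


# Invented toy data: ten days of sales for one store and one item.
df = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=10, freq="D"),
    "store": 1,
    "item": 1,
    "sales": [13, 11, 14, 13, 10, 12, 10, 9, 12, 9],
})

train, test = time_split(df, "2017-01-08")
X_train, y_train = split_features_label(train)
X_test, y_test = split_features_label(test)

# Placeholder for a fitted model: predict the train mean for every test row.
baseline = np.full(len(y_test), y_train.mean())
print("baseline SMAPE:", smape(y_test, baseline))
```

Running the same metric on the model trained without enrichment and on the model trained with enriched features gives the two numbers compared in the last step.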

NOTE

You might encounter an error while adding new features to the original dataset if the random sample selected in step 2 is larger than 10,000 rows. This is due to the row limit imposed by the trial version of Upgini, which only allows you to enrich up to 10,000 rows. To resolve this, you have a few options:

Reduce the number of rows to 10,000 or fewer: You can sample a smaller subset of your dataset for enrichment to stay within the trial limits.
Upgrade your Upgini account: Consider upgrading to a paid plan if you need to enrich more than 10,000 rows.
Bypass the enrichment for larger datasets: If you don't need the enrichment, you can proceed without it.
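
The first option can be sketched with pandas as follows. The helper name `cap_rows` and the toy dataframe are hypothetical; only the 10,000-row figure comes from the note above.

```python
import pandas as pd

ROW_LIMIT = 10_000  # trial row limit described in the note above


def cap_rows(df, limit=ROW_LIMIT, seed=0):
    """Return df unchanged if it fits, otherwise a random sample of `limit` rows."""
    if len(df) <= limit:
        return df
    # sort_index restores the original row order after sampling.
    return df.sample(n=limit, random_state=seed).sort_index()


# Invented stand-in for a dataset that is too large for the trial.
big = pd.DataFrame({"sales": range(25_000)})
small = cap_rows(big)
print(len(big), "->", len(small))
```

Fixing the seed keeps the sample reproducible between runs, which makes the enriched and non-enriched results easier to compare.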
