
Use of Machine Learning techniques to create a recommender system


TolaOgunniyi/Market-Basket-Analysis


Market-Basket-Analysis

Project title: Use of machine learning techniques to create a recommender system.


Date completed: March 26, 2020.

Introduction:

A key strategy for large retailers is finding associations between the different items/products that customers purchase. Market Basket Analysis lends itself to this goal via rule-based learning (i.e. association rule mining). Some of the goals that Market Basket Analysis can help retailers achieve are listed below:

  • Recommend products.
  • Plan a store layout.
  • Design sales promotions that combine discounted and marked-up items.
  • Discover trigger products (products which, when bought together, trigger other purchases).
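Association rules are usually scored with three metrics: the support of a rule A → B is the fraction of baskets that contain both A and B, its confidence is support(A and B) / support(A), and its lift is confidence / support(B). A lift above 1 means B is bought together with A more often than chance alone would predict. The notebook's own code is not reproduced here, so the following is only a minimal plain-Python sketch restricted to one-item → one-item rules; the function name `pairwise_rules` and the toy baskets are hypothetical. Libraries such as mlxtend provide full Apriori implementations that also handle larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(baskets, min_support=0.0, min_confidence=0.0):
    """Score every single-item rule A -> B found in the baskets.

    Hypothetical helper: returns a list of
    (antecedent, consequent, support, confidence, lift) tuples.
    """
    n = len(baskets)
    item_counts = Counter()  # number of baskets containing each item
    pair_counts = Counter()  # number of baskets containing each item pair
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # support is symmetric for A -> B and B -> A
        if support < min_support:
            continue
        # Confidence and lift depend on direction, so score both rules.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            lift = confidence / (item_counts[consequent] / n)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence, lift))
    return rules
```

For example, with the baskets {camera, sd_card}, {camera, sd_card, tripod}, {camera} and {tripod}, the rule sd_card → camera has support 0.5, confidence 1.0 and a lift of about 1.33.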

Dataset

For my capstone project, I used an Amazon electronics review dataset that I found on Kaggle. The dataset contained over 1,000,000 rows, and I extracted 600,000 of them using the command line, as shown in the image below.

[Image: Command line script to extract the 600,000 rows of data I wanted for my capstone project]
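The extraction command itself is only shown as an image, so it is not reproduced here. As a hedged illustration, the sketch below copies the first `n_rows` rows of a CSV file into a new file, assuming the 600,000 rows were simply taken from the top of the file; the function name `extract_rows` and the file paths are hypothetical.

```python
import csv
from itertools import islice


def extract_rows(src_path, dst_path, n_rows, has_header=False):
    """Copy the first n_rows data rows of src_path into dst_path.

    Hypothetical helper: if has_header is True, the header row is copied
    as well and is not counted towards n_rows. Returns the rows written.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
        written = 0
        # islice stops after n_rows rows without reading the rest of the file.
        for row in islice(reader, n_rows):
            writer.writerow(row)
            written += 1
    return written
```

On a Unix-like shell, `head -n 600000` achieves the same result for a file without a header row.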

----

Jupyter notebook

The project consists of the two parts listed below. I launched a Jupyter notebook instance on Amazon SageMaker and completed all of the work there, except for the network graph, which was created using Gephi.

Part 1:

  • Exploratory data analysis (EDA)
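A minimal profiling sketch for the EDA step might count users, products and ratings. It assumes each row is a hypothetical `(user_id, asin, rating, timestamp)` tuple and treats each user's rated products as one basket; that grouping is an assumption about how transactions were built, not the notebook's documented approach. The function name `profile_ratings` is also hypothetical.

```python
from collections import Counter, defaultdict


def profile_ratings(rows, top_k=3):
    """Summarise hypothetical (user_id, asin, rating, timestamp) rows."""
    baskets = defaultdict(set)  # user_id -> set of rated ASINs
    ratings = Counter()         # rating value -> number of rows
    product_counts = Counter()  # ASIN -> number of rows
    for user_id, asin, rating, _timestamp in rows:
        baskets[user_id].add(asin)
        ratings[float(rating)] += 1
        product_counts[asin] += 1

    # How many distinct products each user rated (assumed basket size).
    basket_sizes = Counter(len(items) for items in baskets.values())
    return {
        "n_rows": sum(ratings.values()),
        "n_users": len(baskets),
        "n_products": len(product_counts),
        "rating_distribution": dict(sorted(ratings.items())),
        "top_products": product_counts.most_common(top_k),
        "basket_size_distribution": dict(sorted(basket_sizes.items())),
    }
```

In practice, the basket-size distribution shows how many users rated only a single product; those baskets contain no item pairs and cannot contribute to any association rule.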

Part 2:

  • Modeling and extraction of .csv files for network analysis.

  • The rules learned from the Market Basket Analysis were further processed to create two .csv files (nodes and edges), which were used to construct the graph shown below. It is a directed graph, and 5 clusters were identified. The product names in the graph were entered manually because the dataset only provided the ASIN (Amazon Standard Identification Number) of each product. I could not find the name of the product with ASIN B000056SSM, so it appears on the graph under its ASIN.

  • The graph is based on a dataset in which users rated the items they had purchased. As a result, the clusters provide insight into customers' preferences when purchasing electronic products on Amazon.

[Image: Network analysis based on association rules mining]
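The node and edge files can be produced with a short script. This sketch assumes rules in the `(antecedent, consequent, support, confidence, lift)` form, uses lift as the edge weight (an assumption), and writes the `Id`/`Label` and `Source`/`Target`/`Type`/`Weight` column headers that Gephi's spreadsheet importer recognizes. The function name `write_gephi_csvs` and the `names` mapping are hypothetical. ASINs without a manually entered name keep the ASIN as their label, as with B000056SSM.

```python
import csv


def write_gephi_csvs(rules, nodes_path, edges_path, names=None):
    """Write node and edge tables for Gephi from association rules.

    rules: iterable of (antecedent, consequent, support, confidence, lift).
    names: optional {asin: product name} mapping entered by hand; any ASIN
    missing from it is labelled with the ASIN itself.
    Returns (number of nodes, number of edges).
    """
    names = names or {}
    rules = list(rules)
    node_ids = sorted({r[0] for r in rules} | {r[1] for r in rules})

    with open(nodes_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label"])
        for asin in node_ids:
            writer.writerow([asin, names.get(asin, asin)])

    with open(edges_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Each rule becomes a directed edge antecedent -> consequent;
        # lift is used as the weight, the other metrics become attributes.
        writer.writerow(["Source", "Target", "Type", "Weight", "Confidence", "Support"])
        for antecedent, consequent, support, confidence, lift in rules:
            writer.writerow([antecedent, consequent, "Directed",
                             round(lift, 4), round(confidence, 4), round(support, 4)])

    return len(node_ids), len(rules)
```

Using lift rather than confidence as the weight favours rules whose co-purchases are least explained by the popularity of the consequent alone; either column can be chosen as the weight after import.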


Thank you very much for taking the time to look at this project. Please feel free to contact me via email ([email protected]) or LinkedIn if you have any questions, comments or feedback.