
Recommendation system implementation with application to the RecSys 2022 Dressipi challenge: data analysis, data processing and transformation, and machine learning techniques to generate submissions.


SCIA-Premium/RecSys_2022


Syma - RecSys 2022 Challenge Profile

Objective

The objective of the project is to improve a recommendation system applied to the Dressipi challenge. Our goal was to obtain the best possible score in the online challenge and to try different ways of reaching a good score. The notebook walks through our analysis of the data, then the processing and transformations we applied to produce our final dataset, and finally the machine learning techniques we used to generate our submissions.

Dataset

For this challenge, we had access to multiple datasets that gave us a lot of information about customer behaviour.

There were five datasets:

  • candidate_items.csv: contains all the candidate items available for recommendation
  • item_features.csv: contains the features of each item
  • train_purchases.csv: contains the purchase made at the end of each session
  • train_sessions.csv: contains the items viewed in each session, identified by session_id
  • test_leaderboard_sessions.csv: contains the input sessions for the leaderboard
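The session files store one row per event, so a model that reads sessions first needs them regrouped into one ordered sequence per session. A minimal sketch of that step, assuming (this is not stated above) that train_sessions.csv and train_purchases.csv both have `session_id`, `item_id` and `date` columns; the toy rows and the `build_sequences` helper are hypothetical and only the standard library is used:

```python
from collections import defaultdict
from datetime import datetime


def build_sequences(view_rows, purchase_rows):
    """Return {session_id: (viewed item_ids in time order, purchased item_id)}."""
    views = defaultdict(list)
    for session_id, item_id, date in view_rows:
        views[session_id].append((datetime.fromisoformat(date), item_id))
    # One purchase per session, keyed by session_id.
    purchases = {session_id: item_id for session_id, item_id, _date in purchase_rows}
    return {
        session_id: ([item for _time, item in sorted(events)], purchases.get(session_id))
        for session_id, events in views.items()
    }


# Hypothetical (session_id, item_id, date) rows; row order is not chronological.
VIEW_ROWS = [
    (1, 101, "2020-05-01 10:05:00"),
    (1, 102, "2020-05-01 10:00:00"),
    (2, 103, "2020-05-02 09:00:00"),
]
PURCHASE_ROWS = [
    (1, 104, "2020-05-01 10:10:00"),
    (2, 101, "2020-05-02 09:30:00"),
]
sequences = build_sequences(VIEW_ROWS, PURCHASE_ROWS)
print(sequences)  # {1: ([102, 101], 104), 2: ([103], 101)}
```

With pandas, the same regrouping would typically be a `sort_values` on `date` followed by a `groupby` on `session_id`.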

Data pre-processing

After this analysis, we had to pre-process the data in order to use it in our recommendation system. The items are described by 73 feature categories. Although 73 categories is not a huge number, it still produces a large feature space, so we had to reduce the dimensionality of our data before applying ML algorithms.
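To make the dimensionality point concrete: if each item is one-hot encoded with one binary column per (feature category, feature value) pair (an assumption, the encoding is not described here), the number of columns grows beyond 73 as soon as categories have several possible values, and most entries are zero. A small sketch, assuming item_features.csv rows of the form (item_id, feature_category_id, feature_value_id); the rows and the `one_hot_items` helper are hypothetical, and a dense NumPy array is used only because the toy data is tiny (a scipy.sparse matrix would suit the real data better):

```python
import numpy as np


def one_hot_items(rows):
    """Build a binary item x (category, value) matrix from feature rows."""
    items = sorted({item for item, _cat, _val in rows})
    columns = sorted({(cat, val) for _item, cat, val in rows})
    item_index = {item: i for i, item in enumerate(items)}
    col_index = {col: j for j, col in enumerate(columns)}
    matrix = np.zeros((len(items), len(columns)), dtype=np.float32)
    for item, cat, val in rows:
        # Each (category, value) pair the item has becomes a 1 in its column.
        matrix[item_index[item], col_index[(cat, val)]] = 1.0
    return items, columns, matrix


# Hypothetical (item_id, feature_category_id, feature_value_id) rows.
FEATURE_ROWS = [
    (101, 1, 10),
    (101, 2, 20),
    (102, 1, 11),
    (103, 2, 20),
]
items, columns, matrix = one_hot_items(FEATURE_ROWS)
print(columns)          # [(1, 10), (1, 11), (2, 20)]
print(matrix.tolist())  # [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Two categories already yield three columns here; on the full catalogue the column count is the number of distinct (category, value) pairs.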

We also analyzed how sessions are represented in the train_sessions dataset. Here is one of the plots we produced:

[Figure: Range plot]

We used a truncated SVD to reduce our sparse item matrix to 12 components. The reduced matrix gives every item a compact embedding, which made it easier and faster to find the items most similar to a given item: we only had to compare the values of their components.
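A sketch of this reduction and of the similarity lookup it enables. The README does not say which SVD implementation was used (scikit-learn's `TruncatedSVD` is a common choice); here NumPy's SVD is truncated by hand so the example has no extra dependencies, and comparing components with cosine similarity is an assumption. The random matrix and the helper names `truncated_svd_embeddings` and `most_similar` are hypothetical:

```python
import numpy as np


def truncated_svd_embeddings(matrix, n_components=12):
    """Project the rows of `matrix` onto its top `n_components` right singular vectors."""
    # Full SVD followed by truncation: fine for small matrices, whereas a
    # randomized solver scales better on a large sparse matrix.
    _u, _s, vt = np.linalg.svd(matrix, full_matrices=False)
    return matrix @ vt[:n_components].T


def most_similar(embeddings, item_row, top_n=3):
    """Return row indices of the items closest to `item_row` by cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = embeddings / norms[:, None]
    scores = unit @ unit[item_row]
    scores[item_row] = -np.inf  # never return the query item itself
    return np.argsort(-scores)[:top_n]


rng = np.random.default_rng(0)
# Hypothetical binary item-feature matrix: 50 items x 200 one-hot columns.
features = (rng.random((50, 200)) < 0.05).astype(float)
features[0] = 0.0
features[0, :20] = 1.0
features[1] = features[0]  # item 1 has exactly the same features as item 0
embeddings = truncated_svd_embeddings(features, n_components=12)
print(embeddings.shape)  # (50, 12)
print(most_similar(embeddings, item_row=0))  # item 1 (the duplicate) ranks first
```

Comparing 12-dimensional vectors instead of full one-hot rows is what makes the lookup cheap; the projection `matrix @ vt[:k].T` can also embed new items with the same basis.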

Machine Learning

For the machine learning part, we used two different models to generate our submissions. The first is a logistic regression and the second is a simple RNN (Recurrent Neural Network).
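A "recurrent" network reuses the same weights at every step and carries a hidden state from one viewed item to the next, which suits ordered sessions. The architecture actually used is not described here, so the following is only a sketch of an untrained Elman-style RNN forward pass in NumPy. Feeding it 12-dimensional item embeddings, the `w_out` projection and scoring candidates by dot product are all hypothetical choices, and training (normally done with a framework such as PyTorch or Keras) is omitted:

```python
import numpy as np


class SimpleRNN:
    """Minimal Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        self.w_x = rng.normal(0.0, scale, (hidden_dim, input_dim))
        self.w_h = rng.normal(0.0, scale, (hidden_dim, hidden_dim))
        self.b = np.zeros(hidden_dim)
        # Hypothetical output layer mapping the hidden state back to embedding space.
        self.w_out = rng.normal(0.0, scale, (input_dim, hidden_dim))

    def final_state(self, sequence):
        """Run the recurrence over a (steps, input_dim) array; return the last hidden state."""
        h = np.zeros_like(self.b)
        for x_t in sequence:
            # Same weights at every step; h summarises the session so far.
            h = np.tanh(self.w_x @ x_t + self.w_h @ h + self.b)
        return h

    def score_candidates(self, sequence, candidate_embeddings):
        """Score each candidate by a dot product with the projected final state."""
        return candidate_embeddings @ (self.w_out @ self.final_state(sequence))


rng = np.random.default_rng(1)
rnn = SimpleRNN(input_dim=12, hidden_dim=32)
session = rng.normal(size=(5, 12))       # 5 viewed items, 12-dim embeddings each
candidates = rng.normal(size=(100, 12))  # 100 candidate items
hidden = rnn.final_state(session)
scores = rnn.score_candidates(session, candidates)
ranking = np.argsort(-scores)            # best-scoring candidates first
```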

When evaluating the models on a test dataset, the logistic regression gave a decent accuracy of 79.99%, whereas the RNN gave a slightly better score of 80.91%.
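Accuracy here is the share of test samples whose predicted class matches the true one. The README does not say how the task was framed as a classification problem or which library was used, so this is only a self-contained sketch of logistic regression trained by batch gradient descent and scored on a held-out split; the toy features, the binary target and the helper names are hypothetical (scikit-learn's `LogisticRegression` would be the usual off-the-shelf choice):

```python
import numpy as np


def train_logistic_regression(x, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probabilities
        error = p - y                           # log-loss gradient w.r.t. the logits
        w -= lr * (x.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b


def accuracy(w, b, x, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = (x @ w + b) > 0.0
    return float(np.mean(predictions == (y == 1)))


rng = np.random.default_rng(0)
x = rng.normal(size=(400, 12))                   # toy 12-dim feature vectors
y = (x[:, 0] + 0.5 * x[:, 1] > 0).astype(float)  # toy binary target
w, b = train_logistic_regression(x[:300], y[:300])
acc = accuracy(w, b, x[300:], y[300:])           # accuracy on the 100 held-out rows
print(acc)
```

Measuring on rows the model never saw during training is what makes the figure comparable between models.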

To improve our results, we thought that removing some features could help, so we kept only the 15 most frequently used feature categories and reran our experiments. This gave an accuracy of 80.00% with the logistic regression and 80.92% with the RNN. We concluded that removing features does not really improve our predictions: a gain of 0.01 percentage points is small enough to be explained by the data sample used, which is generated randomly.
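A sketch of the feature-selection step, assuming "most used" means the categories that appear in the most rows of item_features.csv (the README does not define it) and rows of the form (item_id, feature_category_id, feature_value_id). The helpers `top_categories` and `filter_rows` and the toy rows are hypothetical, and `k=2` stands in for 15 because the toy data has only three categories:

```python
from collections import Counter


def top_categories(feature_rows, k=15):
    """Return the k feature categories that appear in the most rows."""
    counts = Counter(category for _item, category, _value in feature_rows)
    return {category for category, _count in counts.most_common(k)}


def filter_rows(feature_rows, keep):
    """Drop every feature row whose category is not in `keep`."""
    return [row for row in feature_rows if row[1] in keep]


# Hypothetical (item_id, feature_category_id, feature_value_id) rows.
FEATURE_ROWS = [
    (101, 1, 10), (102, 1, 11), (103, 1, 10),  # category 1: 3 rows
    (101, 2, 20), (102, 2, 21),                # category 2: 2 rows
    (101, 3, 30),                              # category 3: 1 row
]
keep = top_categories(FEATURE_ROWS, k=2)
filtered = filter_rows(FEATURE_ROWS, keep)
print(sorted(keep))   # [1, 2]
print(len(filtered))  # 5
```

The filtered rows would then go through the same encoding and truncated SVD before both models are retrained.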
