GitHub - yasarigno/Categorization_via_Deep_Learning_and_NLP: Recognition of categories of products from images and textual descriptions.

Recognition of categories of products from images and textual descriptions.

Click here for the presentation file of the project.

We are given a dataset of product pictures and their descriptions. The CSV file also contains other information, such as the price, the name and the brand of each product. There is also a variable "product_category_tree", which defines the categories and 6 subcategories of the product. This variable is filled in manually by the sellers. As the dataset grows, assigning each product to a category by hand will become a burden. We therefore want to automate this task using only the pictures and the descriptions. The problem thus becomes one of Natural Language Processing (NLP) and Computer Vision (CV).
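As a minimal sketch of how a target label could be derived from "product_category_tree": the snippet below assumes the value is a single string whose levels are separated by `>>` and wrapped in `["..."]`. That format is an assumption, not something stated in this README, and both the helper name `split_category_tree` and the sample value are hypothetical.

```python
def split_category_tree(raw, separator=">>"):
    """Split a raw category-tree string into a list of levels, top level first."""
    # Strip the assumed list-like wrapper: ["A >> B >> C"]
    cleaned = raw.strip().strip("[]").strip().strip('"')
    return [level.strip() for level in cleaned.split(separator) if level.strip()]


# Hypothetical sample value, written to match the assumed format.
sample = '["Home Furnishing >> Curtains & Accessories >> Curtains"]'
levels = split_category_tree(sample)
main_category = levels[0]  # the top level would serve as the classification target
```

Keeping every level in a list makes it easy to switch the target from the top-level category to a finer subcategory later.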

We approach this category-recognition problem from several angles. First we use only the descriptions and apply NLP algorithms; in later notebooks we bring in the tools of CV.

Let us see what the input looks like. For instance, the first product is a purple curtain.

And its description is:

In this project we study the feasibility of an engine for classifying articles into different categories, with a sufficient level of precision.

Data source:

https://s3-eu-west-1.amazonaws.com/static.oc-static.com/prod/courses/files/Parcours_data_scientist/Projet+-+Textimage+DAS+V2/Dataset+projet+prétraitement+textes+images.zip

DATA
Number of rows	1050
Number of columns	15

There are 6 notebooks in this project.

Notebook 1: Text mining and Natural Language Processing, with TF-IDF features, word embeddings via GloVe, unsupervised models and neural networks. We do not use the pictures in this notebook. The transformation steps applied to the corpus look like this:
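To illustrate the TF-IDF weighting named in Notebook 1, here is a minimal pure-Python sketch. It is not the notebook's pipeline: it uses plain whitespace tokenization and the textbook formula tf × log(N / df), and the toy descriptions are hypothetical rather than taken from the dataset.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy descriptions, not taken from the dataset.
corpus = [
    "purple polyester curtain",
    "cotton curtain for door",
    "analog watch for men",
]
scores = tf_idf(corpus)
```

In the first description, "purple" receives a higher weight than "curtain" because "curtain" also occurs in another description. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed variant of the formula by default, so their numbers will differ from this sketch.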

Notebook 2: Computer Vision via OpenCV. We test the SIFT and ORB algorithms and use t-SNE for dimensionality reduction. The results are weak, so we move on to other strategies in the following notebooks. We transform the images as the figures below show:

Notebook 3: Convolutional Neural Networks (CNN). We dive into deep learning. The first strategy uses a CNN. Recall that the last, fully connected layer of the network classifies the input image. After testing it, we remove this last layer and replace it with other classifiers. This strategy gives us the opportunity to test algorithms such as KNN and Random Forest.
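The idea of swapping the fully connected layer for another classifier can be sketched as follows. The feature vectors are hypothetical placeholders standing in for the activations a truncated CNN would produce, the labels `category_a` and `category_b` are made up, and `knn_predict` is an illustrative helper, not the notebook's code.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training feature vector.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 3-dimensional "CNN features"; real ones would be much longer vectors.
train_features = [
    [0.90, 0.10, 0.00], [0.80, 0.20, 0.10], [0.85, 0.15, 0.05],
    [0.10, 0.90, 0.30], [0.20, 0.80, 0.40], [0.15, 0.85, 0.35],
]
train_labels = ["category_a"] * 3 + ["category_b"] * 3

predicted = knn_predict(train_features, train_labels, [0.82, 0.18, 0.05], k=3)
```

Because the classifier only sees feature vectors, the same features can be passed to a Random Forest or any other model without retraining the convolutional part.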

Notebook 4: Transfer Learning. We use the VGG-16 model to classify the images. We also test data augmentation.

Notebook 5: Multi-model. We use both textual and visual data to build a multi-model.
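This README does not say how the textual and visual models are combined. One common option, shown here purely as an assumption, is late fusion: each model outputs class probabilities and the two are blended with a weighted average. The function `late_fusion`, the class labels and the probability values are all hypothetical.

```python
def late_fusion(text_probs, image_probs, text_weight=0.5):
    """Blend two per-class probability dicts with a weighted average."""
    if set(text_probs) != set(image_probs):
        raise ValueError("both models must score the same set of classes")
    image_weight = 1.0 - text_weight
    return {
        label: text_weight * text_probs[label] + image_weight * image_probs[label]
        for label in text_probs
    }


# Hypothetical outputs of a text model and an image model for one product.
text_probs = {"category_a": 0.6, "category_b": 0.4}
image_probs = {"category_a": 0.2, "category_b": 0.8}

fused = late_fusion(text_probs, image_probs, text_weight=0.5)
predicted = max(fused, key=fused.get)  # class with the highest blended probability
```

The weight `text_weight` would normally be tuned on a validation set; an alternative design is to concatenate the text and image features and train a single classifier on top.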

We tested both supervised and unsupervised Machine Learning algorithms in this project. Transfer learning gives the best results.

support
LICENSE
P6_01_NLP.ipynb
P6_02_Computer_Vision_via_SIFT_and_ORB.ipynb
P6_03_01_Computer_Vision_via_Deep_Learning_size_128.ipynb
P6_03_02_Computer_Vision_via_Deep_Learning_size_224.ipynb
P6_04_Computer_Vision_via_Transfer_Learning.ipynb
P6_05_Final_Multi_modele.ipynb
README.md
