GitHub - EladGashri/NLP_Algorithms_Clustering_and_Classification

This project compares two well-known Natural Language Processing (NLP) algorithms, Word2vec and TF-IDF, on the analysis of medical text. The data consist of over 3,000 medical notes from visits to medical professionals.

Each of the two algorithms is applied to two tasks, one in unsupervised learning and one in supervised learning:

Unsupervised learning: Group the texts into clusters using K-means.
Supervised learning: Classify each text by its correct diagnosis. With Word2vec, an RNN (Recurrent Neural Network) performs the classification; with TF-IDF, a Random forest does.
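
As an illustration of the TF-IDF branch of both tasks, here is a minimal sketch using Scikit-learn. It is not taken from the repository: the toy notes, the diagnosis labels, the number of clusters, and all hyperparameters are assumptions chosen only to keep the example small. The Word2vec and RNN branch is not shown.

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data; the real project reads its notes from encounter.csv.
notes = [
    "patient reports chest pain and shortness of breath",
    "chest pain radiating to the left arm",
    "itchy red rash on both arms",
    "skin rash with itching after a new medication",
]
diagnoses = ["cardiac", "cardiac", "dermatologic", "dermatologic"]

# Turn each note into a sparse TF-IDF vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(notes)

# Unsupervised task: assign every note to one of k clusters (k=2 is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Supervised task: learn to predict the diagnosis from the same vectors.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, diagnoses)

# An unseen note must be transformed with the already fitted vectorizer.
new_X = vectorizer.transform(["red rash and itching on the arms"])
predicted_cluster = int(kmeans.predict(new_X)[0])
predicted_diagnosis = forest.predict(new_X)[0]
print(predicted_cluster, predicted_diagnosis)
```

The same fitted vectorizer feeds both models, so a new text is vectorized once and can then be either clustered or classified.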

A database in MongoDB stores the texts and details about the final K-means clusters for each algorithm.
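
The README does not describe the database schema, so the following is only a sketch of one possible shape for the stored cluster details. The function name `build_cluster_documents`, the field names, and the database and collection names in the trailing comment are all hypothetical.

```python
from collections import Counter


def build_cluster_documents(algorithm, texts, labels, cluster_ids):
    """Summarise each K-means cluster as one dict that could be stored as a MongoDB document."""
    clusters = {}
    for text, label, cid in zip(texts, labels, cluster_ids):
        entry = clusters.setdefault(int(cid), {"texts": [], "labels": Counter()})
        entry["texts"].append(text)
        entry["labels"][label] += 1
    return [
        {
            "algorithm": algorithm,  # e.g. "tfidf" or "word2vec"
            "cluster_id": cid,
            "size": len(entry["texts"]),
            "most_common_labels": [name for name, _ in entry["labels"].most_common(3)],
            "texts": entry["texts"],
        }
        for cid, entry in sorted(clusters.items())
    ]


docs = build_cluster_documents(
    "tfidf",
    texts=["note a", "note b", "note c"],
    labels=["x", "x", "y"],
    cluster_ids=[0, 0, 1],
)
print(docs)

# With pymongo installed and a MongoDB server running, the documents could be
# stored like this (connection string, database and collection names are assumptions):
# from pymongo import MongoClient
# MongoClient("mongodb://localhost:27017")["nlp_project"]["clusters"].insert_many(docs)
```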

The algorithms are deployed as a REST API built with Python and Flask, using multiprocessing. The API receives a previously unseen medical text in a request and returns one of the following:

Most common labels and closest sentences in the cluster assigned to the text according to the Word2vec & K-means algorithm or the TF-IDF & K-means algorithm.
Predicted diagnosis for the condition described in the text according to the Word2vec & RNN algorithm or the TF-IDF & Random forest algorithm.
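
A request to such an API might be handled roughly as in the Flask sketch below. The route `/diagnosis`, the JSON field names, and the placeholder `predict_diagnosis` function are assumptions for illustration, not the repository's actual interface; the trained models and the multiprocessing setup are omitted.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

VALID_ALGORITHMS = {"word2vec", "tfidf"}


def predict_diagnosis(text, algorithm):
    """Placeholder for the trained Word2vec & RNN or TF-IDF & Random forest model."""
    return "unknown"


@app.route("/diagnosis", methods=["POST"])
def diagnosis():
    # Expect a JSON body such as {"text": "...", "algorithm": "tfidf"}.
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    algorithm = payload.get("algorithm", "tfidf")
    if not text or algorithm not in VALID_ALGORITHMS:
        return jsonify(error="provide 'text' and a valid 'algorithm'"), 400
    return jsonify(algorithm=algorithm, diagnosis=predict_diagnosis(text, algorithm))


# Exercise the endpoint without starting a server.
client = app.test_client()
ok = client.post("/diagnosis", json={"text": "chest pain", "algorithm": "tfidf"})
bad = client.post("/diagnosis", json={"algorithm": "tfidf"})
print(ok.status_code, ok.get_json(), bad.status_code)
```

A second route for the clustering output (most common labels and closest sentences) could follow the same pattern.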

Technologies used in the project: Python, Flask, MongoDB, PyTorch, Gensim, Scikit-learn.

Files in the repository:

API.py
API_utils.py
DB.py
NLP Project Report.pdf
README.md
RNN.py
classes.py
encounter.csv
kmeans.py
main.py
preprocessing.py
