Fast keyword extraction from text using graph degeneracy-based approaches

Authors: Romain Avouac, Jaime Costa Centena

This is our final project for the ELTDM (software guidelines to process massive data) course at ENSAE. Our purpose was to find computationally efficient ways of performing keyword extraction from text using graph degeneracy criteria, as described in Tixier, Malliaros & Vazirgiannis (2016).
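To make the idea of graph degeneracy concrete, here is a minimal pure-Python sketch of a graph of words and its k-core decomposition. It is an illustration only, not the code from our notebooks: the function names `build_graph_of_words` and `core_numbers`, the unweighted edges and the window of 3 tokens are all assumptions made for this example, and the peeling loop is deliberately naive (quadratic in the number of nodes).

```python
from collections import defaultdict


def build_graph_of_words(tokens, window=3):
    """Undirected, unweighted co-occurrence graph (hypothetical helper).

    Two distinct words are linked when they appear within `window`
    consecutive tokens of each other.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # touch the key so isolated words still get a node
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def core_numbers(graph):
    """Naive peeling: repeatedly remove a node of minimum remaining degree.

    The core number of a node is the largest minimum degree observed
    up to the moment it is removed.
    """
    degree = {node: len(neighbours) for node, neighbours in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in graph[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Usage: words in the main core (highest core number) are keyword candidates.
tokens = "graph degeneracy finds dense subgraphs in a graph of words".split()
core = core_numbers(build_graph_of_words(tokens, window=3))
main_core = sorted(w for w, k in core.items() if k == max(core.values()))
print(main_core)
```

Scanning all remaining nodes for the minimum degree at every step keeps the sketch short; bucket-based implementations avoid that scan and run in time linear in the number of edges.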

We focused on two major steps of the data processing pipeline: k-core decomposition, which identifies dense subgraphs (notebook 1_k_core_decomp.ipynb), and computation of the elbow criterion, which selects the relevant keywords (notebook 2_elbow_criteria.ipynb). For each step, we provide an extensive performance comparison of all the approaches we implemented, including cythonization, multithreading and multiprocessing. An in-depth discussion of our results (in French) is available in the report (project_report.pdf).
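As an illustration of the second step, the sketch below uses one common geometric definition of an elbow: the point of a decreasing score curve that lies farthest from the straight line joining its first and last points. This is an assumption about what the criterion looks like, not a copy of the notebook code; `elbow_index` and `select_keywords` are hypothetical names, and the scores are made up for the example.

```python
import math


def elbow_index(values):
    """Index of the point (i, values[i]) farthest from the line joining
    the first and last points of the curve."""
    n = len(values)
    if n < 3:
        return 0  # no interior point: fall back to the first one (arbitrary choice)
    x1, y1 = 0, values[0]
    x2, y2 = n - 1, values[-1]
    norm = math.hypot(x2 - x1, y2 - y1)
    # Point-to-line distance for every point of the curve.
    distances = [
        abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm
        for x, y in enumerate(values)
    ]
    return max(range(n), key=distances.__getitem__)


def select_keywords(scores):
    """Rank words by decreasing score and keep those up to the elbow."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    cut = elbow_index([score for _, score in ranked])
    return [word for word, _ in ranked[: cut + 1]]


# Made-up scores: three clearly relevant words, then a sharp drop.
scores = {"graph": 10, "core": 9, "keyword": 8, "the": 3, "of": 1, "a": 0}
print(select_keywords(scores))  # ['graph', 'core', 'keyword']
```

The distance computation is independent for each point, which is the kind of loop that lends itself to the vectorized, cythonized or parallel variants compared in the notebook.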
