Implementation of " Zhang Y, Lo D, Xia X et al. Multi-factor duplicate question detection in Stack Overflow. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 30(5): 981–997 Sept. 2015. DOI 10.1007/s11390-015-1576-4 " paper

VIS-WA/Duplicate-Question-Detection-in-Stack-Overflow

SMAI Course Project Monsoon 2021

Duplicate-Question-Detection-in-Stack-Overflow

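A model that predicts the top-k most similar questions for a given question. This is an implementation of the following paper: Zhang Y, Lo D, Xia X et al. Multi-factor duplicate question detection in Stack Overflow. Journal of Computer Science and Technology 30(5): 981–997, Sept. 2015. DOI 10.1007/s11390-015-1576-4.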

Directories

|_ Dataset => All datasets and precalculated arrays used during execution.
	|_ Dataset.csv => Dataset used for the project
	|_ Training set Similarity scores.npy => NumPy array file with similarity scores computed on the training set
	|_ dataset_source.txt => Sources for the datasets

|_ src => Executable code
	|_ LDA_trial.ipynb => Sample LDA reference code
	|_ model + GUI.ipynb => Complete implementation of the model with GUI
	|_ Primary.ipynb => Implementation of the dupPredictor model on the Programming dataset
	|_ PrimaryPhysics.ipynb => Implementation of the dupPredictor model on the Physics dataset
	|_ GUI.py => Python script for the GUI

|_ Report.pdf => Report for our course project
|_ bg.jpg => Reference background image for the GUI

Major Steps performed:

  • Data Extraction
  • Tokenisation and Porter Stemming (Preprocessing)
  • Vector Space Modelling (VSM)
  • Extract topics from description and title
  • LDA
  • Similarity Scores computation
  • Composer Score and Parameter estimation
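The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code from the notebooks: it tokenises text, builds term-frequency vectors, computes cosine similarity, and combines per-factor scores with a weighted sum as a stand-in for the composer score. Porter stemming and LDA topic similarity are left out because they need third-party libraries. The function names, the stop-word list, the `title`/`body` keys, and the weights are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "how", "do", "i", "of", "for"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words (no stemming here)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def composer_score(sims, weights):
    """Weighted sum of per-factor similarity scores (hypothetical weights)."""
    return sum(weights[name] * sims[name] for name in weights)


def top_k(query, corpus, weights, k=2):
    """Return (score, index) pairs for the k highest-scoring questions in corpus."""
    q_title = tokenize(query["title"])
    q_body = tokenize(query["body"])
    scored = []
    for idx, cand in enumerate(corpus):
        sims = {
            "title": cosine_similarity(q_title, tokenize(cand["title"])),
            "body": cosine_similarity(q_body, tokenize(cand["body"])),
        }
        scored.append((composer_score(sims, weights), idx))
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Toy data, invented for illustration only.
corpus = [
    {"title": "Reverse a string in Python", "body": "What is the best way to reverse a string?"},
    {"title": "Sort a list of dictionaries", "body": "I want to sort a list by a key."},
]
query = {"title": "How do I reverse a string in Python", "body": "Looking for a way to reverse a string."}
weights = {"title": 0.6, "body": 0.4}  # assumed values, not estimated parameters

ranking = top_k(query, corpus, weights, k=2)
```

In a fuller version, the weights would come from the parameter estimation step rather than being fixed by hand, and additional factors such as topic similarity would be added to the `sims` dictionary.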

Note: The code has been restructured for readability, so file paths used in the code may be broken. Correct them before running the code.
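One possible way to make the paths robust is to locate the Dataset directory by walking up from the current working directory instead of hard-coding a location. The helper below is a hypothetical sketch and is not part of the repository; it only assumes the directory is named `Dataset`, as in the layout listed under Directories.

```python
from pathlib import Path


def find_dataset_dir(start, name="Dataset"):
    """Walk upward from `start` and return the first directory called `name`."""
    start = Path(start).resolve()
    for parent in [start, *start.parents]:
        candidate = parent / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"No '{name}' directory found above {start}")
```

From a notebook in src, `find_dataset_dir(Path.cwd()) / "Dataset.csv"` would then resolve the CSV file regardless of which folder the notebook was launched from, provided the repository layout is intact.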
