
leilibrk/InformationRetrieval

 
 


InformationRetrieval

Instructor: Dr. A. Nikabadi

Course content: CS276, Stanford University

Semester: Fall 2022

This project was built for the Information Retrieval course. It implements a search engine that supports both phrase queries and free-text queries on the Fars News dataset.

First Phase

  1. Preprocessed the data (normalization, tokenization, stemming, stopword removal)

  2. Worked with the two most widely used Persian NLP toolkits: hazm and parsivar

  3. Created a positional inverted index

  4. Used Zipf's law

zip2.png

  5. Used Heaps' law

zip2.png

  6. Searched by normal queries, phrase queries (used permuterm index), and Boolean queries

  7. Ranked results
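
The positional inverted index and phrase search listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the project's notebooks: the function names `build_positional_index` and `phrase_search`, the toy English documents, and the whitespace tokenizer (standing in for the hazm/parsivar preprocessing) are all hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}; docs is {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercasing + whitespace split replaces the real
        # normalization, tokenization, stemming, and stopword removal.
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted doc ids in which the phrase terms appear consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can match the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        later = [set(index[t][doc_id]) for t in terms[1:]]
        # The k-th following term must sit exactly k+1 positions after the first.
        for start in index[terms[0]][doc_id]:
            if all(start + k + 1 in positions for k, positions in enumerate(later)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Toy collection (hypothetical; the project indexes the Fars News dataset).
docs = {
    1: "tehran stock market rises",
    2: "market stock report from tehran",
    3: "the stock market in tehran",
}
index = build_positional_index(docs)
print(phrase_search(index, "stock market"))
```

Storing positions rather than only document ids is what makes the adjacency check possible: document 2 contains both words but in the wrong order, so it is not returned for "stock market".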

Second phase

  1. Represent words as vectors

  2. Compute tf-idf

  3. Compute cosine similarity between query terms and documents

  4. Use index elimination techniques such as champion lists

  5. Rank results by relevance

phase2.png
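
The second-phase steps (tf-idf weighting, cosine similarity, champion lists, ranking) can be sketched together as below. This is a rough sketch under assumptions, not the project's implementation: it assumes log-scaled term frequency with `log10(N/df)` as idf, whitespace tokenization, and a champion list size `r`; the function names and toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return ({doc_id: {term: weight}}, {term: idf}) for docs = {doc_id: text}."""
    n = len(docs)
    counts = {d: Counter(text.lower().split()) for d, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log10(n / df_t) for t, df_t in df.items()}
    vectors = {
        d: {t: (1 + math.log10(tf)) * idf[t] for t, tf in c.items()}
        for d, c in counts.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def champion_lists(vectors, r=2):
    """For each term, keep the r documents with the highest weight for it."""
    postings = defaultdict(list)
    for d, vec in vectors.items():
        for t, w in vec.items():
            postings[t].append((w, d))
    return {
        t: [d for _, d in sorted(p, key=lambda x: (-x[0], x[1]))[:r]]
        for t, p in postings.items()
    }


def search(query, vectors, idf, champions, k=3):
    """Score only documents found in the champion lists of the query terms."""
    q_counts = Counter(t for t in query.lower().split() if t in idf)
    q_vec = {t: (1 + math.log10(tf)) * idf[t] for t, tf in q_counts.items()}
    candidates = {d for t in q_vec for d in champions.get(t, [])}
    scored = [(cosine(q_vec, vectors[d]), d) for d in candidates]
    scored.sort(key=lambda x: (-x[0], x[1]))
    return scored[:k]


# Toy collection (hypothetical; the project uses the Fars News dataset).
docs = {
    1: "oil price rises in tehran",
    2: "tehran football team wins",
    3: "oil exports and oil price report",
}
vectors, idf = tfidf_vectors(docs)
champions = champion_lists(vectors, r=2)
for score, doc_id in search("oil price", vectors, idf, champions):
    print(doc_id, round(score, 3))
```

Restricting scoring to champion-list candidates trades a little recall for speed: documents outside every query term's top `r` are never scored, which is the index elimination idea named in the list above.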


Contributors: Rojina Kashefi & Leili Barekatein
