bysiber/text_similarity_tfidf

text_similarity_tfidf

This project uses the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to measure the similarity between text documents.

Project Objectives

The goal is to calculate a similarity score between text documents based on their TF-IDF representations, so that the documents can be compared with one another.

Used Algorithm

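TF-IDF (Term Frequency-Inverse Document Frequency) weights each term by how often it occurs in a document, scaled down by the number of documents in the collection that contain that term. Terms that are frequent in one document but rare across the collection receive the highest weights. Each document is thereby turned into a vector of term weights, and the similarity score between two documents is obtained by comparing their vectors, typically with cosine similarity.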
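
To make the weighting concrete, here is a minimal pure-Python sketch of TF-IDF followed by a vector comparison. It rests on several assumptions that are not stated in this README: whitespace tokenization, the smoothed IDF variant ln((1 + N) / (1 + df)) + 1 (the default formula in scikit-learn's TfidfVectorizer), and cosine similarity as the comparison measure. The function names tokenize, tfidf_vectors, and cosine_similarity are illustrative only and are not taken from the project.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; punctuation is not stripped.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per input document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency times smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices fell sharply",
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # documents sharing several terms
print(cosine_similarity(vecs[0], vecs[2]))  # documents sharing no terms
```

The smoothed IDF is used here because the plain log(N / df) form assigns a weight of zero to any term that appears in every document, which would make two-document comparisons degenerate.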

Requirements

  • Python 3
  • scikit-learn (imported as sklearn)

Usage

To measure TF-IDF similarity, follow the steps below:

  1. Ensure that the dependencies listed under Requirements are installed.
  2. Add the file names of the text documents to be compared to the text_files list.
  3. Run the main.py file to display the similarity results on the screen.

Sample Outputs

Below is an example of the program's output:

Similarity between test1.txt and test2.txt is -> 0.432891
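
A script along the lines described under Usage might look like the following sketch. It assumes scikit-learn's TfidfVectorizer and cosine_similarity with their default settings. The project's actual main.py may differ: the function names pairwise_similarities, read_texts, and main are hypothetical, and only the text_files name and the output format are taken from the text above.

```python
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def pairwise_similarities(names, texts):
    """Return (name_a, name_b, score) for every pair of documents."""
    # Fit one vocabulary over all documents and build the TF-IDF matrix.
    matrix = TfidfVectorizer().fit_transform(texts)
    # scores[i, j] is the cosine similarity between documents i and j.
    scores = cosine_similarity(matrix)
    return [
        (names[i], names[j], float(scores[i, j]))
        for i, j in combinations(range(len(names)), 2)
    ]


def read_texts(paths):
    """Read each file as UTF-8 text (the encoding is an assumption)."""
    texts = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            texts.append(handle.read())
    return texts


def main(text_files):
    # Print one line per document pair, mirroring the sample output format.
    for a, b, score in pairwise_similarities(text_files, read_texts(text_files)):
        print(f"Similarity between {a} and {b} is -> {score:.6f}")
```

With two files named test1.txt and test2.txt in the working directory, calling main(["test1.txt", "test2.txt"]) would print a single line in the format shown above. The score depends on the file contents and on the vectorizer settings.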
