Text-Segmentation-on-a-Probabilistic-Unigram-Model-using-Viterbi-Algorithm

-Aditya Batheja and Sidharth Thapar

This study aims to recover the words hidden in continuous text (obtained by removing the spaces from an original text) by detecting word boundaries, that is, by performing text segmentation.

The Viterbi algorithm is applied to a unigram model of word probabilities to recover the original text by inserting spaces at the correct locations, with the aim of maximizing the cosine similarity between the resulting text and the original (formatted) text. Comparing the segmented output against the original text confirms the effectiveness of this approach.
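The approach described above can be illustrated with a minimal sketch. It is not the project's actual code: the function names (`segment`, `cosine_similarity`), the toy probability table `toy_probs`, the `max_word_len` limit, and the penalty for unknown strings are all assumptions made for illustration. The cosine similarity here is computed over word-count vectors, which is one plausible reading of the evaluation; the report may define it differently.

```python
import math
from collections import Counter

# Hypothetical toy unigram model; a real model would be estimated from a corpus.
toy_probs = {
    "the": 0.05, "cat": 0.01, "sat": 0.01, "on": 0.03,
    "mat": 0.005, "a": 0.04, "at": 0.02, "them": 0.002,
}

def segment(text, word_prob, max_word_len=20, unknown_prob=1e-10):
    """Return the most probable split of `text` into words under a unigram model."""
    n = len(text)
    # best[i]: highest log-probability of any segmentation of text[:i]
    best = [-math.inf] * (n + 1)
    best[0] = 0.0
    # back[i]: start index of the last word in the best segmentation of text[:i]
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_prob:
                log_p = math.log(word_prob[word])
            else:
                # Assumed penalty: unknown strings get less likely with length.
                log_p = len(word) * math.log(unknown_prob)
            score = best[start] + log_p
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

def cosine_similarity(words_a, words_b):
    """Cosine similarity between the word-count vectors of two word lists."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With the toy table, `segment("thecatsatonthemat", toy_probs)` prefers "the" followed by "mat" over "them" followed by "at", because the product of the word probabilities is higher. Working in log-probabilities avoids numerical underflow on long inputs, and `cosine_similarity(segment(...), original.split())` gives a score of the kind the evaluation describes.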

