
This study aims to extract hidden words from continuous text (obtained by removing spaces from the original text) by detecting word boundaries and performing text segmentation using the Viterbi algorithm.


sidharththapar/Text-Segmentation-on-a-Probabilistic-Unigram-Model-using-Viterbi-Algorithm


Text-Segmentation-on-a-Probabilistic-Unigram-Model-using-Viterbi-Algorithm

-Aditya Batheja and Sidharth Thapar

This study aims to extract hidden words from continuous text (obtained by removing spaces from the original text) by detecting word boundaries and performing text segmentation.

The Viterbi algorithm is applied to a unigram model of word probabilities to recover the original text by inserting spaces at the correct locations, with the aim of maximizing the cosine similarity between the resulting text and the original (formatted) text. Comparing the segmented output against the original text confirms the effectiveness of this approach.
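The approach described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names `segment` and `cosine_similarity`, the parameters `max_word_len` and `unknown_prob`, the per-character penalty for unknown words, and the use of word-count vectors for the cosine similarity are all assumptions made for the example.

```python
import math
from collections import Counter


def segment(text, word_probs, max_word_len=20, unknown_prob=1e-10):
    """Split `text` into the most probable word sequence under a unigram model.

    `word_probs` maps each known word to a positive probability.
    """
    n = len(text)
    # best[i] holds the highest log-probability of any segmentation of text[:i].
    best = [0.0] + [-math.inf] * n
    # back[i] holds the start index of the last word in that segmentation.
    back = [0] * (n + 1)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_probs:
                log_prob = math.log(word_probs[word])
            else:
                # Assumed smoothing: unknown words pay a penalty per character.
                log_prob = len(word) * math.log(unknown_prob)
            score = best[start] + log_prob
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


def cosine_similarity(words_a, words_b):
    """Cosine similarity between the word-count vectors of two word lists."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy unigram model; the probabilities are made up for illustration.
toy_probs = {"the": 0.3, "cat": 0.2, "sat": 0.2, "on": 0.15, "mat": 0.15}
original = "the cat sat on the mat"
recovered = segment(original.replace(" ", ""), toy_probs)
print(" ".join(recovered))
print(cosine_similarity(recovered, original.split()))
```

In a realistic setting the unigram probabilities would be estimated from word frequencies in a corpus, and the choice of unknown-word penalty strongly affects how out-of-vocabulary strings are split. When the recovered words match the original words exactly, the similarity is 1.0 up to floating-point rounding.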
