Skip to content

benldr/JPruningRadixTrie

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

JPruningRadixTrie
MIT License

Java port of wolfgarbe/PruningRadixTrie. See link for the background of the project etc. All credit to Wolf Garbe.

No features have been added/removed other than where necessary to match Java conventions. So functionality and performance should be unchanged, aside only from these exceptions:

  • Removed parameter out long termFrequencyCountPrefix from method getTopkTermsForPrefix because awkward to achieve this in Java without changing the method's return type significantly. Specifically, the functionality lost is- getting the number of terms in the dictionary which begin with the given prefix (not an essential feature).
  • readTermsFromFile now requires that the frequency values in the file have a max value of 2^63 - 1, whereas before they could have a max value of 2^64 - 1 (due to the frequency values are interpreted as signed longs rather than unsigned longs).
  • Introduced parameter String delimiter to method readTermsFromFile so that text files with delimiters other than \t can be read. See Usage below.

Download Instructions:

Copy the source code into your project! There are only 4 classes to copy, and no external dependencies. I'm afraid I do not have plans to upload this project to Maven Central or the like.


Usage:

Create Object

PruningRadixTrie pruningRadixTrie = new PruningRadixTrie();

addTerm: insert term and term frequency count into Pruning Radix Trie. Frequency counts for same term are summed up.

pruningRadixTrie.addTerm("microsoft", 1000);

getTopkTermsForPrefix: retrieve the top-k most relevant terms for a given prefix from the Pruning Radix Trie.

String prefix = "micro";
int topK = 10;
List<TermAndFrequency> results = pruningRadixTrie.getTopkTermsForPrefix(prefix, topK);
for (TermAndFrequency result : results) {
  System.out.println(result.getTerm() + " " + result.getTermFrequencyCount());
}

readTermsFromFile: Deserialise the Pruning Radix Trie from disk for persistence. Note here that terms.txt contains tab-delimited lines, ie in the format term\tfrequency. If format is term,frequency then replace second argument with ",", etc.

pruningRadixTrie.readTermsFromFile("terms.txt", "\t");

writeTermsToFile: Serialise the Pruning Radix Trie to disk for persistence.

pruningRadixTrie.writeTermsToFile("terms.txt");

About

Java port of wolfgarbe/PruningRadixTrie

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages