Skip to content

ByteHamster/LeMonHash

Repository files navigation

Learned Monotone Minimal Perfect Hashing

Logo

License: GPL v3 Build status

A monotone minimal perfect hash function (MMPHF) maps a set S of n input keys to the first n integers without collisions. At the same time, it respects the natural order of the input universe. In other words, it maps each input key to its rank. MMPHFs have many applications in databases and space-efficient data structures.

LeMonHash (Learned Monotone Minimal Perfect Hashing) is a novel MMPHF that learns about regularities in the input data to achieve significant space and performance improvements. It uses the PGM-Index to calculate a learned rank estimate for each key and then solves collisions between these estimates using the retrieval data structure BuRR. Compared to competitors that are mostly based on tree-like data structures, LeMonHash is a lot more flat and therefore faster to query. LeMonHash dominates most competitors in terms of construction throughput, query throughput, and space consumption -- simultaneously. We also give a variant for variable-length strings that achieves significantly faster queries than competitors.

Usage

Requirements:

Clone the repository (as a submodule) and add the following to your CMakeLists.txt.

add_subdirectory(path/to/LeMonHash)
target_link_libraries(YourTarget PRIVATE LeMonHash)

Then you can use the straight-forward interface of LeMonHash:

std::vector<uint64_t> inputData {0, 1, 7, 15, 23, 42, 250};
lemonhash::LeMonHash<> hashFunc(inputData);
for (uint64_t x : inputData) {
    std::cout << x << ": \t" << hashFunc(x) << std::endl;
}

Query performance

Plots preview

License

This code is licensed under the GPLv3. If you use the project in an academic context or publication, please cite our paper:

@inproceedings{DBLP:conf/esa/FerraginaL0V23,
  author       = {Paolo Ferragina and
                  Hans{-}Peter Lehmann and
                  Peter Sanders and
                  Giorgio Vinciguerra},
  title        = {Learned Monotone Minimal Perfect Hashing},
  booktitle    = {{ESA}},
  series       = {LIPIcs},
  volume       = {274},
  pages        = {46:1--46:17},
  publisher    = {Schloss Dagstuhl - Leibniz-Zentrum f{\"{u}}r Informatik},
  year         = {2023},
  doi          = {10.4230/LIPICS.ESA.2023.46}
}

The code of the experiments comparing LeMonHash to competitors from the literature is available here.