
Frequently Asked Questions (FAQ)

Nikos Giarelis edited this page Aug 1, 2023 · 4 revisions

Language Support Questions

How many languages does LMRank support?

LMRank currently supports 14 languages in total, as listed in the table below:

Language                Code
English 🇬🇧              en
Greek 🇬🇷                el
Danish 🇩🇰               da
Catalan                 ca
Dutch 🇳🇱                nl
Finnish 🇫🇮              fi
French 🇫🇷               fr
German 🇩🇪               de
Italian 🇮🇹              it
Japanese 🇯🇵             ja
Norwegian 🇳🇴 (Bokmål)   nb
Portuguese 🇵🇹           pt
Spanish 🇪🇸              es
Swedish 🇸🇪              sv

Will other languages be supported in the future?

LMRank forms candidate keyphrases from spaCy's noun chunks, which spaCy derives through dependency parsing.
Whenever spaCy releases a small (sm) model with noun-chunk support for a new language, LMRank can easily add support for that language.
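Because each supported language depends on a spaCy small pipeline, the language list can be pictured as a lookup from language code to pipeline name, so adding a language roughly amounts to adding one entry. The sketch below is hypothetical: `SPACY_SM_PIPELINES` and `spacy_pipeline_for` are illustrative names, not part of LMRank's API, and it assumes LMRank relies on spaCy's standard sm pipelines. The pipeline names themselves are spaCy's published ones.

```python
# Hypothetical registry (not LMRank's internal code): language code -> spaCy
# small ("sm") pipeline name. English uses a "web" pipeline; the rest use "news".
SPACY_SM_PIPELINES = {
    "en": "en_core_web_sm",
    "el": "el_core_news_sm",
    "da": "da_core_news_sm",
    "ca": "ca_core_news_sm",
    "nl": "nl_core_news_sm",
    "fi": "fi_core_news_sm",
    "fr": "fr_core_news_sm",
    "de": "de_core_news_sm",
    "it": "it_core_news_sm",
    "ja": "ja_core_news_sm",
    "nb": "nb_core_news_sm",
    "pt": "pt_core_news_sm",
    "es": "es_core_news_sm",
    "sv": "sv_core_news_sm",
}


def spacy_pipeline_for(language_code: str) -> str:
    """Return the spaCy small pipeline name for a supported language code.

    Raises ValueError for codes that have no entry in the registry.
    """
    try:
        return SPACY_SM_PIPELINES[language_code]
    except KeyError:
        supported = ", ".join(sorted(SPACY_SM_PIPELINES))
        raise ValueError(
            f"Unsupported language {language_code!r}; supported codes: {supported}"
        ) from None
```

Once the matching pipeline is installed, the returned name can be passed to `spacy.load()`, and the noun chunks of a parsed document are then available through `doc.noun_chunks`.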

Practical Questions

Are there any examples that show me how to use LMRank?

You can find usage examples on Google Colab or GitHub.

Can I use a different transformer model from HuggingFace?

Yes, see the relevant section in the examples linked above.

Research Questions

Where can I find the datasets used in the experiments of the publication?

The datasets are available at this link.

How can I extract the keyphrases for a specific dataset using LMRank?

Set base_path in main.py to the dataset directory, then run main().

How can I benchmark the LMRank approach?

Set output_path in main.py to the location where lmrank_timings.csv should be written, then run benchmark().
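The fields that benchmark() records are not described here. As a generic, hypothetical illustration of this kind of measurement, the sketch below times an arbitrary extraction function on each document with `time.perf_counter` and writes the per-document timings to a CSV file. The helper name `time_extraction`, the `doc_id`/`seconds` columns, and the stand-in extractor in the test are assumptions, not LMRank's benchmark().

```python
import csv
import time
from typing import Callable, Iterable, List, Tuple


def time_extraction(
    extract: Callable[[str], object],
    documents: Iterable[Tuple[str, str]],
    output_path: str,
) -> List[Tuple[str, float]]:
    """Time `extract` on each (doc_id, text) pair and write the results to CSV.

    Hypothetical helper: LMRank's own benchmark() may record different fields.
    """
    rows: List[Tuple[str, float]] = []
    for doc_id, text in documents:
        start = time.perf_counter()
        extract(text)  # the keyphrase extractor under test; its output is discarded
        elapsed = time.perf_counter() - start
        rows.append((doc_id, elapsed))

    # One header row, then one row per document with its wall-clock time in seconds.
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["doc_id", "seconds"])
        writer.writerows(rows)
    return rows
```

In practice, `extract` would wrap a call to a loaded keyphrase extractor. Loading the model outside the timed loop keeps one-off start-up costs out of the per-document measurements.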