
nltk.tokenize.moses no longer available #1

Closed · petri opened this issue Dec 10, 2018 · 1 comment

petri commented Dec 10, 2018

Trying to use finnlem results in an error:

Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from model_wrappers import Seq2Seq
  File "/Users/petri/Code/finnlem/src/model_wrappers.py", line 13, in <module>
    from nltk.tokenize.moses import MosesDetokenizer
ModuleNotFoundError: No module named 'nltk.tokenize.moses'

This is because the Moses tokenizer and detokenizer have been removed from NLTK due to licensing issues; see, for example, pytorch/text#306.

Apparently the sacremoses package is now recommended as a replacement.
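
For reference, a minimal sketch of the swap, assuming sacremoses is installed (pip install sacremoses) and that the detokenizer is used on a list of tokens the same way as the old NLTK class; the Finnish language code and the sample tokens below are illustrative only:

# Old import that no longer exists in recent NLTK releases:
#   from nltk.tokenize.moses import MosesDetokenizer
# Replacement from the sacremoses package:
from sacremoses import MosesDetokenizer

# 'fi' is an assumption based on finnlem targeting Finnish;
# sacremoses defaults to 'en' if no language is given.
detokenizer = MosesDetokenizer(lang='fi')

tokens = ['Tämä', 'on', 'esimerkki', '.']
print(detokenizer.detokenize(tokens))  # joins the tokens back into a sentence string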

jmyrberg (Owner) commented Feb 26, 2019

Hi Petri,

Sorry for the late reply, and thanks for reporting this. Since this was just a hobby project, I was expecting that something would break at some point :)

Luckily, the Turku NLP Group recently released a paper and a related neural parser, which (as far as I know) uses a similar approach for lemmatization. So unless you just want to learn how the seq2seq model works here, I would recommend using their parser for actual lemmatization tasks.

Hope this helps!
