French mistakenly detected as Portuguese #63

Open
serpico opened this issue Apr 3, 2024 · 1 comment

serpico commented Apr 3, 2024

I followed the Installation and Setup section to install PeARS-orchard, and installed 'fr' in addition to 'simple'.
I then went to Indexer and chose Index a single URL.

The page is not saved, and the terminal prints:

Language for https://forum-auto.caradisiac.com/topic/21001-mon-ex-megane-2-19-dci-130-confort-expression/#comments : pt

After installing the 'pt' language, the page is indeed saved.

@minimalparts
Member

Thanks for trying out PeARS!

I am not too sure why that page comes up as Portuguese. We rely on an external library for language detection (langdetect) and it is obviously misbehaving there. I tried that page on the Federated version of PeARS, which has better language support, and for me that page comes up as Italian :) It's a mystery, because other pages from the same site do come up as French as they should...

NB: the Orchard version of PeARS is very brittle. It has the most compact representations, but also the most unreliable ones. We are actively working on Federated right now. It has a different indexing system, which could potentially be set up so that things don't crash entirely when the page language is unreliably recognised. So thanks for reporting this, it shows that we really have to get this sorted out!
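
For illustration only, here is a minimal sketch of the kind of fallback mentioned above. It is not PeARS code: the function name `pick_index_language`, the `min_prob` threshold, and the probabilities in the example are all hypothetical. The idea is to take the detector's ranked guesses and keep the most probable one that is actually installed, instead of failing when the top guess is not. With langdetect, the candidate list could be built from `detect_langs(text)`, whose results expose `.lang` and `.prob`.

```python
from typing import Iterable, Optional, Tuple


def pick_index_language(
    candidates: Iterable[Tuple[str, float]],
    installed: Iterable[str],
    min_prob: float = 0.2,
) -> Optional[str]:
    """Return the most probable candidate that is installed, or None.

    candidates: (language code, probability) pairs from a language detector.
    installed:  language codes available in this PeARS instance (assumed).
    min_prob:   hypothetical cut-off below which a guess is ignored.
    """
    installed_set = set(installed)
    usable = [
        (lang, prob)
        for lang, prob in candidates
        if lang in installed_set and prob >= min_prob
    ]
    if not usable:
        # Nothing installed matches: the caller can skip the page gracefully.
        return None
    return max(usable, key=lambda pair: pair[1])[0]


# Made-up detector output where 'pt' is narrowly ahead of 'fr'.
candidates = [("pt", 0.57), ("fr", 0.43)]
print(pick_index_language(candidates, installed=["simple", "fr"]))
```

With only 'simple' and 'fr' installed, this picks 'fr' rather than rejecting the page. If no installed language clears the threshold it returns None, so the caller can report the page as skipped.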
