Add authormap.txt to dictionaries/authority #138

Open
djtfmartin opened this issue Jun 27, 2024 · 6 comments

Comments

@djtfmartin
Contributor

djtfmartin commented Jun 27, 2024

gbif/checklistbank and its ported version, matching-ws, rely on a number of dictionaries that are loaded at startup.

I suggest we add the author map file used by these services to rs.gbif.org for completeness. This avoids embedding a version of this file in docker images.

@mdoering
Member

Great. I just modified the file format slightly to support an unrestricted number of authorship variations and to have the normed value in the first row.
CatalogueOfLife/backend@744fb28

Should we then use that new version for rs.gbif.org? We can add more entries while not breaking deployed code, but we cannot change the format like this commit did.

On the other hand, I do wonder if we need any of these files to live externally, and whether we should maybe bundle them all only as Java resources. They do not change often, are required for stable test outcomes, and you would probably still want a local copy in case rs.gbif.org is unreachable?

@djtfmartin
Contributor Author

Perhaps we can keep the external files and have fallback local copies for redundancy?

I thought the external files would be useful to allow Living Atlases to use matching-ws and/or the Docker images.

@mdoering
Member

All of the other rs.gbif.org dictionary files are from the old code base and we don't use any of these in the new code. The more I think about it, the more I would rather keep all dicts as resources in the new code. There are plenty of them already, e.g. lots of parser dicts. What was the reasoning for using the online files for the Docker images? So that content can change without the need to rebuild an image? The content hasn't changed in years, and the code itself changes much more quickly.

@djtfmartin
Contributor Author

If ALA or NBN, for example, used the generated images for their own contexts, they might find it useful to tweak files like blacklisted.txt.

@mdoering
Member

Ah yes. I forgot that your ported code still uses some of the old dicts.
What about bundling defaults and allowing them to be overridden in configs with a URL to some preferred file?
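
For illustration, the bundled-default-plus-override idea could look roughly like the sketch below. This is only a minimal example, not the code that ended up in matching-ws or the CatalogueOfLife backend: the class name `DictionaryLoader`, the `load` method and the resource path in the comment are all hypothetical. It tries the configured URL first and falls back to the bundled copy if no URL is configured or the remote file cannot be read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: none of these names come from the actual code base.
class DictionaryLoader {

  // Reads all lines from a URL as UTF-8 text.
  static List<String> readLines(URL url) throws IOException {
    List<String> lines = new ArrayList<>();
    try (InputStream in = url.openStream();
         BufferedReader reader =
             new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  // overrideUrl: optional value from the service config (may be null or empty).
  // bundledDefault: the copy shipped with the code, e.g. obtained via
  //   DictionaryLoader.class.getResource("/authormap.txt")  (path is an assumption).
  static List<String> load(String overrideUrl, URL bundledDefault) throws IOException {
    if (overrideUrl != null && !overrideUrl.trim().isEmpty()) {
      try {
        return readLines(URI.create(overrideUrl.trim()).toURL());
      } catch (IOException | IllegalArgumentException e) {
        // Remote file unreachable or URL invalid: fall back to the bundled copy.
        System.err.println("Could not read " + overrideUrl + ", using bundled default: " + e);
      }
    }
    if (bundledDefault == null) {
      throw new IOException("No bundled default dictionary available");
    }
    return readLines(bundledDefault);
  }
}
```

With something like this, a default deployment needs no network access at startup, while ALA or NBN could point the config at their own blacklisted.txt or authormap.txt without rebuilding the image.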

@djtfmartin
Contributor Author

Yes, I think that makes sense. I'll do that.

djtfmartin added a commit to CatalogueOfLife/backend that referenced this issue Jun 27, 2024
djtfmartin added a commit to CatalogueOfLife/backend that referenced this issue Jun 27, 2024