Orphan datasets from Germany #10

Open
kbraak opened this issue Jul 7, 2017 · 7 comments

@kbraak

kbraak commented Jul 7, 2017

In Germany's list of orphans, all 15 datasets that are owned by Staatliche Naturwissenschaftliche Sammlungen Bayerns are false positives and should not be rescued. These are due to this bug in GBIF's crawling service.

Below is the latest analysis of the remaining orphan datasets, conducted by @jholetschek. He is still awaiting replies from several hosts/curators/publishers to understand whether their data can come back online. Based on the results, GBIF will need to perform at least one dataset deletion and change dataset endpoint URLs in the GBIF Registry.

Germany orphaned.xlsx

kbraak added the bug label Jul 7, 2017

@kbraak

kbraak commented Jul 21, 2017

Thanks @jholetschek for identifying that 8ea44a78-c6af-11e2-9b88-00145eb45e9a is back online.

kbraak added this to the 2018 milestone Oct 27, 2017

@jholetschek

Dataset https://www.gbif.org/dataset/85c8e444-f762-11e1-a439-00145eb45e9a is back online on a new BioCASe installation with 38,154 occurrences.

@kbraak

kbraak commented Nov 9, 2017

That's great news @jholetschek, thanks. That still leaves 43 candidate orphan datasets in Germany that GBIFS hasn't been able to re-index in the last 6 months, as you can see here: https://github.com/gbif/watchdog/wiki/AdoptionPlan

@jholetschek

This list still contains 5 datasets that are online.

@kbraak

kbraak commented Nov 13, 2017

Thanks @jholetschek

I updated the URLs for the 2 datasets from Friedrich-Alexander University of Erlangen-Nürnberg and triggered a re-crawl for them. I also triggered a re-crawl for the 3 datasets from Georg-August-Universität Göttingen. Hopefully they all finish crawling successfully this time.

@jholetschek

Thanks a lot, Kyle! Seems all five datasets have been crawled successfully now.

Concerning https://www.gbif.org/dataset/ad0d1a24-e952-11e2-961f-00145eb45e9a: I'll meet with the curator next week and will try to convince him to bring the dataset back online.

@jholetschek

Dataset https://www.gbif.org/dataset/ad0d1a24-e952-11e2-961f-00145eb45e9a is back online and can be crawled again.
