Add collectionCode and collectionID to eventCore #96

Open
timrobertson100 opened this issue Feb 20, 2023 · 5 comments

Comments

@timrobertson100
Member

Originally reported here by @peterdesmet, providing the justification:

to indicate the (virtual) collection an event based dataset is derived from

This seems reasonable and fits the intent of the DwC collectionCode definition:

The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.

@peterdesmet
Member

Thanks Tim. For completeness, I'd immediately add collectionID as well:

An identifier for the collection or dataset from which the record was derived.

@timrobertson100 changed the title from "Add collectionCode to eventCore" to "Add collectionCode and collectionID to eventCore" on Feb 20, 2023
@dagendresen

Wouldn't dwc:datasetID and dwc:datasetName be more appropriate than collectionCode and collectionID for this purpose? Or, even better, why not propose new record-level terms project and projectID in Darwin Core?

@peterdesmet
Member

@dagendresen, in my experience datasetName and datasetID are mainly used for the published dataset itself (title + doi). project and projectID could be useful, but to indicate the project (cf. project in metadata) not the originating source database.

Indicating the originating source database/system doesn't violate the definition for collectionCode in my opinion:

The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.

It is already used as such for Occurrence datasets (not only by me, see e.g. EBIRD in https://www.gbif.org/occurrence/3504179613). I don't think it makes sense to exclude it if your dataset happens to be organized as an Event Core dataset, which is why I'm proposing here to include it.

@dagendresen

dagendresen commented Feb 22, 2023

Wouldn't the datasetName and datasetID for the published dataset itself belong in the dataset-level EML metadata rather than as a property of each record?

datasetName and datasetID are mainly used for the published dataset itself

Records inside the same dataset could be from different named specimen collections.

GBIF Norway uses datasetName as a (poor) proxy for "projectName": records inside the same dataset (DwC-A dataset) often originate from different projects (e.g., collecting expeditions) or are improved in different GBIF-node-funded digitization, georeferencing, or other data quality projects. We also require grant recipients to use datasetName or datasetID to credit the node grant project.

@peterdesmet
Member

Yeah, the uses of those fields differ. Note that the IPT does suggest the resource DOI for the datasetID (i.e. resource = datasetName/datasetID):

[Screenshot of the IPT, 2023-02-22]

Irrespective of how the fields are actually used, I don't think collectionCode and collectionID should be excluded just because your dataset is structured as an Event Core dataset.
