[Add Check]: Check that if experimenters string does not have comma #33

rly · 2022-02-15T18:20:08Z

If a user provides a comma-separated scalar string for nwbfile.experimenter, the inspector should recommend that they instead store an array of experimenter names. The same applies to keywords and related publications.

bendichter · 2022-02-26T22:43:02Z

This is a bit tricky for experimenters, because they might have listed a single name as "Last, First".

CodyCBakerPhD · 2022-04-07T16:06:57Z

The BP section for experimenters does not describe the format in which they ought to write these things down: should this be updated to some standard, such as

Last Name, M. I., First Name

M. I. - 'middle initial'

And how should multiple experimenters be handled?

bendichter · 2022-04-07T18:09:26Z

multiple experimenters should be handled as a vector of strings

CodyCBakerPhD · 2022-04-11T14:14:15Z

OK, so as I understand it, the heart of the original issue is to detect people encoding multiple experimenters as one comma-separated string instead of adding them as separate elements of the vector.

However, there are several different ways people can encode a single experimenter name, such as

(a) Firstname Lastname
(b) Lastname, Firstname
(c) [Optional]: Firstname, M.I., Lastname
(di) [Optional]: Lastname, Firstname, M.I.
(dii) [Optional]: Lastname, Firstname M.I.
(e) [Optional]: Lastname, M.I., Firstname

Now, we can still try to catch the heart of this issue by triggering the violation if any(experimenter.count(",") > 3 for experimenter in nwbfile.experimenter)
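A minimal sketch of the comma-count heuristic described above, written against a plain list of strings rather than a real NWBFile. The function name `find_suspicious_experimenters` and the `max_commas` parameter are hypothetical and are not part of the inspector. The default cutoff of 3 follows the comment above; since formats (a) through (e) use at most two commas, a stricter cutoff could also be argued for, which is why it is left as a parameter.

```python
from typing import Iterable, List


def find_suspicious_experimenters(experimenters: Iterable[str], max_commas: int = 3) -> List[str]:
    """Return the entries that contain more commas than a single name plausibly needs.

    Note that the comparison happens per entry, inside the comprehension.
    Comparing the result of any(...) to a number would compare a bool instead.
    """
    return [name for name in experimenters if name.count(",") > max_commas]


# Illustrative values only.
experimenters = ["Baker, Cody", "Lastname, Firstname, M.I.", "A One, B Two, C Three, D Four, E Five"]
flagged = find_suspicious_experimenters(experimenters)
```

Here only the last entry is flagged, because it contains four commas. A string holding two "Last, First" names has three commas and would pass with this cutoff, which illustrates why a pure count is a weak signal.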

However, since the Best Practices make many references to trying to make NWBFiles machine-readable, I wonder if it might be in our best interest to encourage a single standard for encoding names (from a-e above), which would also make this kind of check easier and more reliable to catch.

Note: the experimenter schema also mentions that roles can be specified. I'd propose that a Best Practice for that could be optionally including NameOfRole: before the name of the individual to make this easier to parse.

bendichter · 2022-04-11T14:41:44Z

Here's DANDI's regex for name: https://github.com/dandi/dandi-schema/blob/ffa49ef5dc84bc5a234fbc9ca9ba04b872539893/dandischema/models.py#L48

CodyCBakerPhD · 2022-04-11T15:24:14Z

> Here's DANDI's regex for name: https://github.com/dandi/dandi-schema/blob/ffa49ef5dc84bc5a234fbc9ca9ba04b872539893/dandischema/models.py#L48

Then their regex matches option (dii).

That regex could be improved, however; note that it allows numbers in names (try re.fullmatch(pattern=r"^([\w\s\-]+),\s+([\w\s\-\.]+)$", string="Baker2, Cody1")) and disallows apostrophes (like re.fullmatch(pattern=r"^([\w\s\-]+),\s+([\w\s\-\.]+)$", string="O'Baker, Cody")).
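The two observations above can be reproduced with the standard `re` module. In the sketch below, `DANDI_NAME_PATTERN` is the pattern quoted in this comment, while `ALTERNATIVE_PATTERN` and the helper `matches` are hypothetical illustrations of one possible tightening (letters only, apostrophes allowed); they are not taken from dandi-schema.

```python
import re

# Pattern as quoted in the comment above.
DANDI_NAME_PATTERN = r"^([\w\s\-]+),\s+([\w\s\-\.]+)$"

# Hypothetical alternative, NOT from dandi-schema: [^\W\d_] matches any
# letter (Unicode-aware) but no digit or underscore; apostrophes are allowed.
ALTERNATIVE_PATTERN = r"^((?:[^\W\d_]|[\s\-'])+),\s+((?:[^\W\d_]|[\s\-'\.])+)$"


def matches(pattern: str, name: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, name) is not None


digits_accepted = matches(DANDI_NAME_PATTERN, "Baker2, Cody1")      # \w includes digits
apostrophe_accepted = matches(DANDI_NAME_PATTERN, "O'Baker, Cody")  # ' is not in the character class
```

With the quoted pattern, `digits_accepted` is `True` and `apostrophe_accepted` is `False`; the alternative pattern reverses both outcomes while still accepting a form such as "Baker, Cody C.".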

Also note that it does not accommodate role assignment for each experimenter, which the NWB Schema indicates is allowed.

@bendichter So do you think we should make it a Best Practice to enforce structure (dii)? How should we deal with roles?

bendichter · 2022-04-11T18:46:53Z

@CodyCBakerPhD good points. Maybe we should raise these issues on the dandi schema repo

CodyCBakerPhD · 2022-04-13T19:58:57Z

Turns out that was an older version (not most recent master): https://github.com/dandi/dandi-schema/blob/master/dandischema/models.py#L60

And they may continue to think numbers in names are OK.

One thing to note, though, as I look through their code and how they use it: I'm not 100% sure they actually require the NWBAsset experimenter metadata to follow that pattern; it is just what they enforce for their DANDI assets after extraction. So whatever format the NWB team decides to support as a 'best practice' can always be mapped into what DANDI expects via their intermediate metadata handling functions.

bendichter · 2022-04-13T20:55:44Z

@CodyCBakerPhD yes, exactly. This is a dandiset metadata requirement. I'd like to have the option of automatically pulling the experimenter metadata into the dandiset metadata, which is why I thought it might be convenient to have the same regex as a best practice.
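As a rough illustration of the mapping idea discussed above, the sketch below converts a "Firstname M.I. Lastname" string into the "Lastname, Firstname M.I." form that the quoted regex accepts. The function `to_last_first` is hypothetical and deliberately naive: it assumes the family name is the final whitespace-separated token, which does not hold for multi-word family names, and it is not how DANDI's own metadata handling works.

```python
import re

# Pattern quoted earlier in this thread.
DANDI_NAME_PATTERN = r"^([\w\s\-]+),\s+([\w\s\-\.]+)$"


def to_last_first(name: str) -> str:
    """Naively map 'Firstname [M.I.] Lastname' to 'Lastname, Firstname [M.I.]'.

    Names already matching the pattern are returned unchanged; anything
    else that contains a comma is rejected as ambiguous.
    """
    name = name.strip()
    if re.fullmatch(DANDI_NAME_PATTERN, name):
        return name
    parts = name.split()
    if "," in name or len(parts) < 2:
        raise ValueError(f"Cannot unambiguously reformat experimenter name: {name!r}")
    *given, family = parts  # assumption: the last token is the family name
    return f"{family}, {' '.join(given)}"


converted = to_last_first("Cody C. Baker")
```

A string such as "Baker, Cody, C." raises `ValueError` here rather than being guessed at, since the comma placement makes the intended structure unclear.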

CodyCBakerPhD added this to the Prepare for v0.2.0 Release milestone Feb 20, 2022

CodyCBakerPhD added the "category: new check" label (a new best practices check to apply to all NWBFiles and their contents) Feb 24, 2022

CodyCBakerPhD changed the title ~~Check that if experimenters string does not have comma~~ [Add Check]: Check that if experimenters string does not have comma Mar 15, 2022

CodyCBakerPhD self-assigned this Apr 11, 2022

CodyCBakerPhD mentioned this issue Jul 11, 2022

[New Check]: Add experimenter form check using DANDI regex #227

Merged

CodyCBakerPhD closed this as completed in #227 Jul 12, 2022

TomDonoghue mentioned this issue Aug 17, 2022

[Bug]: Unclear Recommendation for Experimenter Name Formatting #253

Closed

3 tasks

rly mentioned this issue Sep 7, 2022

Specify format for experimenter name NeurodataWithoutBorders/nwb-schema#528

Open
