Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Unclear Recommendation for Experimenter Name Formatting #253

Closed
3 tasks done
TomDonoghue opened this issue Aug 17, 2022 · 12 comments · Fixed by #254
Closed
3 tasks done

[Bug]: Unclear Recommendation for Experimenter Name Formatting #253

TomDonoghue opened this issue Aug 17, 2022 · 12 comments · Fixed by #254
Labels
topic: docs issues related to documentation

Comments

@TomDonoghue
Copy link
Contributor

TomDonoghue commented Aug 17, 2022

What happened?

If there is a formatting issue with the experimenter names, the following warning is raised:
"Message: The name of experimenter 'Donoghue T' does not match the DANDI form (Last, First Middle or Last, First M.)."

This comes from here:

"(Last, First Middle or Last, First M.)."

I find the description of the format that is supposed to be followed quite confusing.

From what I can tell there was some discussion of the format here, which I think concluded that "LastName, FirstName MiddleInitial" is what DANDI expects. Following this, if I update names to follow this (ex: "Donoghue, Thomas") nwbinspector no longer throws the warning.

I'm not sure if I'm missing something broader here, but in terms of interpretability of the warning message, could it read:

f"The name of experimenter '{experimenter}' does not match the DANDI form "
"(LastName, FirstName MiddleInitial)."

^This seems to work, and matches the notes in the prior issue - I'm just not sure if it's maybe too restrictive (and perhaps doesn't cover other accepted inputs). Either way, I find that the current message isn't really interpretable.

SideNote: I didn't spend any time parsing the regex or checking the details of the DANDI format, so maybe the general accepted pattern is more obvious from there.

Steps to Reproduce

The warning gets thrown with a wrong author format, for example "LastName Initial".

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.8

Usage

Command Line Interface

Were you streaming with ROS3?

No

Package Versions

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • Have you ensured this bug was not already reported?
  • To the best of your ability, have you ensured this is a bug within the code that checks the NWBFile, rather than a bug in the NWBFile reader (e.g., PyNWB or MatNWB)?
@TomDonoghue TomDonoghue added the category: bug errors in the code or code behavior label Aug 17, 2022
@CodyCBakerPhD CodyCBakerPhD added topic: docs issues related to documentation and removed category: bug errors in the code or code behavior labels Aug 17, 2022
@CodyCBakerPhD
Copy link
Collaborator

CodyCBakerPhD commented Aug 17, 2022

Thanks for the feedback - this name convention topic has been rather extensive, and the response from that DANDI team is that "names are complicated", which is absolutely true in this day and age. The regex is indeed a bit restrictive IMO however, it is largely up to DANDI for what they want to accept.

As for improving the communicated message, let's discuss this and try to reach an improvement.

For your suggestions of utilizing parentheses,

Message = f"The name of experimenter '{experimenter}' does not match the DANDI form (LastName, FirstName MiddleInitial)."

I just worry that that makes people think they need to wrap their string in parentheses as well, such as

nwbfile = pynwb.NWBFile(
    ...
    experimenters=["(Baker, Cody C.)"]
)

or even worse try something weird with tuples (specific to people using PyNWB, admitedly)

nwbfile = pynwb.NWBFile(
    ...
    experimenters=("Baker", "Cody C.")
)

As for the message as it is right now, it is to indicate that the 'middle name' part of it can be either an initial, or a full middle name. E.g., both of these are accepted

nwbfile = pynwb.NWBFile(
    ...
    experimenters=["Baker, Cody C"]
)

and

nwbfile = pynwb.NWBFile(
    ...
    experimenters=["Baker, Cody Carrington"]
)

How can I better communicate those two accepted cases in a non-confusing way?

EDIT:

Oh, also a third case of optional accepted period applied to the initial

nwbfile = pynwb.NWBFile(
    ...
    experimenters=["Baker, Cody C."]
)

@bendichter
Copy link
Contributor

bendichter commented Aug 17, 2022

@CodyCBakerPhD

oh, you are trying to communicate:

Last, First Middle
or
Last, First M.

Yes, I also find this confusing.

How about:

"The name of experimenter '{experimenter}' does not match the DANDI form: 'LastName, FirstName MiddleInitial.' or 'LastName, FirstName MiddleName'."

@CodyCBakerPhD
Copy link
Collaborator

"The name of experimenter '{experimenter}' does not match the DANDI form: 'LastName, FirstName MiddleInitial.' or 'LastName, FirstName, MiddleName'."

That reads fine to me, what do you think @TomDonoghue

@TomDonoghue
Copy link
Contributor Author

TomDonoghue commented Aug 17, 2022

OH! My main misinterpretation here is that I wasn't parsing the original message into two separate layouts, reading it all as one designation rather than two (basically, reading "First Middle or Last" as a description of the 2nd element of the overall name, rather than the or indicating two valid name layouts). Something about having the whole thing in a () threw me off I think. Now that's it's clicked I see what's going on. That's partly my bad on getting confused hahaha.

I like @bendichter's suggestion. If I'm now understanding, then writing it as
"LastName, FirstName MiddleInitial" or "LastName, FirstName MiddleName"
more clearly (to me) parses out the two slightly different acceptable conventions.

To Cody's list above: I believe this covers the acceptable cases, other than it doesn't explicitly describe whether the initial should have a period, but since Donoghue, Thomas R. or Donoghue, Thomas R are both accepted, not specifying this separately seems fine I think?

@bendichter
Copy link
Contributor

As long as we are being exhaustive, maybe we could do:

"The name of experimenter '{experimenter}' does not match the DANDI form: 'LastName, Firstname', 'LastName, FirstName MiddleInitial.' or 'LastName, FirstName, MiddleName'."

@TomDonoghue
Copy link
Contributor Author

I think listing all is fine - though I think setting it up as the DANDI form is partly what made me think it was all one description, so maybe:

"The name of experimenter '{experimenter}' does not match any of the accepted DANDI forms, which are: 'LastName, Firstname', 'LastName, FirstName MiddleInitial.' or 'LastName, FirstName, MiddleName'."

(or similar, with updated phrasing that clearly notes there is more than one acceptable layout here)

@oruebel
Copy link
Contributor

oruebel commented Aug 17, 2022

Just a thought, but one way we could ease data entry is by providing a format string, e.g., pynwb.file.experimenter_str = '{last}, {first} {middle}' such that a user could enter the experimenter as:

from pynwb.file import experimenter_str
nwbfile = pynwb.NWBFile(
    ...
    experimenters=[experimenter_str.format(last="Baker", first="Cody", middle="C.")]
)

@CodyCBakerPhD
Copy link
Collaborator

CodyCBakerPhD commented Aug 17, 2022

@oruebel Or have an Experimenter data type (simple dataclass, perhaps), each with last, first, middle attributes so I can add a bunch like

from pynwb.file import Experimenter

experimenter_1 = Experimenter(last="Baker", first="Cody", middle="C.")
experimenter_2 = Experimenter(last="Ruebel", first="Oliver")

nwbfile = pynwb.NWBFile(
    ...
    experimenters=[experimenter_1, experimenter_2]
)

EDIT: This would also allow PyNWB-level of format checking and automated coercion to this expected form as the final output string (i.e., define __to_str__ (or whatever that magic function is called) as return f"'{last}, {first} {middle}'"

@oruebel
Copy link
Contributor

oruebel commented Aug 17, 2022

Or have an Experimenter data type (simple dataclass, perhaps), each with last, first, middle attributes so I can add a bunch like

I agree, that would be an even nicer option as it provides richer functionality, is easier to use, and is also more consistent with the PyNWB API. @CodyCBakerPhD I would suggest creating an issue on PyNWB with your suggestion.

define __to_str__ (or whatever that magic function is called)

The magic function for this is __str__, which is called when you call print() or str() on the object. In addition we may also want to overwrite __repr__ , which is called by repr()

@bendichter
Copy link
Contributor

Creating an Experimenter class is fine, but we will still need the free string approach for backwards compatibility, which means we will still need to parse free strings.

@oruebel
Copy link
Contributor

oruebel commented Aug 17, 2022

which means we will still need to parse free strings.

Yes, either way the names would still be stored as strings in the schema, so we need to parse them. However, you could implement the string parsing in classmethods, e.g.,Experimenter.parse to parse a string, Experimenter.from_str to create an Experimenter object from a string. In this way, this would make it easier to create properly formatted experimenter strings and make it easy for users to access the components of names.

Given that the format in HDF5 is a flat string, the Experimenter class would likely need a Experimenter.raw attribute to store the original string and provide proper access to the experimenter name in case it cannot be parsed into the components last, first, middle.

@oruebel
Copy link
Contributor

oruebel commented Aug 17, 2022

Something like https://pypi.org/project/nameparser/ may also already provide many of the functions we are looking for

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: docs issues related to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants