Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we support less restrictive SWC format? #432

Open
adrien-berchet opened this issue Dec 12, 2022 · 12 comments
Open

Should we support less restrictive SWC format? #432

adrien-berchet opened this issue Dec 12, 2022 · 12 comments

Comments

@adrien-berchet
Copy link
Member

Some softwares (e.g. https://alleninstitute.org/what-we-do/brain-science/research/products-tools/vaa3d/) might create SWC files with less restrictive specs (e.g. a root ID that is not equal to -1, see Vaa3D/vaa3d_tools#38). I guess this is also true for other file formats (ASC and H5 files). This raises again the question: should MorphIO be only able to load 'realistic' morphologies or should it be able to load any file as is, even if the morphology does not make any sense? As pointed out by @hanchuan in Vaa3D/vaa3d_tools#38, it might make sense to store morphologies in an intermediate but not realistic state.
WDYT @lidakanari @eleftherioszisis @mgeplf ?

@adrien-berchet
Copy link
Member Author

Note that if we want to be able to read any morphology, we should be able to load somata with tree structures (i.e. with bifurcations, as described here: #213), which would be a major change.

@Helveg
Copy link
Contributor

Helveg commented Feb 23, 2023

Another restriction that I'd like to see removed is the ban on custom SWC tags :)

@adrien-berchet
Copy link
Member Author

Hi @Helveg
I am not super familiar with SWC file format, what do you mean by custom SWC tags?

@Helveg
Copy link
Contributor

Helveg commented Mar 2, 2023

I currently don't have a reproducer at hand! In the SWC spec it is called the "Structure Identifier", but the error raised by MorphIO goes something like "custom tags not supported", and it blocks you from entering any value above ... 10, 15? While the SWC spec just says that any 5+ value is considered custom, so you already support 10 custom tags, but no higher

@adrien-berchet
Copy link
Member Author

Ah yeah ok I see, it is indeed limited to 10. It seems this limitation comes from the NeuroMorpho specs (as stated here: https://github.com/BlueBrain/MorphIO/blob/master/include/morphio/enums.h#L82). I don't see any reason for this limitation but we want to keep compatibility with NeuroMorpho, so I guess we want to keep this limitation. Nevertheless, it would be possible to add an option to automatically set these tags to 10 (or any other value between 5 and 10, I don't know what's the best, though I think we should avoid 6 and 7 values since they can have special meaning (branch point and end point respectively)), so it would still be possible to load the morphology (with warnings issued of course), though the section types would be messed up. WDYT?

@Helveg
Copy link
Contributor

Helveg commented Mar 2, 2023

As long as I can still determine the original number that I gave in the SWC, that should be fine :)

@adrien-berchet
Copy link
Member Author

Unfortunately it would not be possible, all values >10 would be set to 10, so it would not be possible to know if the original value was 11, 12, 13, etc.

@mgeplf
Copy link
Contributor

mgeplf commented Jun 22, 2023

Wow, this fell off my radar; sorry about that.

e.g. a root ID that is not equal to -1,

It appears there has been a further attempt to standardise SWC better:
https://swc-specification.readthedocs.io/en/latest/

Which includes a governance/steering committee (with people from AIBS, INCF, neuromorpho, etc: https://swc-specification.readthedocs.io/en/latest/governance.html).

Currently, they are saying that -1 is the starting point (from https://swc-specification.readthedocs.io/en/latest/swc.html):

Parent defines how points are connected to each other. In a tree, multiple points can have the same ParentID. The first point in the file must have a ParentID equal to -1, which represents the root point.

So I'd prefer to keep this, as I think it makes sense to have at least some amount of uniformity and standard conformance.

WRT custom tags, I don't really have a problem bumping the number to 20 or something, but that feels like a stopgap. The above specification only has 0-7, w/ 5 as custom, and 6 as unspecified neurite.

mgeplf added a commit that referenced this issue Jun 22, 2023
* #432
* #456 (type 18 seen in the wild)
mgeplf added a commit that referenced this issue Jun 22, 2023
* #432
* #456 (type 18 seen in the wild)
mgeplf added a commit that referenced this issue Jun 23, 2023
* #432
* #456 (type 18 seen in the wild)
@mgeplf
Copy link
Contributor

mgeplf commented Jun 23, 2023

I've added more custom types; up until 19; hopefully that's enough for a while.

@Helveg
Copy link
Contributor

Helveg commented Jun 23, 2023

Thanks for the addition! That should probably cover our needs for a while, but is more restrictive than I believe the format specifies, which depends on how you read the document. The table says for column 2:

 Type identifier. A positive integer.

So that's anything [0, +inf[, it then continues to talk about the "The basic set of types used in NeuroMorpho.org":

The basic set of types used in NeuroMorpho.org SWC files are: TypeID | Description — | — 0 | undefined 1 | soma 2 | axon 3 | (basal) dendrite 4 | apical dendrite 5 | custom 6 | unspecified neurite 7 | glia processes

I'm not sure if they mean to say that that should be the specification, or if they literally meant it as just an example of what NeuroMorpho uses.

The document feels kind of rushed and unfinished, for example, the grammar of Nanda et al 2018 says:

integer = [+|-] digit{digit} ;
Type = integer ;

which means type could be negative just as well: ]-inf, +inf[

(PS: my tool encodes all regions with unique parameter sets on the morphology as a separate tag, I now use a homemade SWC parser, but I'd prefer to drop it, but this issue and access to the header would be required for that)

@mgeplf
Copy link
Contributor

mgeplf commented Jul 3, 2023

Thanks for the addition! That should probably cover our needs for a while, but is more restrictive than I believe the format specifies, which depends on how you read the document.

It is more restrictive, but I also think that supporting many more would isn't in the spirit of having a "type" identifier, when the other examples are soma, axon, dendrite, and basal-dendrite. For morphologies, the universe of neurites won't probably suddenly expand, and people are taking liberties with the format to encode other information, which is fragmenting the "standard" - I'd prefer not to participate in that fragmentation if possible.

The document feels kind of rushed and unfinished

Perhaps, but I think it's more clear than other "standardizations" I've read thus far.

access to the header would be required for that

I will give that some thought; since swc+ is also adding metadata to the header.

@mgeplf
Copy link
Contributor

mgeplf commented Aug 22, 2023

I have commented on the INCF proposal here:
https://www.incf.org/commentaries/swc

Which should hopefully have integrated the feedback from you, @Helveg, about integer types - I suggested that they have a defined type.

Hopefully the other points in that proposal come true; that non-negative ids are not allowed, everything starts at root nodes, the ids are dense.

If it's ok, I'm going to close this ticket for now, but open one about the access to the headers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants