Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add max_channel column for phy interface #961

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

bendichter
Copy link
Contributor

Some older Phy-based NWB files, e.g. https://neurosift.app/?p=/nwb&url=https://api.dandiarchive.org/api/assets/b4a15b14-3a7b-49ae-b9ba-fa2c4d8f6c29/download/&dandisetId=000053&dandisetVersion=0.210819.0345 do not have any information that could associate a each unit to an electrode (or physical location). Without this information it is not possible to map a unit to 3D coordinates nor brain area, which has limited the utility of the data for secondary analysis. This is an attempt to remedy this by getting the templates from the Phy output files, calculating the peak channel of each template, and adding that to as a property of the units table.

@CodyCBakerPhD CodyCBakerPhD requested review from h-mayorquin and removed request for CodyCBakerPhD July 14, 2024 22:03
Copy link

codecov bot commented Aug 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.27%. Comparing base (b9507bf) to head (a82d98b).
Report is 10 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #961      +/-   ##
==========================================
- Coverage   91.43%   91.27%   -0.17%     
==========================================
  Files         127      127              
  Lines        7540     7585      +45     
==========================================
+ Hits         6894     6923      +29     
- Misses        646      662      +16     
Flag Coverage Δ
unittests 91.27% <100.00%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...onv/datainterfaces/ecephys/phy/phydatainterface.py 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

Copy link
Collaborator

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the nwbfile has electrodes, the max_channel is stored as a reference to the electrodes table

if property in ["max_channel", "max_electrode"] and nwbfile.electrodes is not None:
data_to_add[property].update(table=nwbfile.electrodes)

But this does not happen for the test that you added because we are only writing a unit table in the nwbfile there. So the max_channel is just a normal int column.

For both cases I feel that the description added to the units table is sort of wrong:

max_channel="The recording channel id with the largest amplitude.",
halfwidth="The full-width half maximum of the negative peak computed on the maximum channel.",

This is pointing out to an electrode table index, the channel ids of the original extractor are most likely lost in the first case and meaningless when only Phy is written as in your test. What do you think?

In general I am wondering about the case when you also add a recorder. I am not sure the mapping is done right. Something like in the docstring here:

https://neuroconv--961.org.readthedocs.build/en/961/conversion_examples_gallery/combinations/spikeglx_and_phy.html

I am not sure the references to the electrodes will be mapped to the right electrode. What do you think?

from pynwb import NWBHDF5IO
from spikeinterface.extractors.nwbextractors import read_nwbfile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confuse me for a bit. I think it would be better t ouse read_nwb_sorting instead of read_nwbfile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using read_nwbfile because I want to read the data in to NWBFile object and confirm that the max_channel column was properly written to the units table. This seemed like a convenient way to read the data but I agree it's a bit strange to rely on SpikeInterface for this function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, Ben, my confusion was worse than that. Sadly, we have that function name duplicated in that space and I thought you were calling this function (which is at the bottom of the same file):

https://github.com/h-mayorquin/spikeinterface/blob/2085cbe9a7fb74ce5d9c2c831611eb2a596de686/src/spikeinterface/extractors/nwbextractors.py#L1367-L1368

@@ -23,6 +27,23 @@ def get_source_schema(cls) -> dict:
] = "Path to the output Phy folder (containing the params.py)."
return source_schema

def get_max_channel(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring? It would be useful here to describe some of the Phy charateristics. Like, I read from the code below that the templates are stored unwithened, the axis operation would be clearer if you wrote the template shape somewhere (I think its shape is (num_templates, num_samples, num_channels), right? Plus, the definition of cluster id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this could use a docstring

write_as: Literal["units", "processing"] = "units",
units_name: str = "units",
units_description: str = "Imported from Phy",
include_max_channel: bool = True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is nice that this inherits the doc from the parent:

image

But the new include_max_channel is not documented.

@bendichter
Copy link
Contributor Author

@h-mayorquin I agree that these indices don't have much meaning if the electrodes table is missing. I think sorting interfaces should write the electrodes table.

idx_max_channel = np.argmax(max_over_time, axis=1)
max_channel = channel_map[idx_max_channel].ravel()

return max_channel
Copy link
Collaborator

@h-mayorquin h-mayorquin Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, two other questions here:

  1. Aren't the templates scaled? I am thinking on the amplitudes.npy . This would mean the argmax operation might not be right.
  2. Also, is max_channel meanigful if spikes are negative? I think here it is consistent but aren't most people interested on the absolute value largest. On the templates data base we use "best channel" which did not have that load.

Now I am aware that you are using machinery that is already there in the add_units_table but I wanted to point this out.

https://phy.readthedocs.io/en/latest/sorting_user_guide/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Scaled how? In a way that would affect which channel is selected?
  2. yeah, good point, the argmax(abs()) might be better

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I shared this with a friend that uses Phy and he flagged that. Reading the documentation I feel less certain.

We could do a roun-trip with spikeinterface artificial data and the export to phy functionality to see if your function gets it right.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some code by Nick Stteinmetz that calculates templates max without using the amplitudes (but they don't do the de-whitening step that you do)

https://github.com/cortex-lab/spikes/blob/master/analysis/templatePositionsAmplitudes.m

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think he does. Look at winv

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the block where the max channel is calculated take the temps as they come directly from the input:

https://github.com/cortex-lab/spikes/blob/fcea2b20e736b533e5baf612752b66121a691128/analysis/templatePositionsAmplitudes.m#L64-L71

Am I missing something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants