Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop casting ints to floats #964

Closed
wants to merge 6 commits into from
Closed

Conversation

bendichter
Copy link
Contributor

fix #960

@CodyCBakerPhD CodyCBakerPhD removed their request for review July 16, 2024 20:54
@h-mayorquin
Copy link
Collaborator

This breaks backwards compatbility at the level of add_sorting and add_units_table . The current works in main but does not work in your branch:

from spikeinterface.core import generate_sorting
from pynwb.testing.mock.file import mock_NWBFile


nwbfile = mock_NWBFile()

sorting1 = generate_sorting(num_units=4)
sorting1 = sorting1.rename_units(new_unit_ids=["a", "b", "c", "d"])

sorting2 = generate_sorting(num_units=4)
sorting2 = sorting2.rename_units(new_unit_ids=["c", "d", "e", "f"])
sorting2.set_property(key="an_integer_property", values=[1, 2, 3, 4])

from neuroconv.tools.spikeinterface import add_sorting 

from pynwb.testing.mock.file import mock_NWBFile

add_sorting(sorting=sorting1, nwbfile=nwbfile)
add_sorting(sorting=sorting2, nwbfile=nwbfile)

I am kind of surprise that we don't have a test for this.

@h-mayorquin
Copy link
Collaborator

More comments:

  • The keyword should be implemented at the level of add_sorting and add_units_table so the same workflow can be achieved for those functions.

  • About the interfaceI also think that the keyword should be something like make_table_expandable because this is really what you sacrifce when you turn off this setting. Then, a warning can be thrown to the user when casting indicating that writing integers values is only achievable when writing non expandable tables and that they can set this of by make_table_expandable=False. There might be other limitations of making table expandable and it would be nice to switch all of them with only one keyword instead of having to add an extra keyword argument per situation. Plus, if at some point in the future we change hdmf to add missing integer values like pandas then cast_int_to_float stops making sense. A user friendly error should be produce in the opposite situation that someone did not cast and then they try to expand. Currently, ValueError: cannot convert float NaN to integer is obscure and not very useful.

@bendichter
Copy link
Contributor Author

bendichter commented Jul 18, 2024

@h-mayorquin thanks for looking into this.

As I asked here, how often does this use-case really come up where we'd have units that have different properties in the same Units table?

What if we catch this edge case, and recast to floats only when we need to and throw a warning indicating that we are doing this?

@bendichter
Copy link
Contributor Author

bendichter commented Jul 18, 2024

I think a more natural use-case for expanding a table would be to add rows that have columns that are consistent with the existing table. That is doable without casting to floats, so I think make_table_expandable would be a bit misleading. In the majority of cases the table is already expandable, since the columns are consistent. I guess I would change my mind if you could provide a common use-case that would require this, but I can't think of a single one where I would prefer stacking units with different columns rather than creating multiple Units tables.

@h-mayorquin
Copy link
Collaborator

The example I mentioned before was working on consensus algorithms: adding sorting results from different sorters with different properties (one of which is type int).

I think a more natural use-case for expanding a table would be to add rows that have columns that are consistent with the existing table. That is doable without casting to floats, so I think make_table_expandable would be a bit misleading

Yes, I thought about it: the table is expandable in many cases that don't require the casting. What do you think of ensure_expandable_table?

One thing that is not clear how to handle is compatbility across calls of add_units_table. Making this behavior a parameter creates an implicit coupling between the first call of add_units_table and any subsequent use which is a bad practice. Probably, we can check the type of the previous entries for the same property and just use that type. So, the parameter is really only respected on the first call to add_units_table.

A consideration that just made me change my mind to support the default that you want is that the current approach is loosy for int64. int32 can be safely cast to float 64 (the mantissa of the later has enough bytes) by doubling the memory cost but if the type is int64 we will be loosing information in most systems (asuming hdmf has the types of numpy). Anyway, we don't have the logic to handle any of those considerations so we are currently doing loosy conversions for int64. In other words the current behavior has a bug.

I really wish that we had a nullable value for int types but is unlikely that hdf would support that

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Jul 18, 2024

Yes, as I wrote above I changed my mind about the default. Given that we can guarantee lossless casting for in64 it is probably better to re-think the whole thing. So my current position is:

Let's not cast by default and produce error friendly messages when the need to have nullable values of int arises which is:

  • When a sorting or recording is added to an already existing table and there are missing properties.
  • When a sorting or recording is added for the first time but they have missing properties.

I probably think that focusing on the missing values and what to do with them boht as a use case an in the interface (the name of the keywords) is a more fruitful approach. The edge case is the need to have nullable ints and I think is easier to test, think and document that behavior.

Copy link

codecov bot commented Aug 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.14%. Comparing base (5986a72) to head (44e4488).
Report is 37 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #964   +/-   ##
=======================================
  Coverage   90.14%   90.14%           
=======================================
  Files         126      126           
  Lines        7211     7216    +5     
=======================================
+ Hits         6500     6505    +5     
  Misses        711      711           
Flag Coverage Δ
unittests 90.14% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...nterfaces/ecephys/basesortingextractorinterface.py 80.95% <100.00%> (+1.15%) ⬆️
...c/neuroconv/tools/spikeinterface/spikeinterface.py 83.43% <100.00%> (-0.04%) ⬇️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

recasting unit properties as floats
2 participants