Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storing motion correction displacements efficiently #228

Open
bendichter opened this issue Nov 20, 2018 · 10 comments
Open

Storing motion correction displacements efficiently #228

bendichter opened this issue Nov 20, 2018 · 10 comments
Assignees
Labels
category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users
Milestone

Comments

@bendichter
Copy link
Contributor

bendichter commented Nov 20, 2018

I am helping the Losonczy Lab convert their calcium imaging data and I am having trouble with storing motion corrected images. Their software, SIMA, never actually stores the motion corrected images, but instead stores the displacements and computes the corrected images on the fly:

https://github.com/losonczylab/sima/blob/4bc7de8f5d3e520faa92d677fb8fb57515b3d2d2/sima/sequence.py#L1068-L1088

I actually think this is a better approach, because it takes up much less space. The displacement array is very sparse, and when it does have values they are ints and are repeated, so gzip will be very effective. As shown above, the code to apply this alignment is simple and we can make it even simpler by requiring the shape of the displacement array to be the same shape as the data array. NWB does have the ability to store this data in ophys.CorrectedImageStack.xy_translation, however this object also requires the original ImageSeries (which I suppose can be linked) and the corrected ImageSeries (which cannot). This is a huge data overhead that it seems silly to require. It would necessitate doubling the size of data just to communicate how frames are shifted.

Here is how I recommend we fix this in a minimal, non-breaking way:

  • Make the corrected field of CorrectedImageStack optional
  • change the name of xy_translation to just translation, to accommodate 3D data, and require that the size is the same size as the original ImageStack.
@bendichter
Copy link
Contributor Author

@ageorgou

@ageorgou
Copy link

Hi @bendichter. Did you mean to tag me? I've no connection to the Losonczy Lab, I'm afraid! (that said, your solution seems reasonable...)

@bendichter
Copy link
Contributor Author

bendichter commented Nov 20, 2018

@ageorgou Yes I did, you work with Ca imaging, right? Do you have the same issue? I was hoping to see if this would work across multiple labs. How do you store motion correction data?

@ageorgou
Copy link

ageorgou commented Nov 20, 2018

The lab I'm working with have some imaging data, but we don't do motion correction as far as I'm aware. I'll ask around and report back if I'm forgetting something.

EDIT: Motion correction is currently handled by the capturing software, so we don't do or record any processing - sorry I can't offer any input but thanks for asking!

@oruebel
Copy link
Contributor

oruebel commented Nov 20, 2018

@nclack can you have a look at this proposed schema change. If I remember correctly ScanImage was also dealing with this (or maybe I'm wrong).

@bendichter
Copy link
Contributor Author

bendichter commented Jan 8, 2019

@ageorgou

EDIT: Motion correction is currently handled by the capturing software, so we don't do or record any processing - sorry I can't offer any input but thanks for asking!

That is useful input :-)

So where are we on this? Can we make this field optional? It's currently a roadblock for one of my datasets

@bendichter
Copy link
Contributor Author

@nicain, I believe you are also dealing with this issue?

@nicain
Copy link
Contributor

nicain commented Apr 4, 2019

Yes; I am having two problems:

  1. We publish neither the "original" nor the "corrected" ImageSeries. This is because the size of the resulting file would be far too much for us to serve practically. Instead, we serve these data separately via S3. Because of this, my request would be that "original" and "corrected" be optional.

That being said, I can imagine the following reasonable push-back: Is it really an "image stack" if you havent provided the images? I think not. So an alternative would be to accept some sort of URI (file system path, or web URL) to redirect the interested user to the resource another way. In this case, I would still request "original" be optional, so long as the translation information used to compute the "corrected" is required. But then again, others will reasonably argue that the "original" be required, so long as the translation information needed to motion-correct this original source is required. Between these two options ("original" required but not "corrected", vs. "corrected" required but not "original"), ultimately, I think it is a philosophical debate that needs to be resolved: Is an NWB file a record of the experiment (and so "original" required and "corrected" not), or is an NWB file a record of a finished product including the provenance needed to compute that final state (and so "corrected" required and "original" not). The only way to satisfy both viewpoints is to make BOTH optional, which is my final recommendation (and will support me use-case the best).

  1. xy_translation is supposed to be timeseries, but is really should be two separate timeseries. Of course one can make a data attribute on a single timeseries that is 2xN (N is the number of timestamps), but then there is no way to say whether the 1st "row" is "x" and the 2nd "row" is "y", or vice-versa. My recommendation is to split this into attributes on CorrectedImageStack: x_translation and y_translation. I strongly recommend this approach.

tldr; make "corrected" optional, "original" optional, and split xy_translation into x_translation and y_translation

Sorry for the novel, and thanks for all the hard work!

@oruebel
Copy link
Contributor

oruebel commented Apr 4, 2019

So an alternative would be to accept some sort of URI (file system path, or web URL) to redirect the interested user to the resource another way

This is something we have on the agenda as part of the NIH grant. However, I'd expect this to be more on the 6-12month timescale. Doing this is non-trival because you have to standardize a couple of components here: 1) how you store this sort of foreign field in the file, 2) what you support for transfer of the data (e.g., do you ship it as JSON, HDF5, BSON etc.), 3) how does the web-API look like for evaluating the requests etc, .

@nclack nclack pinned this issue May 16, 2019
@nclack
Copy link

nclack commented May 16, 2019

@oruebel I realize I'm getting on this a little late but I can chime in wrt ScanImage metadata. We store possibly-3d translations with respect to a reference stack/image. The image/volume data we store is uncorrected. We use the motion estimates to stabilize targeting etc.

@rly rly added this to the Future milestone Nov 15, 2019
@oruebel oruebel unpinned this issue Nov 15, 2019
@stephprince stephprince added category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users labels May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users
Projects
None yet
Development

No branches or pull requests

7 participants