Quirks with subsampling within StateArrayDataset #11

Open
vinsfan368 opened this issue Mar 3, 2024 · 2 comments

Comments

@vinsfan368
Contributor

Hi @alecheckert, I ran across some quirks when trying to subsample within an SAD. Here's a code snippet:

import os

import pandas as pd
import saspt
from saspt.dataset import StateArrayDataset

# Locate a sample trajectory CSV in the repository's examples directory
module_path = os.path.dirname(os.path.dirname(saspt.__file__))
sample_csv = os.path.join(module_path, "examples",
                          "u2os_ht_nls_7.48ms", "region_8_7ms_trajs.csv")

# Small sample_size so that subsampling is triggered
settings = dict(pixel_size_um=0.16,
                frame_interval=0.00748,
                focal_depth=0.7,
                sample_size=10,
                progress_bar=True,
                likelihood_type='rbme',
                splitsize=10,
                start_frame=0)

# Use the same file three times under a single condition
paths = dict(filepath=[sample_csv for _ in range(3)],
             condition=['test' for _ in range(3)])
SAD = StateArrayDataset.from_kwargs(pd.DataFrame(paths),
                                    path_col='filepath',
                                    condition_col='condition',
                                    **settings)

# These two per-file quantities should agree, but they don't
print("Sum of unnormalized posterior probabilities per file:",
      SAD.posterior_occs.sum(axis=(1, 2)),
      sep="\n")
print("SAD.jumps_per_file attr:", SAD.jumps_per_file, sep="\n")

The problem is that subsampling happens every time StateArrayDataset._load_tracks is called. This can happen twice while using the object (unless the user calls clear()): once when calculating occupancies and again when computing the processed track statistics. Each call draws its own subsample, so jumps_per_file, which depends on the processed track statistics, doesn't agree with the posterior occupancies.
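
To make the mechanism concrete, here is a minimal toy sketch of the failure mode. It does not use saspt at all: ToyDataset, occupancy_total, and jumps_total are hypothetical stand-ins, and each "track" is reduced to a jump count. The only point is that two calculations which each trigger their own load end up working on independently drawn subsamples.

```python
import random


class ToyDataset:
    """Hypothetical stand-in (not the saspt class): subsamples on every load."""

    def __init__(self, jump_counts, sample_size, seed=0):
        self.jump_counts = list(jump_counts)
        self.sample_size = sample_size
        self._rng = random.Random(seed)
        self.n_loads = 0  # counts how often a subsample was drawn

    def _load_tracks(self):
        # A fresh random subsample is drawn on every call.
        self.n_loads += 1
        return self._rng.sample(self.jump_counts, self.sample_size)

    def occupancy_total(self):
        # Stand-in for the occupancy calculation: triggers a load.
        return sum(self._load_tracks())

    def jumps_total(self):
        # Stand-in for the track statistics: triggers another load.
        return sum(self._load_tracks())


# Powers of two, so two different subsamples can never have the same sum.
ds = ToyDataset(jump_counts=[2 ** i for i in range(20)], sample_size=10)
occ = ds.occupancy_total()
jumps = ds.jumps_total()
print("number of loads:", ds.n_loads)  # 2
print("totals agree:", occ == jumps)
```

The two totals only agree if both loads happen to draw the same subsample, which is very unlikely for any realistic file.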

A solution could be to bundle StateArrayDataset._get_processed_track_statistics() and StateArrayDataset.calc_marginal_posterior_occs() into a bigger function. I guess the subsampled detections could also be cached on the SAD object, though that could take up a lot of memory.
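
A rough sketch of the bundling idea, again on a toy class rather than on saspt itself: ToyDatasetBundled and calc_all are hypothetical names, and the 25/75 split across two "states" is just a placeholder for the real occupancy calculation. The subsample is drawn once and both quantities are derived from it, so nothing needs to be cached beyond the results.

```python
import random


class ToyDatasetBundled:
    """Hypothetical stand-in (not the saspt class): one load feeds both results."""

    def __init__(self, jump_counts, sample_size, seed=0):
        self.jump_counts = list(jump_counts)
        self.sample_size = sample_size
        self._rng = random.Random(seed)
        self.n_loads = 0
        self.occs = None   # placeholder for the posterior occupancies
        self.jumps = None  # placeholder for the processed track statistics

    def _load_tracks(self):
        # Still subsamples on every call, exactly as before.
        self.n_loads += 1
        return self._rng.sample(self.jump_counts, self.sample_size)

    def calc_all(self):
        # Load (and subsample) once, then compute both quantities
        # from the same sample so that they cannot disagree.
        sample = self._load_tracks()
        n_jumps = sum(sample)
        self.jumps = n_jumps
        # Placeholder occupancies: spread the jumps over two states.
        self.occs = [0.25 * n_jumps, 0.75 * n_jumps]


ds = ToyDatasetBundled(jump_counts=range(1, 21), sample_size=10)
ds.calc_all()
print("number of loads:", ds.n_loads)           # 1
print("consistent:", sum(ds.occs) == ds.jumps)  # True
```

Compared with caching the subsampled detections on the object, this keeps memory use flat, at the cost of always computing the track statistics together with the occupancies.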

Happy to try to fix this; let me know what you think is the best way forward.

@alecheckert
Owner

Nice catch. I'd be in favor of your proposed solution (bundle the track statistics into calc_marginal_posterior_occs). Could you also add a test to catch this in the future?
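
One possible shape for such a test is sketched below. It assumes the intended invariant is the one the snippet in this issue compares, namely that each file's unnormalized posterior occupancies sum to its entry in jumps_per_file. The helper name occs_match_jumps is hypothetical, and plain nested lists stand in for the real arrays.

```python
import math


def occs_match_jumps(posterior_occs, jumps_per_file, rel_tol=1e-6):
    """Return True if each file's occupancies sum to that file's jump count.

    posterior_occs: one 2D grid (list of rows) of unnormalized occupancies per file
    jumps_per_file: one jump count per file
    """
    if len(posterior_occs) != len(jumps_per_file):
        return False
    for file_occs, n_jumps in zip(posterior_occs, jumps_per_file):
        total = sum(sum(row) for row in file_occs)
        if not math.isclose(total, n_jumps, rel_tol=rel_tol):
            return False
    return True


# Two toy "files", each with a 2x2 occupancy grid (sums are 10 and 2).
occs = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.5, 0.5], [1.0, 0.0]]]
print(occs_match_jumps(occs, [10, 2]))  # True
print(occs_match_jumps(occs, [10, 3]))  # False
```

In a real test, the arguments would come from a dataset built with a small sample_size, as in the snippet at the top of this issue.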

@vinsfan368
Contributor Author

Done per PR #12. Am I supposed to be able to assign reviewers? @alecheckert
