
Running decoding_in_time yields different output every time #9

Closed
JensBlack opened this issue Jul 10, 2024 · 3 comments


JensBlack commented Jul 10, 2024

Hi @lposani,

Thank you for your previous help with #8.

I have continued on my journey with Decodanda and tried the "decoding_in_time" function.

My questions to you:

  1. Is there a random component in the code, e.g. when creating folds? I am assuming that all these plots show values drawn from the same distribution of possible performances. For reproducibility, though, it would be good to be able to fix the random state. Is this possible?

  2. Or is there something wrong with my approach? I tried to be exhaustive with the explanation below, so this is easier to answer. Any insight would be appreciated.

See below for details...

The setup:

1 session of 1 freely moving mouse in an arena for approx. 1.5 h. The mouse explores and sleeps, hence the labels (awake and sleep, [0, 1]) to address this simple binary case. We recorded Ca2+ activity from a brain region (in this session, 61 neurons at 20 Hz). The calcium activity is z-scored.
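For clarity, this is roughly what I mean by "z-scored" (per-neuron standardization across time; the array shape and random data here are illustrative, not the real recording):

```python
import numpy as np

# Illustrative z-scoring per neuron: subtract each neuron's mean and
# divide by its SD across time bins. Shape convention (time, neurons)
# is an assumption for this sketch.
rng = np.random.default_rng(0)
ca_raw = rng.normal(loc=2.0, scale=3.0, size=(1000, 61))  # (time bins, neurons)
ca_z = (ca_raw - ca_raw.mean(axis=0)) / ca_raw.std(axis=0)
```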

The dummy hypothesis:

Is the activity predictive of the state change between awake and sleep or vice versa?

Implementation:

Generating pseudo trials and time_from_onset

Each behavior onset is captured and used as the center of a pseudo trial, generating trials that are eligible for the calculation of "time_from_onset". For this calculation, I generate a range of numbers (negative and positive) with each onset = 0 and the distance between two onsets split into a pre-onset period (-half to 0) and a post-onset period (0 to half).
This should be in line with your example and the "time_from_onset" generated in the synthetic data.
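The construction above could be sketched roughly like this (the onset detection via `np.diff` and the midpoint split between consecutive onsets are my own assumptions; `bin_sleep_labels` mirrors the variable name used in the code below):

```python
import numpy as np

# Toy binary state labels (0 = awake, 1 = sleep).
bin_sleep_labels = np.array([0] * 50 + [1] * 40 + [0] * 30)

# An onset is any index where the label changes value.
onsets = np.flatnonzero(np.diff(bin_sleep_labels) != 0) + 1

# Split the distance between consecutive onsets at the midpoint, so each
# onset sits at time 0 of its own pseudo trial, with a pre-onset period
# (negative values) and a post-onset period (positive values).
time_from_onset = np.zeros(len(bin_sleep_labels), dtype=int)
trial_ids = np.zeros(len(bin_sleep_labels), dtype=int)
bounds = np.concatenate([[0], (onsets[:-1] + onsets[1:]) // 2,
                         [len(bin_sleep_labels)]])
for i, onset in enumerate(onsets):
    lo, hi = bounds[i], bounds[i + 1]
    time_from_onset[lo:hi] = np.arange(lo, hi) - onset
    trial_ids[lo:hi] = i + 1
```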

[figure: pseudo-trial onsets and time_from_onset visualization]

This is a visualization of the above. The onsets (vertical grey lines) are at 0 (black horizontal line), while the trial pre-onset and post-onset times range from negative to positive numbers (blue line).

There are a total of 76 state changes (onsets) with 38 per class (awake/sleep).

The distribution of pre-onset and post-onset timesteps:

[figure: distribution of pre-/post-onset timesteps]

The red area indicates the time boundaries I am interested in (see code below).

Decodanda code

from decodanda import decode_in_time
#ref: https://github.com/lposani/decodanda/blob/master/notebooks/decoding_in_time.ipynb


sample_rate = 20 # Hz (calcium imaging)

time_from_onset_sec = time_from_onset


data = {
    "raster": ca_z,                      # can be continuous or discrete (calcium imaging or spikes)
    "trial": blocks_windows,             # trial number for each time point (here: pseudo trial)
    "sleep/awake": bin_sleep_labels,
    "time_from_onset": time_from_onset,  # attribute with negative and positive numbers around trial onset
}

conditions = {
    "sleep/awake": [0,1]
}


decodanda_params = {
    'verbose': False,
}

decoding_params = {
    'training_fraction': 0.7,
    'nshuffles': 30,
    'cross_validations': 20,
}

perfs, null, time_points = decode_in_time(data, conditions,
                               time_attr='time_from_onset',
                               time_boundaries=[-3 *sample_rate, 3*sample_rate],
                               time_window=10,
                               decoding_params=decoding_params,
                               decodanda_params=decodanda_params,
                               plot=True,
                               time_key='Time from stimulus onset (timebins)')

The output of 5 different runs of the same code (none of the input variables changed):

[figures: output plots from 5 separate runs]

Given that there is no legend or caption describing the output of this function, my assumption is that the plot depicts the average performance of the k-fold cross-validated models (blue dots) across each time bin (blue line) vs. the performance on shuffled labels (black line, with the grey area showing ±SD).


Addendum (just for clarity):

Given your previous input, the decoding results for the same underlying data look like this (using the previously suggested trial scheme of chunking each bout into 20-sample pieces, i.e., 1 second of data):

[figures: two decoding-result plots]
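For reference, that chunking scheme could be sketched like this (`chunk_bouts` is a hypothetical helper of my own, not part of Decodanda; the 20-sample chunk size corresponds to 1 s at 20 Hz):

```python
import numpy as np

def chunk_bouts(labels, chunk_size=20):
    """Split each contiguous same-label bout into non-overlapping chunks
    of `chunk_size` samples, assigning each chunk a unique trial id.
    Leftover samples that don't fill a chunk get trial id -1 (dropped)."""
    labels = np.asarray(labels)
    # Bout boundaries: every index where the label changes, plus the ends.
    edges = np.concatenate([[0], np.flatnonzero(np.diff(labels) != 0) + 1,
                            [len(labels)]])
    trial = np.full(len(labels), -1, dtype=int)
    tid = 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        for c in range((hi - lo) // chunk_size):
            trial[lo + c * chunk_size : lo + (c + 1) * chunk_size] = tid
            tid += 1
    return trial

labels = np.array([0] * 45 + [1] * 50)
trial = chunk_bouts(labels)
```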
@JensBlack (Author)

Adding the output of the function with 'verbose = True':

0it [00:00, ?it/s]
		[Decodanda]	building conditioned rasters for session 0
			(sleep/awake = 0):	Selected 90 time bin out of 129882, divided into 9 trials - 63 neurons
			(sleep/awake = 1):	Selected 80 time bin out of 129882, divided into 8 trials - 63 neurons



Testing decoding performance for semantic dichotomy:  sleep/awake
[['1'], ['0']] 90

[decode_dichotomy]	Decoding - 90 time bins - 63 neurons - 1 brains
		(1)
			vs.
		(0)
100%|██████████| 20/20 [00:00<00:00, 277.77it/s]

Looping over decoding cross validation folds:

[decode_with_nullmodel]	 data <p> = 0.72

[decode_with_nullmodel]	Looping over null model shuffles.
100%|██████████| 30/30 [00:02<00:00, 12.65it/s]

This repeats 12 times (12 iterations).

@lposani (Owner)

lposani commented Jul 30, 2024

Hi @JensBlack, the decoding_in_time function is a bit under-documented at the moment, so I would consider it a beta version. To your questions:

  1. Yes, all decoding functions have built-in randomness, both in resampling the data for balancing and in choosing the training and testing bins for cross-validation. Decodanda does not support an explicit random state as an input at the moment, but a workaround I use to fix the randomness is to set the random state for the whole notebook/script by calling np.random.seed(0) at the top of the script.

  2. I can't see anything particularly wrong in your approach, and your understanding of the output of the function is correct. Maybe try to go deeper into the sleep state? I would probably filter out those bouts that are shorter than a threshold (e.g. 2 seconds), because having close-by time bins labeled with different labels can give you false negative results (imagine having the same long activation event that is half labeled as awake and half labeled as sleep, it would damage the decoding performance). See e.g. those very quick sleep/awake transitions around time bin 75,000 in your plot.

  3. In your sleep-velocity data, how can you balance sleep and velocity? I can't imagine mice running while sleeping 🤔
    Either way, I would probably use whole individual sleep/awake bouts as pseudo-trials in your analysis (by labeling each of them with an individual trial value), so you also avoid misusing the contiguous chunk of sleep in the same event (e.g., the very long sleep event at time bin ~50,000), and filter out short bouts as discussed above. But all of these are very practical choices, and there is no way around experimenting with the decoding functions, as you are already doing!
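Putting the two workarounds together (fixing the global NumPy random state and filtering out short bouts), a sketch might look like this; `long_bout_mask` and the 40-sample threshold (2 s at 20 Hz) are my own illustrative choices, not Decodanda API:

```python
import numpy as np

# Workaround 1: fix all NumPy-based randomness for the whole script.
np.random.seed(0)

# Workaround 2: mask out bouts shorter than a threshold before building
# the Decodanda data dict, so near-transition bins with conflicting labels
# don't hurt the decoder.
def long_bout_mask(labels, min_len=40):
    labels = np.asarray(labels)
    edges = np.concatenate([[0], np.flatnonzero(np.diff(labels) != 0) + 1,
                            [len(labels)]])
    keep = np.zeros(len(labels), dtype=bool)
    for lo, hi in zip(edges[:-1], edges[1:]):
        if hi - lo >= min_len:  # keep only bouts of at least min_len samples
            keep[lo:hi] = True
    return keep

labels = np.array([0] * 100 + [1] * 10 + [0] * 60)
mask = long_bout_mask(labels)  # the 10-sample bout is masked out
```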

good luck with your decoding journey!

@JensBlack (Author)
Thanks a bunch! I really appreciate your input.
