Releases: donishadsmith/neurocaps
0.16.2
[0.16.2] - 2024-08-22
- Transition probabilities have been added to `CAP.calculate_metrics`. Below is a snippet from the codebase showing how the calculation is done.
```python
if "transition_probability" in metrics:
    temp_dict[group].loc[len(temp_dict[group])] = [subj_id, group, curr_run] + [0.0] * (temp_dict[group].shape[-1] - 3)
    # Get number of transitions
    trans_dict = {target: np.sum(np.where(predicted_subject_timeseries[subj_id][curr_run][:-1] == target, 1, 0))
                  for target in group_caps[group]}
    indx = temp_dict[group].index[-1]
    # Iterate through products and calculate all symmetric pairs/off-diagonals
    for prod in products_unique[group]:
        target1, target2 = prod[0], prod[1]
        trans_array = predicted_subject_timeseries[subj_id][curr_run].copy()
        # Set all values not equal to target1 or target2 to zero
        trans_array[(trans_array != target1) & (trans_array != target2)] = 0
        trans_array[np.where(trans_array == target1)] = 1
        trans_array[np.where(trans_array == target2)] = 3
        # 2 indicates a forward transition target1 -> target2; -2 indicates a reverse transition target2 -> target1
        diff_array = np.diff(trans_array, n=1)
        # Avoid division-by-zero errors and calculate both the forward and reverse transitions
        if trans_dict[target1] != 0:
            temp_dict[group].loc[indx, f"{target1}.{target2}"] = float(np.sum(np.where(diff_array == 2, 1, 0)) / trans_dict[target1])
        if trans_dict[target2] != 0:
            temp_dict[group].loc[indx, f"{target2}.{target1}"] = float(np.sum(np.where(diff_array == -2, 1, 0)) / trans_dict[target2])
    # Calculate the probability for the self transitions/diagonals
    for target in group_caps[group]:
        if trans_dict[target] == 0:
            continue
        # Will include the {target}.{target} column, but the value is initially set to zero
        columns = temp_dict[group].filter(regex=rf"^{target}\.").columns.tolist()
        cumulative = temp_dict[group].loc[indx, columns].values.sum()
        temp_dict[group].loc[indx, f"{target}.{target}"] = 1.0 - cumulative
```
Below is a simplified version of the above snippet.
```python
import itertools

import numpy as np
import pandas as pd

groups = [["101", "A", "1"], ["102", "B", "1"]]
timeseries_dict = {
    "101": np.array([1, 1, 1, 1, 2, 2, 1, 4, 3, 5, 3, 3, 5, 5, 6, 7]),
    "102": np.array([1, 2, 1, 1, 3, 3, 1, 4, 3, 5, 3, 3, 4, 5, 6, 8, 7]),
}
caps = list(range(1, 9))
# Get all combinations of transitions
products = list(itertools.product(caps, caps))
df = pd.DataFrame(columns=["Subject_ID", "Group", "Run"] + [f"{x}.{y}" for x, y in products])
# Filter out all reversed products and products with the self transitions
products_unique = []
for prod in products:
    if prod[0] == prod[1]:
        continue
    # Include only the first instance of symmetric pairs
    if (prod[1], prod[0]) not in products_unique:
        products_unique.append(prod)
for info in groups:
    df.loc[len(df)] = info + [0.0] * (df.shape[-1] - 3)
    timeseries = timeseries_dict[info[0]]
    # Get number of transitions
    trans_dict = {target: np.sum(np.where(timeseries[:-1] == target, 1, 0)) for target in caps}
    indx = df.index[-1]
    # Iterate through products and calculate all symmetric pairs/off-diagonals
    for prod in products_unique:
        target1, target2 = prod[0], prod[1]
        trans_array = timeseries.copy()
        # Set all values not equal to target1 or target2 to zero
        trans_array[(trans_array != target1) & (trans_array != target2)] = 0
        trans_array[np.where(trans_array == target1)] = 1
        trans_array[np.where(trans_array == target2)] = 3
        # 2 indicates a forward transition target1 -> target2; -2 indicates a reverse transition target2 -> target1
        diff_array = np.diff(trans_array, n=1)
        # Avoid division-by-zero errors and calculate both the forward and reverse transitions
        if trans_dict[target1] != 0:
            df.loc[indx, f"{target1}.{target2}"] = float(np.sum(np.where(diff_array == 2, 1, 0)) / trans_dict[target1])
        if trans_dict[target2] != 0:
            df.loc[indx, f"{target2}.{target1}"] = float(np.sum(np.where(diff_array == -2, 1, 0)) / trans_dict[target2])
    # Calculate the probability for the self transitions/diagonals
    for target in caps:
        if trans_dict[target] == 0:
            continue
        # Will include the {target}.{target} column, but the value is initially set to zero
        columns = df.filter(regex=rf"^{target}\.").columns.tolist()
        cumulative = df.loc[indx, columns].values.sum()
        df.loc[indx, f"{target}.{target}"] = 1.0 - cumulative
```
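As a hand-checkable example of the pair-encoding trick used above: for subject "101", CAP 1 is active in 5 of the frames that have a successor and transitions to CAP 2 exactly once, so P(1→2) should be 0.2. A minimal standalone sketch:

```python
import numpy as np

ts = np.array([1, 1, 1, 1, 2, 2, 1, 4, 3, 5, 3, 3, 5, 5, 6, 7])
target1, target2 = 1, 2

# Frames where each CAP is active, excluding the last frame (it has no successor)
n_from_1 = np.sum(ts[:-1] == target1)  # 5
n_from_2 = np.sum(ts[:-1] == target2)  # 2

# Encode target1 as 1 and target2 as 3 so that np.diff yields 2 for a
# forward (1 -> 2) transition and -2 for a reverse (2 -> 1) transition
arr = ts.copy()
arr[(arr != target1) & (arr != target2)] = 0
arr[arr == target1] = 1
arr[arr == target2] = 3
diff = np.diff(arr)

p_1_to_2 = np.sum(diff == 2) / n_from_1   # 1/5 = 0.2
p_2_to_1 = np.sum(diff == -2) / n_from_2  # 1/2 = 0.5
```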
- Added new external function, `transition_matrix`, which generates and visualizes the average transition probabilities for all groups, using the transition probability dataframes outputted by `CAP.calculate_metrics`.
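The averaging step can be sketched as follows. Note this is an illustrative reconstruction, not the package's implementation: the dataframe is a hypothetical, minimal stand-in for the "transition_probability" output, using the `{from}.{to}` column convention from the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical per-subject transition probabilities for a 2-CAP solution
df = pd.DataFrame({
    "Subject_ID": ["101", "102"], "Group": ["A", "A"], "Run": ["1", "1"],
    "1.1": [0.6, 0.4], "1.2": [0.4, 0.6],
    "2.1": [0.3, 0.5], "2.2": [0.7, 0.5],
})

n_caps = 2
# Average each "{from}.{to}" column across subjects, then reshape into a
# from-CAP x to-CAP matrix (row = origin CAP, column = destination CAP)
avg = df.drop(columns=["Subject_ID", "Group", "Run"]).mean()
trans_matrix = avg.to_numpy().reshape(n_caps, n_caps)
```

Each row of the resulting matrix sums to 1, since a CAP must transition either to another CAP or to itself.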
0.16.1
♻ Changed
- For `knn_dict`, `cKDTree` has been replaced with `KDTree`, and scipy is restricted to version 1.6.0 or later since that is the version where `KDTree` began using the C implementation. `TimeseriesExtractor.get_bold()` can now be used on Windows; pybids still does not install by default to prevent long-path errors, but `pip install neurocaps[windows]` can be used for installation.
- All instances of textwrap have been replaced with normal strings; printed warnings and messages will now be longer in length and occupy less vertical screen space.
0.16.0
♻ Changed
- In `CAP.caps2surf`, the `save_stat_map` parameter has been changed to `save_stat_maps`.
- Slight improvements to a few errors/exceptions to make them more informative.
- Now, when a subject's run is excluded due to exceeding the fd threshold, the percentage of volumes exceeding the threshold is reported, as opposed to simply stating that the run has been excluded.
🐛 Fixes
- Fixed a specific instance when `tr` is not specified for `TimeseriesExtractor.get_bold`. When the `tr` is not specified, the code attempts to check the bold metadata/json file in the derivatives directory to extract the repetition time. Now, it will check for this file in both the derivatives and root BIDS directories. The code will also raise an error earlier if the `tr` is not specified, cannot be extracted from the bold metadata file, and bandpass filtering is requested.
- A warning check that assesses whether the indices for a certain condition fall outside a possible range (due to duration mismatch, incorrect `tr`, etc.) is now also done before calculating the percentage of volumes exceeding the threshold, so that the calculation is not diluted. Previously, this check was only done before extracting the condition from the timeseries array.
💻 Metadata
- Very minor documentation updates for `TimeseriesExtractor.get_bold`.
0.15.2
[0.15.2] - 2024-07-23
♻ Changed
- Created a specific message for when `dummy_scans = {"auto": True}` and zero "non_steady_state_outlier_XX" columns are found when `verbose=True`.
- `parcel_approach`, whether used as a setter or an input, now accepts pickle files.
🐛 Fixes
- Fixed a reference-before-assignment issue in `merge_dicts`. This occurred when only the merged dictionary was requested to be saved, without saving the reduced dictionaries, and no user-provided `file_names` were given. In this scenario, the default name for the merged dictionary is now correctly used.
[0.15.1] - 2024-07-23
🚀 New/Added
- In `TimeseriesExtractor`, "min" and "max" sub-keys can now be used when `dummy_scans` is a dictionary and the "auto" sub-key is True. The "min" sub-key sets the minimum number of dummy scans to remove if the number of "non_steady_state_outlier_XX" columns detected is less than this value, and the "max" sub-key sets the maximum number of dummy scans to remove if the number of columns detected exceeds this value.
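The "auto"/"min"/"max" interaction amounts to a simple clamp. This `resolve_dummy_scans` helper is illustrative only, not the package's actual internals:

```python
def resolve_dummy_scans(n_outlier_columns, dummy_scans):
    """Illustrative clamp: one "non_steady_state_outlier_XX" column per outlier volume."""
    n = n_outlier_columns
    if dummy_scans.get("auto"):
        if "min" in dummy_scans:
            n = max(n, dummy_scans["min"])  # remove at least "min" scans
        if "max" in dummy_scans:
            n = min(n, dummy_scans["max"])  # remove at most "max" scans
    return n
```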
[0.15.0] - 2024-07-21
🚀 New/Added
- Added `save_reduced_dicts` parameter to `merge_dicts` so that the reduced dictionaries can also be saved instead of only being returned.
♻ Changed
- Some parameter names, inputs, and outputs for the non-class functions `merge_dicts`, `change_dtypes`, and `standardize` have changed to improve consistency across these functions.
  - `merge_dicts`
    - `return_combined_dict` has been changed to `return_merged_dict`.
    - `file_name` has been changed to `file_names` since the reduced dicts can also be saved now.
    - The key in the output dictionary containing the merged dictionary changed from "combined" to "merged".
  - `standardize` & `change_dtypes`
    - `subject_timeseries` has been changed to `subject_timeseries_list`, the same as in `merge_dicts`.
    - `file_name` has been changed to `file_names`.
    - `return_dict` has been changed to `return_dicts`.
- The returned dictionary for `merge_dicts`, `change_dtypes`, and `standardize` is now only `dict[str, dict[str, dict[str, np.ndarray]]]`.
- In `CAP.calculate_metrics`, the metrics calculations, except for "temporal_fraction", have been refactored to remove an import or to use numpy operations, reducing the code needed to produce the same calculation.
- "counts"

  Previous Code:

  ```python
  # Get frequency
  frequency_dict = dict(collections.Counter(predicted_subject_timeseries[subj_id][curr_run]))
  # Sort the keys
  sorted_frequency_dict = {key: frequency_dict[key] for key in sorted(list(frequency_dict))}
  # Add zero to missing CAPs for participants that exhibit zero instances of a certain CAP
  if len(sorted_frequency_dict) != len(cap_numbers):
      sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number in list(sorted_frequency_dict)
                               else 0 for cap_number in cap_numbers}
  # Replace zeros with nan for groups with less caps than the group with the max caps
  if len(cap_numbers) > group_cap_counts[group]:
      sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number <= group_cap_counts[group]
                               else float("nan") for cap_number in cap_numbers}
  ```

  Refactored Code:

  ```python
  # Get frequency
  frequency_dict = {key: np.where(predicted_subject_timeseries[subj_id][curr_run] == key, 1, 0).sum()
                    for key in range(1, group_cap_counts[group] + 1)}
  # Replace zeros with nan for groups with less caps than the group with the max caps
  if max(cap_numbers) > group_cap_counts[group]:
      for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1):
          frequency_dict.update({i: float("nan")})
  ```
- "temporal_fraction"

  Previous Code:

  ```python
  proportion_dict = {key: item / (len(predicted_subject_timeseries[subj_id][curr_run]))
                     for key, item in sorted_frequency_dict.items()}
  ```

  Refactored Code (nothing other than some parameter names have changed):

  ```python
  proportion_dict = {key: value / (len(predicted_subject_timeseries[subj_id][curr_run]))
                     for key, value in frequency_dict.items()}
  ```
- "persistence"

  Previous Code:

  ```python
  # Initialize variables
  persistence_dict = {}
  uninterrupted_volumes = []
  count = 0
  # Iterate through caps
  for target in cap_numbers:
      # Iterate through each element and count uninterrupted volumes that equal target
      for index in range(0, len(predicted_subject_timeseries[subj_id][curr_run])):
          if predicted_subject_timeseries[subj_id][curr_run][index] == target:
              count += 1
          # Store count in list if interrupted and not zero
          else:
              if count != 0:
                  uninterrupted_volumes.append(count)
              # Reset counter
              count = 0
      # In the event a participant only occupies one CAP, and to ensure final counts are added
      if count > 0:
          uninterrupted_volumes.append(count)
      # If uninterrupted_volumes not zero, multiply elements in the list by repetition time, sum and divide
      if len(uninterrupted_volumes) > 0:
          persistence_value = np.array(uninterrupted_volumes).sum() / len(uninterrupted_volumes)
          if tr:
              persistence_dict.update({target: persistence_value * tr})
          else:
              persistence_dict.update({target: persistence_value})
      else:
          # Zero indicates that a participant has zero instances of the CAP
          persistence_dict.update({target: 0})
      # Reset variables
      count = 0
      uninterrupted_volumes = []
  # Replace zeros with nan for groups with less caps than the group with the max caps
  if len(cap_numbers) > group_cap_counts[group]:
      persistence_dict = {cap_number: persistence_dict[cap_number] if cap_number <= group_cap_counts[group]
                          else float("nan") for cap_number in cap_numbers}
  ```

  Refactored Code:

  ```python
  # Initialize variable
  persistence_dict = {}
  # Iterate through caps
  for target in cap_numbers:
      # Binary representation of array - if [1,2,1,1,1,3] and target is 1, then it is [1,0,1,1,1,0]
      binary_arr = np.where(predicted_subject_timeseries[subj_id][curr_run] == target, 1, 0)
      # Get indices of values that equal 1; [0,2,3,4]
      target_indices = np.where(binary_arr == 1)[0]
      # Count the transitions; indices where diff > 1 mark a transition; diff of indices = [2,1,1];
      # binary for diff > 1 = [1,0,0]; thus, segments = transitions + first_sequence(1) = 2
      segments = np.where(np.diff(target_indices, n=1) > 1, 1, 0).sum() + 1
      # Sum of ones in the binary array divided by segments, then multiplied by 1 or the tr; segments is
      # always at least 1 due to the + 1; the transition sum is 0 when target_indices is empty
      # or the condition isn't met
      persistence_dict.update({target: (binary_arr.sum() / segments) * (tr if tr else 1)})
  # Replace zeros with nan for groups with less caps than the group with the max caps
  if max(cap_numbers) > group_cap_counts[group]:
      for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1):
          persistence_dict.update({i: float("nan")})
  ```
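To see the refactored persistence logic on a concrete array: with `[1,2,1,1,1,3]` and target 1, the binary array is `[1,0,1,1,1,0]`, the indices of ones are `[0,2,3,4]`, and the single gap larger than 1 splits them into 2 segments, giving a persistence of 4/2 = 2 volumes:

```python
import numpy as np

run = np.array([1, 2, 1, 1, 1, 3])
target = 1

binary_arr = np.where(run == target, 1, 0)     # [1, 0, 1, 1, 1, 0]
target_indices = np.where(binary_arr == 1)[0]  # [0, 2, 3, 4]
# A gap (diff > 1) between consecutive indices marks a new segment
segments = np.where(np.diff(target_indices) > 1, 1, 0).sum() + 1  # 2
persistence = binary_arr.sum() / segments      # 4 / 2 = 2.0
```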
- "transition_frequency"

  Previous Code:

  ```python
  count = 0
  # Iterate through predicted values
  for index in range(0, len(predicted_subject_timeseries[subj_id][curr_run])):
      if index != 0:
          # If the subsequent element does not equal the previous element, this is considered a transition
          if predicted_subject_timeseries[subj_id][curr_run][index - 1] != predicted_subject_timeseries[subj_id][curr_run][index]:
              count += 1
  # Populate DataFrame
  new_row = [subj_id, group_name, curr_run, count]
  df_dict["transition_frequency"].loc[len(df_dict["transition_frequency"])] = new_row
  ```

  Refactored Code:

  ```python
  # Sum the differences that are not zero - [1,2,1,1,1,3] becomes [1,-1,0,0,2]; the binary representation
  # for values not zero is [1,1,0,0,1] = 3 transitions
  transition_frequency = np.where(np.diff(predicted_subject_timeseries[subj_id][curr_run]) != 0, 1, 0).sum()
  ```
Note: the `n` parameter in `np.diff` defaults to 1, and differences are calculated as `out[i] = a[i+1] - a[i]`.
🐛 Fixes
- When a pickle file was used as input in `standardize` or `change_dtype`, an error was produced; this has been fixed, and these functions now accept a list of dictionaries or a list of pickle files.
💻 Metadata
- In the documentation for `CAP.caps2corr`, it is now explicitly stated that the type of correlation being used is Pearson correlation.
0.14.7
[0.14.7] - 2024-07-17
♻ Changed
- Improved Warning Messages and Print Statements:
- In `TimeseriesExtractor.get_bold`, the subject-specific information output has been reformatted for better readability:
  - Previous Format:

    ```
    Subject: 1; run:1 - Message
    ```

  - New Format:

    ```
    [SUBJECT: 1 | SESSION: 1 | TASK: rest | RUN: 1]
    -----------------------------------------------
    Message
    ```
- In the `CAP` class, numerous warnings and statements have been changed to improve clarity:
  - Previous Format:

    ```
    Optimal cluster size using silhouette method for A is 2.
    ```

  - New Format:

    ```
    [GROUP: A | METHOD: silhouette] - Optimal cluster size is 2.
    ```
- These changes should improve clarity when viewing output in a terminal or when it is redirected to an output file by SLURM.
- Language in many statements and warnings has also been improved.
[0.14.6] - 2024-07-16
🐛 Fixes
- For `CAP.get_caps`, when `cluster_selection_method` was used to find the optimal cluster size, the model would be re-estimated and stored in the `self.kmeans` property for later use. Previously, the internal function that generated the model using scikit-learn's `KMeans` only returned the performance metrics. These metrics for each cluster size were assessed, and the best cluster size was used to generate the optimal KMeans model with the same parameters. This is fine when setting `random_seed` with the same k, since the model would produce the same initial cluster centroids and a similar clustering solution regardless of how many times the model is re-generated. However, if a random seed was not used, the newly re-generated optimal model would technically differ despite having the same k, due to the random nature of KMeans when initializing the cluster centroids. Now, the internal function returns both the performance metrics and the models, ensuring the exact same model that was assessed is stored in `self.kmeans`. This shouldn't be a major issue if your models are generally stable and produce similar cluster solutions. However, when not using a random seed, even minor differences in the KMeans model, even with the same k, can produce some statistical differences. Ultimately, it is best to ensure that the model used for assessment and the model used for later analyses are the same, to ensure robust results.
0.14.5
♻ Changed
- In `TimeseriesExtractor`, `dummy_scans` can now be a dictionary that uses the "auto" sub-key. If "auto" is set to True, the number of dummy scans removed depends on the number of "non_steady_state_outlier_XX" columns in the participant's fMRIPrep confounds tsv file. For instance, if two "non_steady_state_outlier_XX" columns are detected, then `dummy_scans` is set to two, since there is one "non_steady_state_outlier_XX" column per outlier volume for fMRIPrep. This is assessed for each run of all participants, so `dummy_scans` depends on the number of "non_steady_state_outlier_XX" columns in the confounds file associated with the specific participant, task, and run.
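Counting the outlier columns amounts to filtering the confounds header. The dataframe here is a hypothetical stand-in for an fMRIPrep confounds tsv, not real data:

```python
import pandas as pd

# Hypothetical confounds file with two non-steady-state outlier columns
confounds = pd.DataFrame(columns=[
    "framewise_displacement",
    "non_steady_state_outlier_00",
    "non_steady_state_outlier_01",
    "trans_x",
])

n_dummy = sum(col.startswith("non_steady_state_outlier") for col in confounds.columns)
# With dummy_scans = {"auto": True}, the number of dummy scans removed would be n_dummy (2 here)
```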
🐛 Fixes
- For defensive programming purposes, the timing information in the event file is no longer assumed to perfectly coincide with the timeseries. When a condition is specified, and onset and duration must be used to extract the indices corresponding to the condition of interest, the maximum scan index is checked to see if it exceeds the length of the timeseries. If it does, a warning is issued about potential timing misalignment (i.e., errors in the event file, incorrect repetition time, etc.), and the invalid indices are ignored so that only valid indices are extracted from the timeseries.
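The validity check can be sketched as follows; the variable names and values are illustrative, not the package's internals:

```python
import numpy as np

tr = 2.0         # repetition time in seconds
n_volumes = 100  # length of the extracted timeseries
onset, duration = 190.0, 30.0  # event timing in seconds (runs past the scan's end)

# Scan indices implied by the event timing
indices = np.arange(int(onset / tr), int((onset + duration) / tr))
if indices.max() >= n_volumes:
    # A warning about possible timing misalignment would be issued here
    indices = indices[indices < n_volumes]
```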
0.14.4
[0.14.4] - 2024-07-15
♻ Changed
- Minor update that prints the optimal cluster size for each group when using `cluster_selection_method` in `CAP.get_caps()`. Just for informational purposes.
- Previously version 0.14.3.post1.
[0.14.3.post1] - YANKED
♻ Changed
- Minor update that prints the optimal cluster size for each group when using `cluster_selection_method` in `CAP.get_caps()`. Just for informational purposes.
- Yanked because this was not a metadata update; it should have been a patch release to denote a behavioral change, so it is now version 0.14.4 to adhere a bit better to versioning practices.
[0.14.3] - 2024-07-14
- Thought of some minor changes.
♻ Changed
- Added a new warning if `fd_threshold` is specified but `use_confounds` is False, since `fd_threshold` needs the confounds file from fMRIPrep. In previous versions, censoring simply didn't occur and no warning was issued.
- Changed the error exception types for cosine similarity in `CAP.caps2radar` from ValueError to ZeroDivisionError.
- Added a ValueError in `TimeseriesExtractor.visualize_bold` if both `region` and `roi_indx` are None.
- In `TimeseriesExtractor.visualize_bold`, if `roi_indx` is a string, int, or list with a single element, a title is added to the plot.
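For context on why ZeroDivisionError is the relevant type: when the division is done with plain Python floats, an all-zero vector makes the norm product exactly 0.0, and dividing raises ZeroDivisionError. This standalone helper is illustrative only, not the package's implementation:

```python
import math

import numpy as np

def cosine_similarity(a, b):
    dot = float(np.dot(a, b))
    # Plain-float math: dividing by a 0.0 norm product raises ZeroDivisionError
    norm_product = float(np.linalg.norm(a)) * float(np.linalg.norm(b))
    try:
        return dot / norm_product
    except ZeroDivisionError:
        return math.nan
```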
[0.14.2.post2] - 2024-07-14
💻 Metadata
- Simply wanted the latest metadata update to be on Zenodo and to have the same DOI, as I forgot to upload version 0.14.2.post1 there.
[0.14.2.post1] - 2024-07-14
💻 Metadata
- Updated a warning during timeseries extraction that only included a partial reason for why the indices for a condition have been filtered out. Added information about `fd_threshold` being the reason why.
[0.14.2] - 2024-07-14
♻ Changed
- Implemented a minor code refactoring so that runs flagged due to "outlier_percentage", runs where all volumes would be scrubbed because every volume exceeds the framewise displacement threshold, and runs where the specified condition returns zero indices will not undergo timeseries extraction.
- Also clarified the language in a warning that occurs when all NIfTI files are excluded or missing for a subject.
🐛 Fixes
- If a condition does not exist in the event file, a warning is now issued. This should prevent empty timeseries or errors. The warning names the condition, in the event of a spelling error.
- Added a specific error type to the except blocks for the cosine similarities that can cause a division-by-zero error.
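The new guard amounts to checking the event file's `trial_type` column before extraction. The `check_condition` helper and events dataframe below are hypothetical stand-ins for the internal check and a BIDS events tsv:

```python
import warnings

import pandas as pd

def check_condition(events, condition):
    """Warn, naming the condition, if it is absent from the event file."""
    if condition not in events["trial_type"].values:
        warnings.warn(f"Condition '{condition}' does not exist in the event file.")
        return False
    return True

events = pd.DataFrame({
    "onset": [0.0, 10.0], "duration": [5.0, 5.0],
    "trial_type": ["rest", "active"],
})
```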