Releases: donishadsmith/neurocaps
0.16.2
[0.16.2] - 2024-08-22
- Transition probabilities have been added to `CAP.calculate_metrics`. Below is a snippet from the codebase showing how the calculation is done.
```python
if "transition_probability" in metrics:
    temp_dict[group].loc[len(temp_dict[group])] = [subj_id, group, curr_run] + [0.0] * (temp_dict[group].shape[-1] - 3)
    # Get number of transitions
    trans_dict = {target: np.sum(np.where(predicted_subject_timeseries[subj_id][curr_run][:-1] == target, 1, 0))
                  for target in group_caps[group]}
    indx = temp_dict[group].index[-1]
    # Iterate through products and calculate all symmetric pairs/off-diagonals
    for prod in products_unique[group]:
        target1, target2 = prod[0], prod[1]
        trans_array = predicted_subject_timeseries[subj_id][curr_run].copy()
        # Set all values not equal to target1 or target2 to zero
        trans_array[(trans_array != target1) & (trans_array != target2)] = 0
        trans_array[np.where(trans_array == target1)] = 1
        trans_array[np.where(trans_array == target2)] = 3
        # 2 indicates a forward transition target1 -> target2; -2 indicates a reverse transition target2 -> target1
        diff_array = np.diff(trans_array, n=1)
        # Avoid division-by-zero errors and calculate both the forward and reverse transitions
        if trans_dict[target1] != 0:
            temp_dict[group].loc[indx, f"{target1}.{target2}"] = float(np.sum(np.where(diff_array == 2, 1, 0)) / trans_dict[target1])
        if trans_dict[target2] != 0:
            temp_dict[group].loc[indx, f"{target2}.{target1}"] = float(np.sum(np.where(diff_array == -2, 1, 0)) / trans_dict[target2])
    # Calculate the probability for the self transitions/diagonals
    for target in group_caps[group]:
        if trans_dict[target] == 0:
            continue
        # Will include the {target}.{target} column, but the value is initially set to zero
        columns = temp_dict[group].filter(regex=rf"^{target}\.").columns.tolist()
        cumulative = temp_dict[group].loc[indx, columns].values.sum()
        temp_dict[group].loc[indx, f"{target}.{target}"] = 1.0 - cumulative
```
Below is a simplified version of the above snippet.
```python
import itertools

import numpy as np
import pandas as pd

groups = [["101", "A", "1"], ["102", "B", "1"]]
timeseries_dict = {
    "101": np.array([1, 1, 1, 1, 2, 2, 1, 4, 3, 5, 3, 3, 5, 5, 6, 7]),
    "102": np.array([1, 2, 1, 1, 3, 3, 1, 4, 3, 5, 3, 3, 4, 5, 6, 8, 7]),
}
caps = list(range(1, 9))
# Get all combinations of transitions
products = list(itertools.product(caps, caps))
df = pd.DataFrame(columns=["Subject_ID", "Group", "Run"] + [f"{x}.{y}" for x, y in products])
# Filter out all reversed products and products with the self transitions
products_unique = []
for prod in products:
    if prod[0] == prod[1]:
        continue
    # Include only the first instance of symmetric pairs
    if (prod[1], prod[0]) not in products_unique:
        products_unique.append(prod)
for info in groups:
    df.loc[len(df)] = info + [0.0] * (df.shape[-1] - 3)
    timeseries = timeseries_dict[info[0]]
    # Get number of transitions
    trans_dict = {target: np.sum(np.where(timeseries[:-1] == target, 1, 0)) for target in caps}
    indx = df.index[-1]
    # Iterate through products and calculate all symmetric pairs/off-diagonals
    for prod in products_unique:
        target1, target2 = prod[0], prod[1]
        trans_array = timeseries.copy()
        # Set all values not equal to target1 or target2 to zero
        trans_array[(trans_array != target1) & (trans_array != target2)] = 0
        trans_array[np.where(trans_array == target1)] = 1
        trans_array[np.where(trans_array == target2)] = 3
        # 2 indicates a forward transition target1 -> target2; -2 indicates a reverse transition target2 -> target1
        diff_array = np.diff(trans_array, n=1)
        # Avoid division-by-zero errors and calculate both the forward and reverse transitions
        if trans_dict[target1] != 0:
            df.loc[indx, f"{target1}.{target2}"] = float(np.sum(np.where(diff_array == 2, 1, 0)) / trans_dict[target1])
        if trans_dict[target2] != 0:
            df.loc[indx, f"{target2}.{target1}"] = float(np.sum(np.where(diff_array == -2, 1, 0)) / trans_dict[target2])
    # Calculate the probability for the self transitions/diagonals
    for target in caps:
        if trans_dict[target] == 0:
            continue
        # Will include the {target}.{target} column, but the value is initially set to zero
        columns = df.filter(regex=rf"^{target}\.").columns.tolist()
        cumulative = df.loc[indx, columns].values.sum()
        df.loc[indx, f"{target}.{target}"] = 1.0 - cumulative
```
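As a hand-checkable example of the pair-encoding trick used above: for subject "101", CAP 1 is active in 5 of the frames that have a successor and transitions to CAP 2 exactly once, so P(1→2) should be 0.2. A minimal standalone sketch:

```python
import numpy as np

ts = np.array([1, 1, 1, 1, 2, 2, 1, 4, 3, 5, 3, 3, 5, 5, 6, 7])
target1, target2 = 1, 2

# Frames where each CAP is active, excluding the last frame (it has no successor)
n_from_1 = np.sum(ts[:-1] == target1)  # 5
n_from_2 = np.sum(ts[:-1] == target2)  # 2

# Encode target1 as 1 and target2 as 3 so that np.diff yields 2 for a
# forward (1 -> 2) transition and -2 for a reverse (2 -> 1) transition
arr = ts.copy()
arr[(arr != target1) & (arr != target2)] = 0
arr[arr == target1] = 1
arr[arr == target2] = 3
diff = np.diff(arr)

p_1_to_2 = np.sum(diff == 2) / n_from_1   # 1/5 = 0.2
p_2_to_1 = np.sum(diff == -2) / n_from_2  # 1/2 = 0.5
```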
- Added new external function, `transition_matrix`, which generates and visualizes the average transition probabilities for all groups, using the transition probability dataframes outputted by `CAP.calculate_metrics`.
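The averaging step can be sketched as follows. Note this is an illustrative reconstruction, not the package's implementation: the dataframe is a hypothetical, minimal stand-in for the "transition_probability" output, using the `{from}.{to}` column convention from the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical per-subject transition probabilities for a 2-CAP solution
df = pd.DataFrame({
    "Subject_ID": ["101", "102"], "Group": ["A", "A"], "Run": ["1", "1"],
    "1.1": [0.6, 0.4], "1.2": [0.4, 0.6],
    "2.1": [0.3, 0.5], "2.2": [0.7, 0.5],
})

n_caps = 2
# Average each "{from}.{to}" column across subjects, then reshape into a
# from-CAP x to-CAP matrix (row = origin CAP, column = destination CAP)
avg = df.drop(columns=["Subject_ID", "Group", "Run"]).mean()
trans_matrix = avg.to_numpy().reshape(n_caps, n_caps)
```

Each row of the resulting matrix sums to 1, since a CAP must transition either to another CAP or to itself.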
0.16.1
♻ Changed
- For `knn_dict`, `cKDTree` has been replaced with `KDTree`, and scipy is restricted to version 1.6.0 or later since that is the version where `KDTree` began using the C implementation. `TimeseriesExtractor.get_bold()` can now be used on Windows; pybids still does not install by default to prevent long-path errors, but `pip install neurocaps[windows]` can be used for installation.
- All instances of textwrap have been replaced with normal strings; printed warnings and messages will now be longer in length and occupy less vertical screen space.
0.16.0
♻ Changed
- In `CAP.caps2surf`, the `save_stat_map` parameter has been changed to `save_stat_maps`.
- Slight improvements to a few errors/exceptions to make them more informative.
- Now, when a subject's run is excluded due to exceeding the fd threshold, the percentage of volumes exceeding the threshold is reported, as opposed to simply stating that the run has been excluded.
🐛 Fixes
- Fixed a specific instance when `tr` is not specified for `TimeseriesExtractor.get_bold`. When the `tr` is not specified, the code attempts to check the bold metadata/json file in the derivatives directory to extract the repetition time. Now, it will check for this file in both the derivatives and root BIDS directories. The code will also raise an error earlier if the `tr` is not specified, cannot be extracted from the bold metadata file, and bandpass filtering is requested.
- A warning check that assesses whether the indices for a certain condition fall outside a possible range (due to duration mismatch, incorrect `tr`, etc.) is now also done before calculating the percentage of volumes exceeding the threshold, so that the calculation is not diluted. Previously, this check was only done before extracting the condition from the timeseries array.
💻 Metadata
- Very minor documentation updates for `TimeseriesExtractor.get_bold`.
0.15.2
[0.15.2] - 2024-07-23
♻ Changed
- Created a specific message for when `dummy_scans = {"auto": True}` and zero "non_steady_state_outlier_XX" columns are found when `verbose=True`.
- `parcel_approach`, whether used as a setter or an input, now accepts pickle files.
🐛 Fixes
- Fixed a reference-before-assignment issue in `merge_dicts`. This occurred when only the merged dictionary was requested to be saved, without saving the reduced dictionaries, and no user-provided `file_names` were given. In this scenario, the default name for the merged dictionary is now correctly used.
[0.15.1] - 2024-07-23
🚀 New/Added
- In `TimeseriesExtractor`, "min" and "max" sub-keys can now be used when `dummy_scans` is a dictionary and the "auto" sub-key is True. The "min" sub-key sets the minimum number of dummy scans to remove if the number of "non_steady_state_outlier_XX" columns detected is less than this value, and the "max" sub-key sets the maximum number of dummy scans to remove if the number of columns detected exceeds this value.
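The "auto"/"min"/"max" interaction amounts to a simple clamp. This `resolve_dummy_scans` helper is illustrative only, not the package's actual internals:

```python
def resolve_dummy_scans(n_outlier_columns, dummy_scans):
    """Illustrative clamp: one "non_steady_state_outlier_XX" column per outlier volume."""
    n = n_outlier_columns
    if dummy_scans.get("auto"):
        if "min" in dummy_scans:
            n = max(n, dummy_scans["min"])  # remove at least "min" scans
        if "max" in dummy_scans:
            n = min(n, dummy_scans["max"])  # remove at most "max" scans
    return n
```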
[0.15.0] - 2024-07-21
🚀 New/Added
- Added `save_reduced_dicts` parameter to `merge_dicts` so that the reduced dictionaries can also be saved instead of only being returned.
♻ Changed
- Some parameter names, inputs, and outputs for the non-class functions `merge_dicts`, `change_dtypes`, and `standardize` have changed to improve consistency across these functions.
  - `merge_dicts`
    - `return_combined_dict` has been changed to `return_merged_dict`.
    - `file_name` has been changed to `file_names` since the reduced dicts can also be saved now.
    - The key in the output dictionary containing the merged dictionary changed from "combined" to "merged".
  - `standardize` & `change_dtypes`
    - `subject_timeseries` has been changed to `subject_timeseries_list`, the same as in `merge_dicts`.
    - `file_name` has been changed to `file_names`.
    - `return_dict` has been changed to `return_dicts`.
- The returned dictionary for `merge_dicts`, `change_dtypes`, and `standardize` is now only `dict[str, dict[str, dict[str, np.ndarray]]]`.
- In `CAP.calculate_metrics`, the metrics calculations, except for "temporal_fraction", have been refactored to remove an import or to use numpy operations, reducing the code needed to produce the same calculation.
- "counts"

  Previous Code:

  ```python
  # Get frequency
  frequency_dict = dict(collections.Counter(predicted_subject_timeseries[subj_id][curr_run]))
  # Sort the keys
  sorted_frequency_dict = {key: frequency_dict[key] for key in sorted(list(frequency_dict))}
  # Add zero to missing CAPs for participants that exhibit zero instances of a certain CAP
  if len(sorted_frequency_dict) != len(cap_numbers):
      sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number in list(sorted_frequency_dict)
                               else 0 for cap_number in cap_numbers}
  # Replace zeros with nan for groups with less caps than the group with the max caps
  if len(cap_numbers) > group_cap_counts[group]:
      sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number <= group_cap_counts[group]
                               else float("nan") for cap_number in cap_numbers}
  ```

  Refactored Code:

  ```python
  # Get frequency
  frequency_dict = {key: np.where(predicted_subject_timeseries[subj_id][curr_run] == key, 1, 0).sum()
                    for key in range(1, group_cap_counts[group] + 1)}
  # Replace zeros with nan for groups with less caps than the group with the max caps
  if max(cap_numbers) > group_cap_counts[group]:
      for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1):
          frequency_dict.update({i: float("nan")})
  ```
- "temporal_fraction"

  Previous Code:

  ```python
  proportion_dict = {key: item / (len(predicted_subject_timeseries[subj_id][curr_run]))
                     for key, item in sorted_frequency_dict.items()}
  ```

  Refactored Code (nothing other than some parameter names have changed):

  ```python
  proportion_dict = {key: value / (len(predicted_subject_timeseries[subj_id][curr_run]))
                     for key, value in frequency_dict.items()}
  ```
- "persistence"

  Previous Code:

  ```python
  # Initialize variables
  persistence_dict = {}
  uninterrupted_volumes = []
  count = 0
  # Iterate through caps
  for target in cap_numbers:
      # Iterate through each element and count uninterrupted volumes that equal target
      for index in range(0, len(predicted_subject_timeseries[subj_id][curr_run])):
          if predicted_subject_timeseries[subj_id][curr_run][index] == target:
              count += 1
          # Store count in list if interrupted and not zero
          else:
              if count != 0:
                  uninterrupted_volumes.append(count)
              # Reset counter
              count = 0
      # In the event a participant only occupies one CAP, and to ensure final counts are added
      if count > 0:
          uninterrupted_volumes.append(count)
      # If uninterrupted_volumes not zero, multiply elements in the list by repetition time, sum and divide
      if len(uninterrupted_volumes) > 0:
          persistence_value = np.array(uninterrupted_volumes).sum() / len(uninterrupted_volumes)
          if tr:
              persistence_dict.update({target: persistence_value * tr})
          else:
              persistence_dict.update({target: persistence_value})
      else:
          # Zero indicates that a participant has zero instances of the CAP
          persistence_dict.update({target: 0})
      # Reset variables
      count = 0
      uninterrupted_volumes = []
  # Replace zeros with nan for groups with less caps than the group with the max caps
  if len(cap_numbers) > group_cap_counts[group]:
      persistence_dict = {cap_number: persistence_dict[cap_number] if cap_number <= group_cap_counts[group]
                          else float("nan") for cap_number in cap_numbers}
  ```

  Refactored Code:

  ```python
  # Initialize variable
  persistence_dict = {}
  # Iterate through caps
  for target in cap_numbers:
      # Binary representation of array - if [1,2,1,1,1,3] and target is 1, then it is [1,0,1,1,1,0]
      binary_arr = np.where(predicted_subject_timeseries[subj_id][curr_run] == target, 1, 0)
      # Get indices of values that equal 1; [0,2,3,4]
      target_indices = np.where(binary_arr == 1)[0]
      # Count the transitions; indices where diff > 1 mark a transition; diff of indices = [2,1,1];
      # binary for diff > 1 = [1,0,0]; thus, segments = transitions + first_sequence(1) = 2
      segments = np.where(np.diff(target_indices, n=1) > 1, 1, 0).sum() + 1
      # Sum of ones in the binary array divided by segments, then multiplied by 1 or the tr; segments is
      # always at least 1 due to the + 1; the transition sum is 0 when target_indices is empty
      # or the condition isn't met
      persistence_dict.update({target: (binary_arr.sum() / segments) * (tr if tr else 1)})
  # Replace zeros with nan for groups with less caps than the group with the max caps
  if max(cap_numbers) > group_cap_counts[group]:
      for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1):
          persistence_dict.update({i: float("nan")})
  ```
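To see the refactored persistence logic on a concrete array: with `[1,2,1,1,1,3]` and target 1, the binary array is `[1,0,1,1,1,0]`, the indices of ones are `[0,2,3,4]`, and the single gap larger than 1 splits them into 2 segments, giving a persistence of 4/2 = 2 volumes:

```python
import numpy as np

run = np.array([1, 2, 1, 1, 1, 3])
target = 1

binary_arr = np.where(run == target, 1, 0)     # [1, 0, 1, 1, 1, 0]
target_indices = np.where(binary_arr == 1)[0]  # [0, 2, 3, 4]
# A gap (diff > 1) between consecutive indices marks a new segment
segments = np.where(np.diff(target_indices) > 1, 1, 0).sum() + 1  # 2
persistence = binary_arr.sum() / segments      # 4 / 2 = 2.0
```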
- "transition_frequency"

  Previous Code:

  ```python
  count = 0
  # Iterate through predicted values
  for index in range(0, len(predicted_subject_timeseries[subj_id][curr_run])):
      if index != 0:
          # If the subsequent element does not equal the previous element, this is considered a transition
          if predicted_subject_timeseries[subj_id][curr_run][index - 1] != predicted_subject_timeseries[subj_id][curr_run][index]:
              count += 1
  # Populate DataFrame
  new_row = [subj_id, group_name, curr_run, count]
  df_dict["transition_frequency"].loc[len(df_dict["transition_frequency"])] = new_row
  ```

  Refactored Code:

  ```python
  # Sum the differences that are not zero - [1,2,1,1,1,3] becomes [1,-1,0,0,2]; the binary representation
  # for values not zero is [1,1,0,0,1] = 3 transitions
  transition_frequency = np.where(np.diff(predicted_subject_timeseries[subj_id][curr_run]) != 0, 1, 0).sum()
  ```
Note: the `n` parameter in `np.diff` defaults to 1, and differences are calculated as `out[i] = a[i+1] - a[i]`.
🐛 Fixes
- When a pickle file was used as input in `standardize` or `change_dtype`, an error was produced; this has been fixed, and these functions now accept a list of dictionaries or a list of pickle files.
💻 Metadata
- In the documentation for `CAP.caps2corr`, it is now explicitly stated that the type of correlation being used is Pearson correlation.
0.14.7
[0.14.7] - 2024-07-17
♻ Changed
- Improved Warning Messages and Print Statements:
- In `TimeseriesExtractor.get_bold`, the subject-specific information output has been reformatted for better readability:
  - Previous Format:

    ```
    Subject: 1; run:1 - Message
    ```

  - New Format:

    ```
    [SUBJECT: 1 | SESSION: 1 | TASK: rest | RUN: 1]
    -----------------------------------------------
    Message
    ```
- In the `CAP` class, numerous warnings and statements have been changed to improve clarity:
  - Previous Format:

    ```
    Optimal cluster size using silhouette method for A is 2.
    ```

  - New Format:

    ```
    [GROUP: A | METHOD: silhouette] - Optimal cluster size is 2.
    ```
- These changes should improve clarity when viewing output in a terminal or when it is redirected to an output file by SLURM.
- Language in many statements and warnings has also been improved.
[0.14.6] - 2024-07-16
🐛 Fixes
- For `CAP.get_caps`, when `cluster_selection_method` was used to find the optimal cluster size, the model would be re-estimated and stored in the `self.kmeans` property for later use. Previously, the internal function that generated the model using scikit-learn's `KMeans` only returned the performance metrics. These metrics for each cluster size were assessed, and the best cluster size was used to generate the optimal KMeans model with the same parameters. This is fine when setting `random_seed` with the same k, since the model would produce the same initial cluster centroids and a similar clustering solution regardless of how many times the model is re-generated. However, if a random seed was not used, the newly re-generated optimal model would technically differ despite having the same k, due to the random nature of KMeans when initializing the cluster centroids. Now, the internal function returns both the performance metrics and the models, ensuring the exact same model that was assessed is stored in `self.kmeans`. This shouldn't be a major issue if your models are generally stable and produce similar cluster solutions. However, when not using a random seed, even minor differences in the KMeans model, even with the same k, can produce some statistical differences. Ultimately, it is best to ensure that the model used for assessment and the model used for later analyses are the same, to ensure robust results.
0.14.5
♻ Changed
- In `TimeseriesExtractor`, `dummy_scans` can now be a dictionary that uses the "auto" sub-key. If "auto" is set to True, the number of dummy scans removed depends on the number of "non_steady_state_outlier_XX" columns in the participant's fMRIPrep confounds tsv file. For instance, if two "non_steady_state_outlier_XX" columns are detected, then `dummy_scans` is set to two, since there is one "non_steady_state_outlier_XX" column per outlier volume for fMRIPrep. This is assessed for each run of all participants, so `dummy_scans` depends on the number of "non_steady_state_outlier_XX" columns in the confounds file associated with the specific participant, task, and run.
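Counting the outlier columns amounts to filtering the confounds header. The dataframe here is a hypothetical stand-in for an fMRIPrep confounds tsv, not real data:

```python
import pandas as pd

# Hypothetical confounds file with two non-steady-state outlier columns
confounds = pd.DataFrame(columns=[
    "framewise_displacement",
    "non_steady_state_outlier_00",
    "non_steady_state_outlier_01",
    "trans_x",
])

n_dummy = sum(col.startswith("non_steady_state_outlier") for col in confounds.columns)
# With dummy_scans = {"auto": True}, the number of dummy scans removed would be n_dummy (2 here)
```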
🐛 Fixes
- For defensive programming purposes, the timing information in the event file is no longer assumed to perfectly coincide with the timeseries. When a condition is specified, and onset and duration must be used to extract the indices corresponding to the condition of interest, the maximum scan index is checked to see if it exceeds the length of the timeseries. If it does, a warning is issued about potential timing misalignment (i.e., errors in the event file, incorrect repetition time, etc.), and the invalid indices are ignored so that only valid indices are extracted from the timeseries.
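The validity check can be sketched as follows; the variable names and values are illustrative, not the package's internals:

```python
import numpy as np

tr = 2.0         # repetition time in seconds
n_volumes = 100  # length of the extracted timeseries
onset, duration = 190.0, 30.0  # event timing in seconds (runs past the scan's end)

# Scan indices implied by the event timing
indices = np.arange(int(onset / tr), int((onset + duration) / tr))
if indices.max() >= n_volumes:
    # A warning about possible timing misalignment would be issued here
    indices = indices[indices < n_volumes]
```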
0.14.4
[0.14.4] - 2024-07-15
♻ Changed
- Minor update that prints the optimal cluster size for each group when using `cluster_selection_method` in `CAP.get_caps()`. Just for informational purposes.
- Previously version 0.14.3.post1.
[0.14.3.post1] - YANKED
♻ Changed
- Minor update that prints the optimal cluster size for each group when using `cluster_selection_method` in `CAP.get_caps()`. Just for informational purposes.
- Yanked because this was not a metadata update; it should have been a patch release to denote a behavioral change, so it is now version 0.14.4 to adhere a bit better to versioning practices.
[0.14.3] - 2024-07-14
- Thought of some minor changes.
♻ Changed
- Added a new warning if `fd_threshold` is specified but `use_confounds` is False, since `fd_threshold` needs the confounds file from fMRIPrep. In previous versions, censoring simply didn't occur and no warning was issued.
- Changed the error exception types for cosine similarity in `CAP.caps2radar` from ValueError to ZeroDivisionError.
- Added a ValueError in `TimeseriesExtractor.visualize_bold` if both `region` and `roi_indx` are None.
- In `TimeseriesExtractor.visualize_bold`, if `roi_indx` is a string, int, or list with a single element, a title is added to the plot.
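For context on why ZeroDivisionError is the relevant type: when the division is done with plain Python floats, an all-zero vector makes the norm product exactly 0.0, and dividing raises ZeroDivisionError. This standalone helper is illustrative only, not the package's implementation:

```python
import math

import numpy as np

def cosine_similarity(a, b):
    dot = float(np.dot(a, b))
    # Plain-float math: dividing by a 0.0 norm product raises ZeroDivisionError
    norm_product = float(np.linalg.norm(a)) * float(np.linalg.norm(b))
    try:
        return dot / norm_product
    except ZeroDivisionError:
        return math.nan
```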
[0.14.2.post2] - 2024-07-14
💻 Metadata
- Simply wanted the latest metadata update to be on Zenodo and to have the same DOI, as I forgot to upload version 0.14.2.post1 there.
[0.14.2.post1] - 2024-07-14
💻 Metadata
- Updated a warning during timeseries extraction that only included a partial reason for why the indices for a condition have been filtered out. Added information about `fd_threshold` being the reason why.
[0.14.2] - 2024-07-14
♻ Changed
- Implemented a minor code refactoring so that runs flagged due to "outlier_percentage", runs where all volumes would be scrubbed because every volume exceeds the framewise displacement threshold, and runs where the specified condition returns zero indices will not undergo timeseries extraction.
- Also clarified the language in a warning that occurs when all NIfTI files are excluded or missing for a subject.
🐛 Fixes
- If a condition does not exist in the event file, a warning is now issued. This should prevent empty timeseries or errors. The warning names the condition, in the event of a spelling error.
- Added a specific error type to the except blocks for the cosine similarities that can cause a division-by-zero error.
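The new guard amounts to checking the event file's `trial_type` column before extraction. The `check_condition` helper and events dataframe below are hypothetical stand-ins for the internal check and a BIDS events tsv:

```python
import warnings

import pandas as pd

def check_condition(events, condition):
    """Warn, naming the condition, if it is absent from the event file."""
    if condition not in events["trial_type"].values:
        warnings.warn(f"Condition '{condition}' does not exist in the event file.")
        return False
    return True

events = pd.DataFrame({
    "onset": [0.0, 10.0], "duration": [5.0, 5.0],
    "trial_type": ["rest", "active"],
})
```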