
Releases: donishadsmith/neurocaps

0.16.2

23 Aug 00:53

[0.16.2] - 2024-08-22

  • Transition probabilities have been added to CAP.calculate_metrics. Below is a snippet from the codebase
    showing how the calculation is done.
    if "transition_probability" in metrics:
        temp_dict[group].loc[len(temp_dict[group])] = [subj_id, group, curr_run] + [0.0]*(temp_dict[group].shape[-1]-3)
        # Get number of transitions
        trans_dict = {target: np.sum(np.where(predicted_subject_timeseries[subj_id][curr_run][:-1] == target, 1, 0))
                        for target in group_caps[group]}
        indx = temp_dict[group].index[-1]
        # Iterate through products and calculate all symmetric pairs/off-diagonals
        for prod in products_unique[group]:
            target1, target2 = prod[0], prod[1]
            trans_array = predicted_subject_timeseries[subj_id][curr_run].copy()
            # Set all values not equal to target1 or target2 to zero
            trans_array[(trans_array != target1) & (trans_array != target2)] = 0
            trans_array[np.where(trans_array == target1)] = 1
            trans_array[np.where(trans_array == target2)] = 3
            # 2 indicates forward transition target1 -> target2; -2 means reverse/backward transition target2 -> target1
            diff_array = np.diff(trans_array,n=1)
            # Avoid division by zero errors and calculate both the forward and reverse transition
            if trans_dict[target1] != 0:
                temp_dict[group].loc[indx,f"{target1}.{target2}"] = float(np.sum(np.where(diff_array==2,1,0))/trans_dict[target1])
            if trans_dict[target2] != 0:
                temp_dict[group].loc[indx,f"{target2}.{target1}"] = float(np.sum(np.where(diff_array==-2,1,0))/trans_dict[target2])

        # Calculate the probability for the self transitions/diagonals
        for target in group_caps[group]:
            if trans_dict[target] == 0: continue
            # Will include the {target}.{target} column, but the value is initially set to zero
            columns = temp_dict[group].filter(regex=fr"^{target}\.").columns.tolist()
            cumulative = temp_dict[group].loc[indx,columns].values.sum()
            temp_dict[group].loc[indx,f"{target}.{target}"] = 1.0 - cumulative

Below is a simplified version of the above snippet.

    import itertools
    import numpy as np
    import pandas as pd
    groups = [["101","A","1"], ["102","B","1"]]
    timeseries_dict = {
        "101": np.array([1,1,1,1,2,2,1,4,3,5,3,3,5,5,6,7]),
        "102": np.array([1,2,1,1,3,3,1,4,3,5,3,3,4,5,6,8,7])
    }
    caps = list(range(1,9))
    # Get all combinations of transitions
    products = list(itertools.product(caps,caps))
    df = pd.DataFrame(columns=["Subject_ID", "Group","Run"]+[f"{x}.{y}" for x,y in products])
    # Filter out all reversed products and products with the self transitions
    products_unique = []
    for prod in products:
        if prod[0] == prod[1]: continue
        # Include only the first instance of symmetric pairs
        if (prod[1],prod[0]) not in products_unique: products_unique.append(prod)

    for info in groups:
        df.loc[len(df)] = info + [0.0]*(df.shape[-1]-3)
        timeseries = timeseries_dict[info[0]]
        # Get number of transitions
        trans_dict = {target: np.sum(np.where(timeseries[:-1] == target, 1, 0)) for target in caps}
        indx = df.index[-1]
        # Iterate through products and calculate all symmetric pairs/off-diagonals
        for prod in products_unique:
            target1, target2 = prod[0], prod[1]
            trans_array = timeseries.copy()
            # Set all values not equal to target1 or target2 to zero
            trans_array[(trans_array != target1) & (trans_array != target2)] = 0
            trans_array[np.where(trans_array == target1)] = 1
            trans_array[np.where(trans_array == target2)] = 3
            # 2 indicates forward transition target1 -> target2; -2 means reverse/backward transition target2 -> target1
            diff_array = np.diff(trans_array,n=1)
            # Avoid division by zero errors and calculate both the forward and reverse transition
            if trans_dict[target1] != 0:
                df.loc[indx,f"{target1}.{target2}"] = float(np.sum(np.where(diff_array==2,1,0))/trans_dict[target1])
            if trans_dict[target2] != 0:
                df.loc[indx,f"{target2}.{target1}"] = float(np.sum(np.where(diff_array==-2,1,0))/trans_dict[target2])
        
        # Calculate the probability for the self transitions/diagonals
        for target in caps:
            if trans_dict[target] == 0: continue
            # Will include the {target}.{target} column, but the value is initially set to zero
            columns = df.filter(regex=fr"^{target}\.").columns.tolist()
            cumulative = df.loc[indx,columns].values.sum()
            df.loc[indx,f"{target}.{target}"] = 1.0 - cumulative
  • Added a new external function - transition_matrix - which generates and visualizes the average transition probabilities
    for all groups, using the transition probability dataframe output by CAP.calculate_metrics (a usage sketch follows below).
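
The actual transition_matrix function handles the averaging and plotting internally; the sketch below only illustrates the averaging step, assuming the transition probability output maps each group to a DataFrame with "Subject_ID", "Group", "Run", and one "{i}.{j}" column per CAP pair, as in the snippet above. The helper name and signature are illustrative, not the package's API.

    import numpy as np
    import pandas as pd

    def average_transition_matrix(group_df: pd.DataFrame, n_caps: int) -> pd.DataFrame:
        # Average each "{i}.{j}" probability across subjects/runs and arrange the means
        # into an n_caps x n_caps matrix (rows = "from" CAP, columns = "to" CAP)
        matrix = np.zeros((n_caps, n_caps))
        for i in range(1, n_caps + 1):
            for j in range(1, n_caps + 1):
                matrix[i - 1, j - 1] = group_df[f"{i}.{j}"].mean()
        labels = [f"CAP-{i}" for i in range(1, n_caps + 1)]
        return pd.DataFrame(matrix, index=labels, columns=labels)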

0.16.1

06 Aug 07:27

♻ Changed

  • For knn_dict, cKDTree has been replaced with KDTree, and scipy is restricted to 1.6.0 or later since that is the
    version in which KDTree began using the C implementation.
  • TimeseriesExtractor.get_bold() can now be used on Windows. pybids still does not install by default, to prevent
    long-path errors, but pip install neurocaps[windows] can be used for installation.
  • All instances of textwrap have been replaced with normal strings; printed warnings and messages are now longer
    per line and occupy less vertical screen space.

0.16.0

31 Jul 20:43

♻ Changed

  • In CAP.caps2surf, the save_stat_map parameter has been changed to save_stat_maps.
  • Slight improvements to a few errors/exceptions to make them more informative.
  • Now, when a subject's run is excluded due to exceeding the fd threshold, the percentage of their volumes
    exceeding the threshold is reported instead of simply stating that the run has been excluded.

🐛 Fixes

  • Fixed a specific case when tr is not specified for TimeseriesExtractor.get_bold. When the tr is not specified,
    the code attempts to check the bold metadata/json file in the derivatives directory to extract the
    repetition time. Now, it checks for this file in both the derivatives and root BIDS directories. The code also
    raises an error earlier if the tr isn't specified, cannot be extracted from the bold metadata file, and bandpass
    filtering is requested.
  • A warning check that assesses whether the indices for a certain condition fall outside the possible range (due to a
    duration mismatch, incorrect tr, etc.) is now also performed before calculating the percentage of volumes exceeding
    the threshold, so that calculation is not diluted. Previously, this check was only done before extracting the
    condition from the timeseries array.

💻 Metadata

  • Very minor documentation updates for TimeseriesExtractor.get_bold.

0.15.2

23 Jul 23:17

[0.15.2] - 2024-07-23

♻ Changed

  • Created a specific message for when dummy_scans = {"auto": True} and zero "non_steady_state_outlier_XX" columns are
    found and verbose=True.
  • parcel_approach, whether used as a setter or as an input, now accepts pickle files.

🐛 Fixes

  • Fixed a reference-before-assignment issue in merge_dicts. This occurred when only the merged dictionary was requested
    to be saved without saving the reduced dictionaries, and no user-provided file_names were given. In this scenario,
    the default name for the merged dictionary is now correctly used.

[0.15.1] - 2024-07-23

🚀 New/Added

  • In TimeseriesExtractor, "min" and "max" sub-keys can now be used when dummy_scans is a dictionary and the
    "auto" sub-key is True. The "min" sub-key sets the minimum number of dummy scans to remove if fewer
    "non_steady_state_outlier_XX" columns than this value are detected, and the "max" sub-key caps the number of dummy
    scans removed if more "non_steady_state_outlier_XX" columns than this value are detected (see the sketch below).
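
A minimal illustration of this dictionary form, assuming the import path neurocaps.extraction and default values for the other TimeseriesExtractor arguments; the numbers are placeholders.

    from neurocaps.extraction import TimeseriesExtractor

    # "auto" derives dummy_scans from the number of non_steady_state_outlier_XX columns in the
    # fMRIPrep confounds file; "min" forces at least 3 volumes to be removed and "max" caps removal at 6
    extractor = TimeseriesExtractor(dummy_scans={"auto": True, "min": 3, "max": 6})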

[0.15.0] - 2024-07-21

🚀 New/Added

  • save_reduced_dicts parameter to merge_dicts so that the reduced dictionaries can also be saved instead of only
    being returned.

♻ Changed

  • Some parameter names, inputs, and outputs for the non-class functions - merge_dicts, change_dtypes, and standardize -
    have changed to improve consistency across these functions (an example call using the new names appears at the end
    of this section).

    • merge_dicts
      • return_combined_dict has been changed to return_merged_dict.
      • file_name has been changed to file_names since the reduced dicts can also be saved now.
      • Key in output dictionary containing the merged dictionary changed from "combined" to "merged".
    • standardize & change_dtypes
      • subject_timeseries has been changed to subject_timeseries_list, the same as in merge_dicts.
      • file_name has been changed to file_names.
      • return_dict has been changed to return_dicts.
  • The returned dictionary for merge_dicts, change_dtypes, and standardize is only
    dict[str, dict[str, dict[str, np.ndarray]]] now.

  • In CAP.calculate_metrics, the metric calculations, except for "temporal_fraction", have been refactored to remove an
    import or to use numpy operations, reducing the code needed to produce the same calculation.

    • "counts"

      • Previous Code:
      # Get frequency
      frequency_dict = dict(collections.Counter(predicted_subject_timeseries[subj_id][curr_run]))
      # Sort the keys
      sorted_frequency_dict = {key: frequency_dict[key] for key in sorted(list(frequency_dict))}
      # Add zero to missing CAPs for participants that exhibit zero instances of a certain CAP
      if len(sorted_frequency_dict) != len(cap_numbers):
          sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number in
                                   list(sorted_frequency_dict) else 0 for cap_number in cap_numbers}
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if len(cap_numbers) > group_cap_counts[group]:
          sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if
                                   cap_number <= group_cap_counts[group] else float("nan") for cap_number in
                                   cap_numbers}
      • Refactored Code:
      # Get frequency;
      frequency_dict = {key: np.where(predicted_subject_timeseries[subj_id][curr_run] == key,1,0).sum()
                        for key in range(1, group_cap_counts[group] + 1)}
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if max(cap_numbers) > group_cap_counts[group]:
          for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1): frequency_dict.update({i: float("nan")})
    • "temporal_fraction"

      • Previous Code:
      proportion_dict = {key: item/(len(predicted_subject_timeseries[subj_id][curr_run]))
                                     for key, item in sorted_frequency_dict.items()}
      • "Refactored Code": Nothing other than some parameter names have changed.
      proportion_dict = {key: value/(len(predicted_subject_timeseries[subj_id][curr_run]))
                         for key, value in frequency_dict.items()}
    • "persistence"

      • Previous Code:
      # Initialize variable
      persistence_dict = {}
      uninterrupted_volumes = []
      count = 0
      # Iterate through caps
      for target in cap_numbers:
          # Iterate through each element and count uninterrupted volumes that equal target
          for index in range(0,len(predicted_subject_timeseries[subj_id][curr_run])):
              if predicted_subject_timeseries[subj_id][curr_run][index] == target:
                  count +=1
              # Store count in list if interrupted and not zero
              else:
                  if count != 0:
                      uninterrupted_volumes.append(count)
                  # Reset counter
                  count = 0
          # In the event, a participant only occupies one CAP and to ensure final counts are added
          if count > 0:
              uninterrupted_volumes.append(count)
          # If uninterrupted_volumes not zero, multiply elements in the list by repetition time, sum and divide
          if len(uninterrupted_volumes) > 0:
              persistence_value = np.array(uninterrupted_volumes).sum()/len(uninterrupted_volumes)
              if tr:
                  persistence_dict.update({target: persistence_value*tr})
              else:
                  persistence_dict.update({target: persistence_value})
          else:
              # Zero indicates that a participant has zero instances of the CAP
              persistence_dict.update({target: 0})
          # Reset variables
          count = 0
          uninterrupted_volumes = []
      
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if len(cap_numbers) > group_cap_counts[group]:
          persistence_dict = {cap_number: persistence_dict[cap_number] if
                              cap_number <= group_cap_counts[group] else float("nan") for cap_number in
                              cap_numbers}
      • Refactored Code:
      # Initialize variable
      persistence_dict = {}
      # Iterate through caps
      for target in cap_numbers:
          # Binary representation of array - if [1,2,1,1,1,3] and target is 1, then it is [1,0,1,1,1,0]
          binary_arr = np.where(predicted_subject_timeseries[subj_id][curr_run] == target,1,0)
          # Get indices of values that equal 1; [0,2,3,4]
          target_indices = np.where(binary_arr == 1)[0]
          # Count the transitions, indices where diff > 1 is a transition; diff of indices = [2,1,1];
          # binary for diff > 1 = [1,0,0]; thus, segments = transitions + first_sequence(1) = 2
          segments = np.where(np.diff(target_indices, n=1) > 1, 1,0).sum() + 1
          # Sum of ones in the binary array divided by segments, then multiplied by 1 or the tr; segment is
          # always 1 at minimum due to + 1; np.where(np.diff(target_indices, n=1) > 1, 1,0).sum() is 0 when empty or the condition isn't met
          persistence_dict.update({target: (binary_arr.sum()/segments) * (tr if tr else 1)})
      
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if max(cap_numbers) > group_cap_counts[group]:
          for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1): persistence_dict.update({i: float("nan")})
    • "transition_frequency"

      • Previous Code:
      count = 0
      # Iterate through predicted values
      for index in range(0,len(predicted_subject_timeseries[subj_id][curr_run])):
          if index != 0:
              # If the subsequent element does not equal the previous element, this is considered a transition
              if predicted_subject_timeseries[subj_id][curr_run][index-1] != predicted_subject_timeseries[subj_id][curr_run][index]:
                  count +=1
      # Populate DataFrame
      new_row = [subj_id, group_name, curr_run, count]
      df_dict["transition_frequency"].loc[len(df_dict["transition_frequency"])] = new_row
      • Refactored Code:
      # Sum the differences that are not zero - [1,2,1,1,1,3] becomes [1,-1,0,0,2], binary representation
      # for values not zero is [1,1,0,0,1] = 3 transitions
      transition_frequency = np.where(np.diff(predicted_subject_timeseries[subj_id][curr_run]) != 0,1,0).sum()

      Note: the n parameter in np.diff defaults to 1, and differences are calculated as out[i] = a[i+1] - a[i].
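
Below is a hypothetical call illustrating the renamed merge_dicts and standardize parameters described above, assuming the import path neurocaps.analysis; the file paths are placeholders and any arguments not named in this changelog are assumptions.

    from neurocaps.analysis import merge_dicts, standardize

    # subject_timeseries_list replaces subject_timeseries; return_merged_dict replaces return_combined_dict
    out = merge_dicts(
        subject_timeseries_list=["ses-1_timeseries.pkl", "ses-2_timeseries.pkl"],
        return_merged_dict=True,
    )
    # The merged dictionary is now stored under the "merged" key (previously "combined")
    standardized = standardize(subject_timeseries_list=[out["merged"]], return_dicts=True)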

0.15.1

23 Jul 19:37

[0.15.1] - 2024-07-23

🚀 New/Added

  • In TimeseriesExtractor, "min" and "max" sub-keys can now be used when dummy_scans is a dictionary and the
    "auto" sub-key is True. The "min" sub-key sets the minimum number of dummy scans to remove if fewer
    "non_steady_state_outlier_XX" columns than this value are detected, and the "max" sub-key caps the number of dummy
    scans removed if more "non_steady_state_outlier_XX" columns than this value are detected.

[0.15.0] - 2024-07-21

🚀 New/Added

  • save_reduced_dicts parameter to merge_dicts so that the reduced dictionaries can also be saved instead of only
    being returned.

♻ Changed

  • Some parameter names, inputs, and outputs for non-class functions - merge_dicts, change_dtypes, and standardize
    have changed to improve consistency across these functions.

    • merge_dicts
      • return_combined_dict has been changed to return_merged_dict.
      • file_name has been changed to file_names since the reduced dicts can also be saved now.
      • Key in output dictionary containing the merged dictionary changed from "combined" to "merged".
    • standardize & change_dtypes
      • subject_timeseries has been changed to subject_timeseries_list, the same as in merge_dicts.
      • file_name has been changed to file_names.
      • return_dict has been changed to return_dicts.
  • The returned dictionary for merge_dicts, change_dtypes, and standardize is only
    dict[str, dict[str, dict[str, np.ndarray]]] now.

  • In CAP.calculate_metrics, the metric calculations, except for "temporal_fraction", have been refactored to remove an
    import or to use numpy operations, reducing the code needed to produce the same calculation.

    • "counts"

      • Previous Code:
      # Get frequency
      frequency_dict = dict(collections.Counter(predicted_subject_timeseries[subj_id][curr_run]))
      # Sort the keys
      sorted_frequency_dict = {key: frequency_dict[key] for key in sorted(list(frequency_dict))}
      # Add zero to missing CAPs for participants that exhibit zero instances of a certain CAP
      if len(sorted_frequency_dict) != len(cap_numbers):
          sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number in
                                   list(sorted_frequency_dict) else 0 for cap_number in cap_numbers}
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if len(cap_numbers) > group_cap_counts[group]:
          sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if
                                   cap_number <= group_cap_counts[group] else float("nan") for cap_number in
                                   cap_numbers}
      • Refactored Code:
      # Get frequency;
      frequency_dict = {key: np.where(predicted_subject_timeseries[subj_id][curr_run] == key,1,0).sum()
                        for key in range(1, group_cap_counts[group] + 1)}
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if max(cap_numbers) > group_cap_counts[group]:
          for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1): frequency_dict.update({i: float("nan")})
    • "temporal_fraction"

      • Previous Code:
      proportion_dict = {key: item/(len(predicted_subject_timeseries[subj_id][curr_run]))
                                     for key, item in sorted_frequency_dict.items()}
      • "Refactored Code": Nothing other than some parameter names have changed.
      proportion_dict = {key: value/(len(predicted_subject_timeseries[subj_id][curr_run]))
                         for key, value in frequency_dict.items()}
    • "persistence"

      • Previous Code:
      # Initialize variable
      persistence_dict = {}
      uninterrupted_volumes = []
      count = 0
      # Iterate through caps
      for target in cap_numbers:
          # Iterate through each element and count uninterrupted volumes that equal target
          for index in range(0,len(predicted_subject_timeseries[subj_id][curr_run])):
              if predicted_subject_timeseries[subj_id][curr_run][index] == target:
                  count +=1
              # Store count in list if interrupted and not zero
              else:
                  if count != 0:
                      uninterrupted_volumes.append(count)
                  # Reset counter
                  count = 0
          # In the event, a participant only occupies one CAP and to ensure final counts are added
          if count > 0:
              uninterrupted_volumes.append(count)
          # If uninterrupted_volumes not zero, multiply elements in the list by repetition time, sum and divide
          if len(uninterrupted_volumes) > 0:
              persistence_value = np.array(uninterrupted_volumes).sum()/len(uninterrupted_volumes)
              if tr:
                  persistence_dict.update({target: persistence_value*tr})
              else:
                  persistence_dict.update({target: persistence_value})
          else:
              # Zero indicates that a participant has zero instances of the CAP
              persistence_dict.update({target: 0})
          # Reset variables
          count = 0
          uninterrupted_volumes = []
      
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if len(cap_numbers) > group_cap_counts[group]:
          persistence_dict = {cap_number: persistence_dict[cap_number] if
                              cap_number <= group_cap_counts[group] else float("nan") for cap_number in
                              cap_numbers}
      • Refactored Code:
      # Initialize variable
      persistence_dict = {}
      # Iterate through caps
      for target in cap_numbers:
          # Binary representation of array - if [1,2,1,1,1,3] and target is 1, then it is [1,0,1,1,1,0]
          binary_arr = np.where(predicted_subject_timeseries[subj_id][curr_run] == target,1,0)
          # Get indices of values that equal 1; [0,2,3,4]
          target_indices = np.where(binary_arr == 1)[0]
          # Count the transitions, indices where diff > 1 is a transition; diff of indices = [2,1,1];
          # binary for diff > 1 = [1,0,0]; thus, segments = transitions + first_sequence(1) = 2
          segments = np.where(np.diff(target_indices, n=1) > 1, 1,0).sum() + 1
          # Sum of ones in the binary array divided by segments, then multiplied by 1 or the tr; segment is
          # always 1 at minimum due to + 1; np.where(np.diff(target_indices, n=1) > 1, 1,0).sum() is 0 when empty or the condition isn't met
          persistence_dict.update({target: (binary_arr.sum()/segments) * (tr if tr else 1)})
      
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if max(cap_numbers) > group_cap_counts[group]:
          for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1): persistence_dict.update({i: float("nan")})
    • "transition_frequency"

      • Previous Code:
      count = 0
      # Iterate through predicted values
      for index in range(0,len(predicted_subject_timeseries[subj_id][curr_run])):
          if index != 0:
              # If the subsequent element does not equal the previous element, this is considered a transition
              if predicted_subject_timeseries[subj_id][curr_run][index-1] != predicted_subject_timeseries[subj_id][curr_run][index]:
                  count +=1
      # Populate DataFrame
      new_row = [subj_id, group_name, curr_run, count]
      df_dict["transition_frequency"].loc[len(df_dict["transition_frequency"])] = new_row
      • Refactored Code:
      # Sum the differences that are not zero - [1,2,1,1,1,3] becomes [1,-1,0,0,2], binary representation
      # for values not zero is [1,1,0,0,1] = 3 transitions
      transition_frequency = np.where(np.diff(predicted_subject_timeseries[subj_id][curr_run]) != 0,1,0).sum()

      Note: the n parameter in np.diff defaults to 1, and differences are calculated as out[i] = a[i+1] - a[i].

🐛 Fixes

  • When a pickle file was used as input to standardize or change_dtype, an error was produced. This has been fixed,
    and these functions now accept a list of dictionaries or a list of pickle files.

💻 Metadata

  • In the documentation for CAP.caps2corr it is now explicitly stated that the type of correlation being used is
    Pearson correlation.

0.15.0

22 Jul 00:53

🚀 New/Added

  • save_reduced_dicts parameter to merge_dicts so that the reduced dictionaries can also be saved instead of only
    being returned.

♻ Changed

  • Some parameter names, inputs, and outputs for non-class functions - merge_dicts, change_dtypes, and standardize
    have changed to improve consistency across these functions.

    • merge_dicts
      • return_combined_dict has been changed to return_merged_dict.
      • file_name has been changed to file_names since the reduced dicts can also be saved now.
      • Key in output dictionary containing the merged dictionary changed from "combined" to "merged".
    • standardize & change_dtypes
      • subject_timeseries has been changed to subject_timeseries_list, the same as in merge_dicts.
      • file_name has been changed to file_names.
      • return_dict has been changed to return_dicts.
  • The returned dictionary for merge_dicts, change_dtypes, and standardize is only
    dict[str, dict[str, dict[str, np.ndarray]]] now.

  • In CAP.calculate_metrics, the metric calculations, except for "temporal_fraction", have been refactored to remove an
    import or to use numpy operations, reducing the code needed to produce the same calculation.

    • "counts"

      • Previous Code:
      # Get frequency
      frequency_dict = dict(collections.Counter(predicted_subject_timeseries[subj_id][curr_run]))
      # Sort the keys
      sorted_frequency_dict = {key: frequency_dict[key] for key in sorted(list(frequency_dict))}
      # Add zero to missing CAPs for participants that exhibit zero instances of a certain CAP
      if len(sorted_frequency_dict) != len(cap_numbers):
          sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if cap_number in
                                   list(sorted_frequency_dict) else 0 for cap_number in cap_numbers}
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if len(cap_numbers) > group_cap_counts[group]:
          sorted_frequency_dict = {cap_number: sorted_frequency_dict[cap_number] if
                                   cap_number <= group_cap_counts[group] else float("nan") for cap_number in
                                   cap_numbers}
      • Refactored Code:
      # Get frequency;
      frequency_dict = {key: np.where(predicted_subject_timeseries[subj_id][curr_run] == key,1,0).sum()
                        for key in range(1, group_cap_counts[group] + 1)}
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if max(cap_numbers) > group_cap_counts[group]:
          for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1): frequency_dict.update({i: float("nan")})
    • "temporal_fraction"

      • Previous Code:
      proportion_dict = {key: item/(len(predicted_subject_timeseries[subj_id][curr_run]))
                                     for key, item in sorted_frequency_dict.items()}
      • "Refactored Code": Nothing other than some parameter names have changed.
      proportion_dict = {key: value/(len(predicted_subject_timeseries[subj_id][curr_run]))
                         for key, value in frequency_dict.items()}
    • "persistence"

      • Previous Code:
      # Initialize variable
      persistence_dict = {}
      uninterrupted_volumes = []
      count = 0
      # Iterate through caps
      for target in cap_numbers:
          # Iterate through each element and count uninterrupted volumes that equal target
          for index in range(0,len(predicted_subject_timeseries[subj_id][curr_run])):
              if predicted_subject_timeseries[subj_id][curr_run][index] == target:
                  count +=1
              # Store count in list if interrupted and not zero
              else:
                  if count != 0:
                      uninterrupted_volumes.append(count)
                  # Reset counter
                  count = 0
          # In the event, a participant only occupies one CAP and to ensure final counts are added
          if count > 0:
              uninterrupted_volumes.append(count)
          # If uninterrupted_volumes not zero, multiply elements in the list by repetition time, sum and divide
          if len(uninterrupted_volumes) > 0:
              persistence_value = np.array(uninterrupted_volumes).sum()/len(uninterrupted_volumes)
              if tr:
                  persistence_dict.update({target: persistence_value*tr})
              else:
                  persistence_dict.update({target: persistence_value})
          else:
              # Zero indicates that a participant has zero instances of the CAP
              persistence_dict.update({target: 0})
          # Reset variables
          count = 0
          uninterrupted_volumes = []
      
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if len(cap_numbers) > group_cap_counts[group]:
          persistence_dict = {cap_number: persistence_dict[cap_number] if
                              cap_number <= group_cap_counts[group] else float("nan") for cap_number in
                              cap_numbers}
      • Refactored Code:
      # Initialize variable
      persistence_dict = {}
      # Iterate through caps
      for target in cap_numbers:
          # Binary representation of array - if [1,2,1,1,1,3] and target is 1, then it is [1,0,1,1,1,0]
          binary_arr = np.where(predicted_subject_timeseries[subj_id][curr_run] == target,1,0)
          # Get indices of values that equal 1; [0,2,3,4]
          target_indices = np.where(binary_arr == 1)[0]
          # Count the transitions, indices where diff > 1 is a transition; diff of indices = [2,1,1];
          # binary for diff > 1 = [1,0,0]; thus, segments = transitions + first_sequence(1) = 2
          segments = np.where(np.diff(target_indices, n=1) > 1, 1,0).sum() + 1
          # Sum of ones in the binary array divided by segments, then multiplied by 1 or the tr; segment is
          # always 1 at minimum due to + 1; np.where(np.diff(target_indices, n=1) > 1, 1,0).sum() is 0 when empty or the condition isn't met
          persistence_dict.update({target: (binary_arr.sum()/segments) * (tr if tr else 1)})
      
      # Replace zeros with nan for groups with less caps than the group with the max caps
      if max(cap_numbers) > group_cap_counts[group]:
          for i in range(group_cap_counts[group] + 1, max(cap_numbers) + 1): persistence_dict.update({i: float("nan")})
    • "transition_frequency"

      • Previous Code:
      count = 0
      # Iterate through predicted values
      for index in range(0,len(predicted_subject_timeseries[subj_id][curr_run])):
          if index != 0:
              # If the subsequent element does not equal the previous element, this is considered a transition
              if predicted_subject_timeseries[subj_id][curr_run][index-1] != predicted_subject_timeseries[subj_id][curr_run][index]:
                  count +=1
      # Populate DataFrame
      new_row = [subj_id, group_name, curr_run, count]
      df_dict["transition_frequency"].loc[len(df_dict["transition_frequency"])] = new_row
      • Refactored Code:
      # Sum the differences that are not zero - [1,2,1,1,1,3] becomes [1,-1,0,0,2], binary representation
      # for values not zero is [1,1,0,0,1] = 3 transitions
      transition_frequency = np.where(np.diff(predicted_subject_timeseries[subj_id][curr_run]) != 0,1,0).sum()

🐛 Fixes

  • When a pickle file was used as input to standardize or change_dtype, an error was produced. This has been fixed,
    and these functions now accept a list of dictionaries or a list of pickle files.

💻 Metadata

  • In the documentation for CAP.caps2corr it is now explicitly stated that the type of correlation being used is
    Pearson correlation.

0.14.7

17 Jul 18:27

[0.14.7] - 2024-07-17

♻ Changed

  • Improved Warning Messages and Print Statements:
    • In TimeseriesExtractor.get_bold, the subject-specific information output has been reformatted for better readability:

      • Previous Format:
      Subject: 1; run:1 - Message
      
      • New Format:
      [SUBJECT: 1 | SESSION: 1 | TASK: rest | RUN: 1]
      -----------------------------------------------
      Message
      
    • In the CAP class, numerous warnings and statements have been changed to improve clarity:

      • Previous Format:
      Optimal cluster size using silhouette method for A is 2.
      
      • New Format:
      [GROUP: A | METHOD: silhouette] - Optimal cluster size is 2.
      
    • These changes should improve clarity when viewing in a terminal or when redirected to an output file by SLURM.

    • Language in many statements and warnings has also been improved.

[0.14.6] - 2024-07-16

🐛 Fixes

  • For CAP.get_caps, when cluster_selection_method was used to find the optimal cluster size, the model would be
    re-estimated and stored in the self.kmeans property for later use. Previously, the internal function that generated the
    model using scikit-learn's KMeans only returned the performance metrics. These metrics were assessed for each cluster
    size, and the best cluster size was used to re-generate the optimal KMeans model with the same parameters. This is fine
    when setting random_seed with the same k, since the model would produce the same initial cluster centroids and a similar
    clustering solution regardless of how many times it is re-generated. However, if a random seed was not used, the newly
    re-generated optimal model could differ despite having the same k, due to the random initialization of cluster centroids
    in KMeans. Now, the internal function returns both the performance metrics and the models, ensuring that the exact model
    that was assessed is the one stored in self.kmeans. This shouldn't be a major issue if your models are generally stable
    and produce similar cluster solutions, but when a random seed is not used, even minor differences between KMeans models
    with the same k can produce some statistical differences. Ultimately, it is always best to ensure that the model used for
    assessment and the model used for later analyses are the same, to ensure robust results (a sketch of the new pattern
    follows below).
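
A simplified, hypothetical illustration of the pattern described above (not the package's actual internal function): fit one model per candidate cluster size, keep both the metric and the fitted model, and return the already-fitted best model instead of re-estimating it.

    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def fit_and_score(X, cluster_sizes, random_state=None):
        models, scores = {}, {}
        for k in cluster_sizes:
            # Keep the fitted model alongside its score so the assessed model can be reused
            model = KMeans(n_clusters=k, random_state=random_state).fit(X)
            models[k] = model
            scores[k] = silhouette_score(X, model.labels_)
        best_k = max(scores, key=scores.get)
        # Return the exact model that was assessed rather than refitting with the best k
        return scores, models[best_k]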

0.14.6

16 Jul 19:00

🐛 Fixes

  • For CAP.get_caps, when cluster_selection_method was used to find the optimal cluster size, the model would be
    re-estimated and stored in the self.kmeans property for later use. Previously, the internal function that generated the
    model using scikit-learn's KMeans only returned the performance metrics. These metrics were assessed for each cluster
    size, and the best cluster size was used to re-generate the optimal KMeans model with the same parameters. This is fine
    when setting random_seed with the same k, since the model would produce the same initial cluster centroids and a similar
    clustering solution regardless of how many times it is re-generated. However, if a random seed was not used, the newly
    re-generated optimal model could differ despite having the same k, due to the random initialization of cluster centroids
    in KMeans. Now, the internal function returns both the performance metrics and the models, ensuring that the exact model
    that was assessed is the one stored in self.kmeans. This shouldn't be a major issue if your models are generally stable
    and produce similar cluster solutions, but when a random seed is not used, even minor differences between KMeans models
    with the same k can produce some statistical differences. Ultimately, it is always best to ensure that the model used for
    assessment and the model used for later analyses are the same, to ensure robust results.

0.14.5

16 Jul 04:23

♻ Changed

  • In TimeseriesExtractor, dummy_scans can now be a dictionary that uses the "auto" sub-key. If "auto" is set to
    True, the number of dummy scans removed depends on the number of "non_steady_state_outlier_XX" columns in the
    participant's fMRIPrep confounds tsv file. For instance, if two "non_steady_state_outlier_XX" columns are detected,
    then dummy_scans is set to two, since fMRIPrep creates one "non_steady_state_outlier_XX" column per outlier volume.
    This is assessed for each run of every participant, so dummy_scans depends on the number of
    "non_steady_state_outlier_XX" columns in the confound file associated with the specific participant, task, and run
    (a sketch of this behavior follows below).

🐛 Fixes

  • For defensive programming purposes, the timing information in the event file is no longer assumed to perfectly
    coincide with the timeseries. When a condition is specified and onset and duration must be used to extract the
    indices corresponding to the condition of interest, the maximum scan index is checked against the length of the
    timeseries. If it exceeds the timeseries length, a warning is issued about possible timing misalignment (i.e., errors
    in the event file, an incorrect repetition time, etc.), and the invalid indices are ignored so that only valid
    indices are extracted from the timeseries.

0.14.4

15 Jul 04:49

[0.14.4] - 2024-07-15

♻ Changed

  • Minor update that prints the optimal cluster size for each group when using cluster_selection_method in
    CAP.get_caps(). Just for informational purposes.
  • Previously released as version 0.14.3.post1.

[0.14.3.post1] - YANKED

♻ Changed

  • Minor update that prints the optimal cluster size for each group when using cluster_selection_method in
    CAP.get_caps(). Just for informational purposes.

  • Yanked because this was not a metadata update; it should have been a patch release to denote a behavioral change.
    It is now version 0.14.4 to adhere a bit better to versioning practices.

[0.14.3] - 2024-07-14

  • Thought of some minor changes.

♻ Changed

  • Added a new warning if fd_threshold is specified but use_confounds is False, since fd_threshold needs the confound
    file from fMRIPrep. In previous versions, censoring simply did not occur and no warning was issued.
  • Changed the exception type for cosine similarity in CAP.caps2radar from ValueError to ZeroDivisionError.
  • Added a ValueError in TimeseriesExtractor.visualize_bold if both region and roi_indx are None.
  • In TimeseriesExtractor.visualize_bold, if roi_indx is a string, an int, or a list with a single element, a title is
    added to the plot.

[0.14.2.post2] - 2024-07-14

💻 Metadata

  • Simply wanted the latest metadata update to be on Zenodo and to have the same DOI, as I forgot to upload
    version 0.14.2.post1 there.

[0.14.2.post1] - 2024-07-14

💻 Metadata

  • Updated a warning during timeseries extraction that only included a partial reason for why the indices for a condition
    were filtered out. Added information about fd_threshold being a possible reason.

[0.14.2] - 2024-07-14

♻ Changed

  • Implemented a minor code refactoring so that runs flagged due to "outlier_percentage", runs where all volumes will be
    scrubbed because every volume exceeds the framewise displacement threshold, and runs where the specified condition
    returns zero indices do not undergo timeseries extraction.
  • Also clarified the language in a warning that occurs when all NIfTI files for a subject have been excluded or are missing.

🐛 Fixes

  • If a condition does not exist in the event file, a warning is now issued. This should prevent empty timeseries or
    errors. The warning names the condition, which helps catch spelling errors.
  • Added a specific error type to the except blocks for the cosine similarity calculations that can cause a
    division-by-zero error.