Fetal Heart Rate Risk Classification

Predicting Fetal Risk Using Cardiotocographic Data

Authors: Jacob Heyman, Ismael Araujo

Overview

As medical technology advances, the rate of preventable child mortality decreases. Lowering child mortality rates has become a key goal in advancing any society and a key in human progress as a whole. While there are many advancements that have improved the mortality rate, not all of these practices are globally available. In order to improve mortality rates in cost effective and readily available solutions need to be applied and perfected.

One such cost effective and relatively simple method would be the use of Cardiotocograms (CTG). The CTG is a non-invasive fetal monitor which is used to assess fetal health. The CTG is used to detect fetal heart rate (FHR), uterine contractions, fetal movement, and sudden changes in heart rate. Extreme changes to the FHR results in surgical intervention through the use of cesarian delivery(C-section). Currently, doctors rely on a visual analysis of the CTG leading to erroneous interpretations of the exam. Minute changes which may be extremely detrimental to fetal health may also not be visible to the naked eye. Conversely some observable changes in fetal heart rate may appear to be fetal distress but are just a response to other factors like uterine contractions. Opting to perform a C-section increases the chances of mortality not only for the child but also the mother. For these reasons’ visual analysis of a CTG exam is limited by human error and cannot definitively detect fetal distress.

In order to improve diagnosis of fetal distress with a CTG and improve the mortality rate, the use of machine learning algorithms becomes a viable option. A well-trained model will be able to identify which variable changes to the FHR have the greatest effect on fetal health. To lower the overall risk of child mortality a ML model must not only be able to accurately classify between normal and distressed fetal health but be precise enough in its predictions to prevent any unnecessary interventive surgical procedures.

For our analysis of the we used a CTG exam dataset found on Kaggle from The Journal of Maternal-Fetal Medicine. The dataset consisted of 2126 CTG exam records which were classified into three classes (normal, suspect, pathological). In this initial analysis we decided to create a binary class of normal and distressed, combining the suspect and pathological classes into one class.

To address the issue of eliminating human error and erroneous surgical intervention, our goal of this analysis was to identify which key features played a role in determining the healthy or distressed classes, and to produce a highly precise model to eliminate unnecessary C-sections. To achieve these goals, we performed an extensive exploratory data analysis, created baseline models and preformed hyperparameter tuning of models for both an experimental feature and unaltered CTG data set. Our final model had clear feature importance and the best evaluation metrics.

Business Problem

Preventing child mortality is a key indicator of medical progress. A large percentage of deaths are the result of low access to resources that monitor fetal health. One cost effective solution to monitor fetal health is the use of Cardiotocograms(CTGs). CTGs monitor fetal heart rate, fetal movement, uterine contractions, sudden heart rate changes and many other health metrics. CTG scans are currently interpreted via visual analysis by the physician, and erroneous errors may result in increased fetal health risk. Using a Kaggle database of collected CTG exams, we aim to identify which features have the most impact on fetal health being either normal or distressed. Using these features, we also aim to tune a model that provides the best predictions of fetal health class. To identify the key features and create a precise model we considered the following research questions:

What feature have the greatest influence on the model
What, if any, new features affect the model
Which model makes the best predictions of fetal health class, and has the most precise predictions.

Data

Our data came from Kaggle.com (https://www.kaggle.com/andrewmvd/fetal-health-classification). The data comes from 2126 cardiotocogram measurements described in Ayres de Campos et al. (2000) SisPorto 2.0 A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318. The CTG measurements where used to create a classification model to classify fetal health being either normal or distressed. The DataFrame included the following columns:

baseline_value - Baseline Fetal Heart Rate
accelerations - Number of accelerations per second
fetal_movement - Number of fetal movements per second
uterine_contractions - Number of uterine contractions per second
light_decelerations - Number of LDs per second
severe_deceleration - Number of Sds per second
prolongued_deceleration - Number of PDs per second
abnormal_short_term_variatability - Percentage of time with abnormal short tearm variatability
mean_value_of_short_term_variability - Mean value of short term variability
percentage_of_time_with_abnormal_long_term_variability - Percentage of time with abnormal long term variability
mean_value_of_long_term_variability - average value of long tearm variability
histogram_width - Width of histogram using all the values from the record
histogram_min - minimum value of histogram
histogram_max - maximum value of hitogram
histogram_number_of_peaks - number of peaks in the exam histogram
histogram_number_of_zeroes - number of zeros in the exam histogram
histogram_mode - histogram mode
histogram_mean - histogram mean
histogram_median - histogram median
histogram_variance - histogram variance
histogram_tendency - histogram tendancy
fetal_health - 0: normal 1:distressed

Methods

To determine the key features in fetal health classification and produce the best possible model, we implemented the following methods in our analysis

Initial Data Analysis and Data cleaning
- changed target class label into binary fetal health class (normal: 1, distressed: 2)
EDA
- In depth analysis of features and relations with other features and the target class
Baseline models
- default models(KNN, Logistic Regression, Decision Tree, and Random Forest)
Feature Engineering
Created two modeling process notebooks Vanilla and Experimental
- to compare unaltered dataset model tuning to experimental feature dataset
modeling and hyperparameter tuning for both databases
- KNN
- Logistic Regression
  - Bagging classifer
- Decision Tree
- Random Forest
  - GridSearch
  - Model with best parameteres
- XGBoost
  - Gridsearch
  - Model with best parameters
checked feature importance of best models

Results

In our exploritory analysis we preformed several baseline models to see how the dataset classifed fetal health. Each baseline model was surprisingly accurate at predicting the classes. We decided to preform two seperate modeling processes focusing on tuning the hyperparameters on the untouched dataset(vanilla) and an experimental dataset with new features. For the experimental dataset we created several new features, using binning, dummy variables and a combination of several existing features.

The best model overall for vanilla was a grid search best features random forest with the following hyperparameters and evaluation metrics:

Best Parameters: {'criterion': 'entropy', 'max_depth': 10, 'max_features': 10, 'n_estimators': 50, 'oob_score': False}
Evaluation Metrics
- Accuracy: 0.9605263157894737
- Recall: 0.9755501222493888
- F1 Score: 0.9743589743589743
- Precision: 0.973170731707317
Top 10 Features

The best model overall for Experimental was a grid search best features XGBboost with the following hyperparemeters and evaluation metrics:

Best Parameters: {'colsample_bytree': 0.45,'learning_rate': 0.07,'max_depth': 12,'min_child_weight': 1, 'n_estimators': 500}
Evaluation Metrics
- Accuracy: 0.9661654135338346
- Recall: 0.9902200488997555
- F1 Score: 0.9782608695652174
- Precision: 0.9665871121718377
Top 10 Features:

Conclusions

Cardiotocographs are a cost effective method to moniter fetal health, and use are used as a tool to lower the child mortality rate. One of the major issues with CTG monitereing was errors made through visual analysis. Any invterventive surgery weather necessary or not increase risk. In this study, we utalized multiple modeling processes to predict fetal health class, and attempted to create the highest precision in our prediction to reduce false positive classification instances. With the CTG exam dataset from kaggle, we preformed two simultaneous medeling processes with a vanilla and experimental data set. While the random forest model of our vanilla set had the highest precision, which was our target metric, the xgboost model of the experimental had the best overall metrics with a very high precision. The key features in our best model where abnormal short term varitability, histogram mode, histogram min, histogram mean and histogram width. All of these features are measurements of the fetal heart rates change over time. This lines up with what is visually obsereved for fetal distress in the O.R. Our model has a high precision and other metrics and can predict on these features to a greater effect than the standared visual analysis. We believe that our model will help in the precise classification of fetal distress and improve the overall child mortality rate by reducing risk through human error.

Next Steps

make a multiclass for fetal health, multiple classes of fetal health to better classify the health state
find more CTG exam data to better train model. This data set did not have a lot of instances for poor fetal health
consider maternal health and other diagnostic metrics into the model(heart rate, oxygen level, what anestetics are used)

For More Information

Repository Structure

├── README.md                           <- Overview of the project
├── fetal_heatlh_data_analysis          <- Analysis and EDA of Fetal health dataset
├── Vanilla_model_cleaned               <- Modeling process with vanilla dataset
├── experimental_feature_Modeling       <- Modeling process with experimental dataset
├── experiment_features.csv             <- experimental dataset       
├─  data.csv                            <- folder with fetal health dataset
└── Phase_3_Project_Pres                <- Presentation slidedeck

Name		Name	Last commit message	Last commit date
Latest commit History 66 Commits
data.csv		data.csv
images		images
.gitignore		.gitignore
Project_Presentation_Jacob_Ismael.pdf		Project_Presentation_Jacob_Ismael.pdf
README.md		README.md
experiment_features.csv		experiment_features.csv
experimental_feature_Modeling.ipynb		experimental_feature_Modeling.ipynb
fetal_heatlh_data_analysis.ipynb		fetal_heatlh_data_analysis.ipynb
vanilla_model_cleaned.ipynb		vanilla_model_cleaned.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fetal Heart Rate Risk Classification

Predicting Fetal Risk Using Cardiotocographic Data

Overview

Business Problem

Data

Methods

Results

Conclusions

Next Steps

For More Information

Repository Structure

About

Releases

Packages

Contributors 2

Languages

Jacobheyman702/Predicting-Fetal-Risk-Using-Cardiotocographic-Data

Folders and files

Latest commit

History

Repository files navigation

Fetal Heart Rate Risk Classification

Predicting Fetal Risk Using Cardiotocographic Data

Overview

Business Problem

Data

Methods

Results

Conclusions

Next Steps

For More Information

Repository Structure

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages