Novel architecture for gated recurrent unit autoencoder trained on time series from electronic health records enables detection of ICU patient subgroups

Read our paper at https://www.nature.com/articles/s41598-023-30986-1

Installation

Clone this repository:

git clone git@github.com:JRC-COMBINE/ehr-time-series-gru-autoencoder.git

Enter the repository's root directory.
Using the package manager pip, install the requirements:

pip install -r requirements.txt

(The installation has been tested with Python 3.7.16.)

Get access to MIMIC-III.
Set the environment variable MIMIC_URL to the URL of a MIMIC-III database (local or remote).
(Optional) If you want to use the HCUP Chronic Condition Indicator, download the CSV file into the correct directory:

cd info
wget https://www.hcup-us.ahrq.gov/toolssoftware/chronic/cci2015.csv
cd ..

and change use_cci = False to use_cci = True in info/IcdInfo.py.

Done. Try training your model!
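
Because the pipeline reads the database location from the MIMIC_URL environment variable, a quick pre-flight check can catch a missing value before a long run starts. The sketch below is not part of this repository: the helper name require_mimic_url is hypothetical, and the URL format the pipeline expects is not specified here.

```python
import os


def require_mimic_url(env=None):
    """Return the value of MIMIC_URL, or raise a clear error if it is unset or empty.

    `env` defaults to the process environment; passing a dict makes the
    check easy to try out without touching real environment variables.
    """
    env = os.environ if env is None else env
    url = env.get("MIMIC_URL", "").strip()
    if not url:
        raise RuntimeError(
            "MIMIC_URL is not set; point it at your MIMIC-III database first."
        )
    return url
```

Calling require_mimic_url() at the top of your own launcher script fails fast with a readable message instead of an error deep inside data extraction.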

Usage

All commands should be run in the project's root directory unless otherwise specified.

Display the command line help using:

python full_pipeline_mimic.py --help

Demo on Generic Files

A full example of using the software for feature extraction on generic CSV files with time series is available in generic_dataframe_demo.py.

Full Pipeline (Training, Clustering, Evaluation) on MIMIC

Using default settings:

python full_pipeline_mimic.py

The settings can be changed with command line arguments. For example, to run with 20,000 admissions and train for at most 25 epochs:

python full_pipeline_mimic.py --admissions 20000 --max_epochs 25

Preprocessing, training, and model hyperparameters can be set directly on the command line by prefixing the hyperparameter name with prep_, training_, or model_, e.g.:

python full_pipeline_mimic.py --prep_scaling_mode standard --training_batch_size 12 --model_rnn_size 150
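
The prefix convention above can be pictured as routing each argument to one of three hyperparameter groups. The following is a minimal sketch of that idea, not the parsing code used by full_pipeline_mimic.py; the function name group_by_prefix is hypothetical, and values are kept as plain strings.

```python
def group_by_prefix(argv, prefixes=("prep", "training", "model")):
    """Sort `--<prefix>_<name> <value>` pairs into one dict per prefix.

    Arguments without a known prefix (e.g. `--admissions 20000`) are ignored.
    """
    groups = {prefix: {} for prefix in prefixes}
    tokens = iter(argv)
    for token in tokens:
        if not token.startswith("--"):
            continue  # a value belonging to an argument we skipped
        name = token[2:]
        for prefix in prefixes:
            if name.startswith(prefix + "_"):
                # Strip the prefix and consume the next token as the value.
                groups[prefix][name[len(prefix) + 1:]] = next(tokens, None)
                break
    return groups


example = group_by_prefix(
    ["--prep_scaling_mode", "standard",
     "--training_batch_size", "12",
     "--model_rnn_size", "150"]
)
```

Here example["model"] holds {"rnn_size": "150"}, so each part of the pipeline only sees the hyperparameters addressed to it.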

Using the script run_full_published.sh, you can run the pipeline with the same settings as used in the publication.

Random Architecture Search

The search can be run either locally or on a SLURM-based compute cluster. The search script must be run from its own directory:

cd search

Using arguments, one can define a common prefix for search runs, the number of runs, and the number of admissions to train and evaluate on:

python cluster_search.py --bout_id my_random_search --admissions 10000 --runs 50

For submitting the resulting SLURM job under a different account name, set the environment variable SLURM_ACCOUNT_DL to the desired account name.
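
As an illustration of how an account override like this can work, the sketch below builds an sbatch command line and adds sbatch's --account option only when SLURM_ACCOUNT_DL is set. This is an assumption about the mechanism, not the code in cluster_search.py; the function name sbatch_command and the script name job.sh are hypothetical.

```python
import os


def sbatch_command(script_path, env=None):
    """Build an `sbatch` argument list, honoring SLURM_ACCOUNT_DL if present."""
    env = os.environ if env is None else env
    command = ["sbatch"]
    account = env.get("SLURM_ACCOUNT_DL", "").strip()
    if account:
        # `--account` is the standard sbatch option for charging a job
        # to a specific account.
        command.append(f"--account={account}")
    command.append(script_path)
    return command


with_account = sbatch_command("job.sh", env={"SLURM_ACCOUNT_DL": "my_account"})
without_account = sbatch_command("job.sh", env={})
```

The resulting list can be passed to subprocess.run on a machine where SLURM is installed.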

The search can also be run sequentially on a single computer by setting the --interactive switch.
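
For readers unfamiliar with random search, the idea is that each run draws its hyperparameters independently at random from a predefined search space. The sketch below shows that idea only; the search space, its values, and the run naming scheme are hypothetical and are not taken from cluster_search.py. The hyperparameter names reuse the command line examples above.

```python
import random

# Hypothetical search space; the values are placeholders for illustration.
SEARCH_SPACE = {
    "model_rnn_size": [50, 100, 150, 200],
    "training_batch_size": [8, 12, 16, 32],
}


def sample_runs(bout_id, n_runs, seed=0):
    """Draw `n_runs` random configurations, each tagged with a common prefix."""
    rng = random.Random(seed)  # seeded so the same search can be reproduced
    runs = []
    for index in range(n_runs):
        config = {
            name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()
        }
        runs.append((f"{bout_id}_{index:03d}", config))
    return runs


runs = sample_runs("my_random_search", n_runs=50)
```

Each entry in runs pairs a run name sharing the common prefix with one sampled configuration, which could then be trained and evaluated either as a separate cluster job or one after another.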

Citation

You can access the paper for this repository at https://www.nature.com/articles/s41598-023-30986-1.

Merkelbach, K., Schaper, S., Diedrich, C. et al. Novel architecture for gated recurrent unit autoencoder trained on time series from electronic health records enables detection of ICU patient subgroups. Sci Rep 13, 4053 (2023). https://doi.org/10.1038/s41598-023-30986-1

DOI https://doi.org/10.1038/s41598-023-30986-1

Credit

Paper: Kilian Merkelbach, Steffen Schaper, Christian Diedrich, Sebastian Johannes Fritsch, Andreas Schuppert
Training, Model, Clustering, Evaluation, Search: Kilian Merkelbach
Data Extraction Pipeline: Richard Polzin, Konstantin Sharafutdinov, Jayesh Bhat, Kilian Merkelbach
