
An integrated Python package for molecular descriptor generation, data processing, model training, and hyper-parameter optimization.


WhitestoneYang/spoc


SPOC

This package is aimed at molecular descriptor generation, data processing, model training, and hyper-parameter optimization.

  1. Molecular descriptor generation: collects the descriptor generation methods provided by different tools and packages, including RDKit, CDK, Open Babel, PubChem, and DeepChem, among others, and supports batch generation.
  2. Data pre-processing and splitting.
  3. Model training and hyper-parameter optimization with Scikit-Learn, XGBoost, and LightGBM. More machine learning and neural network methods will be wrapped in the future.
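
As an illustration of the kind of work item 2 refers to, here is a minimal plain-Python sketch of pre-processing and splitting a descriptor table. It does not use SPOC's own API, which this README does not show. The function names (`drop_incomplete`, `train_test_split`, `fit_scaler`, `apply_scaler`) and the descriptor values are hypothetical and made up for the example.

```python
import random
import statistics


def drop_incomplete(rows):
    """Keep only rows in which every descriptor value is present (not None)."""
    return [row for row in rows if all(v is not None for v in row)]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then cut off the test share."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


def fit_scaler(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return means, stdevs


def apply_scaler(rows, means, stdevs):
    """Standardize each column; constant columns are mapped to 0.0."""
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Made-up descriptor table: one row per molecule, one column per descriptor.
descriptors = [
    [46.07, 1.0, 0.0],
    [78.11, 0.0, 1.0],
    [60.10, None, 0.0],  # missing value, dropped below
    [180.16, 3.0, 1.0],
    [58.08, 1.0, 0.0],
    [94.11, 1.0, 1.0],
]

clean = drop_incomplete(descriptors)
train, test = train_test_split(clean, test_fraction=0.2, seed=42)

# Fit the scaler on the training rows only, then apply it to both subsets.
means, stdevs = fit_scaler(train)
train_scaled = apply_scaler(train, means, stdevs)
test_scaled = apply_scaler(test, means, stdevs)

print(len(clean), len(train_scaled), len(test_scaled))
```

The scaler is fitted on the training rows only, so no statistics from the test rows leak into training.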

Dependencies

SPOC currently supports Python >= 3.6 and always requires these packages.

Installation

Method 1: conda

# Clone project
git clone git@github.com:WhitestoneYang/spoc.git # or another released or tagged version

# conda installation
bash -i conda_installation.sh

Method 2: docker

# docker build
docker build --progress=plain -t spoc .

# docker run
docker run -v $(pwd):/workspace/ --network host -it spoc

Usage

  1. Please refer to the tests for descriptor generation examples, covering both single and multiple molecular descriptors.
  2. Please refer to the examples for the full workflow: 1) molecular descriptor generation; 2) data processing; 3) model training; 4) hyper-parameter optimization.
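
For readers who want a feel for the hyper-parameter optimization step before opening the examples, the sketch below shows an exhaustive grid search with a hold-out validation set. It is written in plain Python with a toy k-nearest-neighbour regressor. It is not SPOC's implementation, which wraps Scikit-Learn, XGBoost, and LightGBM. All names (`knn_predict`, `validation_mse`, `grid`) and all data values are hypothetical.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train_x, train_y, query, k, weighted):
    """Predict by averaging the targets of the k nearest training points."""
    nearest = sorted(
        (euclidean(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    if weighted:
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def validation_mse(params, train_x, train_y, val_x, val_y):
    """Mean squared error on the validation set for one parameter setting."""
    errors = [
        (knn_predict(train_x, train_y, x, **params) - y) ** 2
        for x, y in zip(val_x, val_y)
    ]
    return sum(errors) / len(errors)


# Made-up one-descriptor data set with a roughly linear target.
train_x = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
train_y = [0.1, 2.1, 3.9, 6.2, 7.8, 10.1]
val_x = [[0.5], [2.5], [4.5]]
val_y = [1.0, 5.0, 9.0]

# Every combination of these values is evaluated once.
grid = {"k": [1, 2, 3], "weighted": [False, True]}

results = []
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = validation_mse(params, train_x, train_y, val_x, val_y)
    results.append((score, params))

# The best setting is the one with the lowest validation error.
best_score, best_params = min(results, key=lambda r: r[0])
print(best_params, best_score)
```

Grid search is only one strategy for hyper-parameter optimization. The same loop structure applies when the toy regressor is replaced by a real estimator and the hold-out set by cross-validation.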

Citing SPOC

If you have used SPOC in your research, please cite our paper.
