Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?

This repository hosts an open-source benchmark for structure-based drug design (SBDD) that enables transparent and reproducible evaluation of algorithmic advances in molecular optimization. It supports 16 SBDD algorithms on 7 tasks.

Installation

There are two environments: Test Env and TDC Env. Test Env is used to run 3DSBDD, Pocket2mol, PocketFlow, ResGen, and AutoGrow4. TDC Env is used to run the remaining models and to evaluate the molecules generated by all models.

conda env create -f environment_TestEnv.yml
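conda activate TestEnv2

The TDC Env is created the same way from environment_TDCEnv.yml; activate it under the environment name defined in that file:

conda env create -f environment_TDCEnv.yml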
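The environment split above amounts to a simple lookup. The helper below is a hypothetical illustration (env_for is not part of this repository); its return values are the README's descriptive environment labels, not necessarily the conda environment names defined in the yml files.

```python
# Models that, per this README, are run in Test Env. All other models run in
# TDC Env, and TDC Env is also used to evaluate every model's molecules.
TEST_ENV_MODELS = {"3dsbdd", "pocket2mol", "pocketflow", "resgen", "autogrow4"}


def env_for(model: str, stage: str = "generate") -> str:
    """Return the environment label for a model at a given stage.

    stage is "generate" or "evaluate". Labels are the README's descriptive
    names, not necessarily the conda environment names in the yml files.
    """
    if stage == "evaluate":
        return "TDC Env"
    if stage != "generate":
        raise ValueError(f"unknown stage: {stage!r}")
    return "Test Env" if model.lower() in TEST_ENV_MODELS else "TDC Env"


print(env_for("Pocket2mol"))              # Test Env
print(env_for("graph_ga"))                # TDC Env
print(env_for("Pocket2mol", "evaluate"))  # TDC Env
```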

16 Methods

Based on their ML methodology, the methods are grouped into the following categories:

virtual screening
- screening: randomly searches the ZINC database.
GA (genetic algorithm)
- graph_ga: based on molecular graphs.
- smiles_ga: based on SMILES.
- Autogrow4: based on SMILES.
VAE (variational auto-encoder)
- smiles_vae: based on SMILES.
- selfies_vae: based on SELFIES.
RL (reinforcement learning)
- reinvent: based on SMILES.
- moldqn: based on molecular graphs.
HC (hill climbing)
- smiles_lstm_hc: SMILES-level hill climbing.
- mimosa: graph-level hill climbing.
gradient (gradient ascent)
- dst: based on molecular graphs.
- pasithea: based on SELFIES.
Auto-regressive
- 3DSBDD
- Pocket2mol
- PocketFlow
- ResGen
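For scripting over the benchmark, the taxonomy above can be kept as a plain mapping. This is a sketch; METHODS and category_of are illustrative names, and the identifiers are copied from the list above rather than from the repository's code.

```python
# Method taxonomy as listed in this README (identifiers copied from the list).
METHODS = {
    "virtual screening": ["screening"],
    "GA": ["graph_ga", "smiles_ga", "Autogrow4"],
    "VAE": ["smiles_vae", "selfies_vae"],
    "RL": ["reinvent", "moldqn"],
    "HC": ["smiles_lstm_hc", "mimosa"],
    "gradient": ["dst", "pasithea"],
    "Auto-regressive": ["3DSBDD", "Pocket2mol", "PocketFlow", "ResGen"],
}


def category_of(method: str) -> str:
    """Return the category of a method name (case-insensitive match)."""
    for category, members in METHODS.items():
        if method.lower() in (m.lower() for m in members):
            return category
    raise KeyError(method)


total = sum(len(members) for members in METHODS.values())
print(total)                  # 16
print(category_of("mimosa"))  # HC
```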

Time refers to the rough average wall-clock time of a single run in our benchmark and excludes pretraining and data preprocessing. We have already preprocessed the data and pretrained the models; both are available in the repository.

Model	Dimension	Generated molecules	Requires GPU
3DSBDD	3D	771	yes
AutoGrow4	2D	1233	yes
Pocket2mol	3D	928	yes
PocketFlow	3D	1000	yes
ResGen	3D	631	yes
DST	2D	1001	no
Graph GA	2D	643	no
MIMOSA	2D	1001	yes
MolDQN	2D	501	yes
Pasithea	1D	914	yes
REINVENT	1D	100	yes
SCREENING	-	1000	no
SELFIES-VAE-BO	1D	200	yes
SMILES-GA	1D	584	no
SMILES-LSTM-HC	1D	501	no
SMILES-VAE-BO	1D	200	yes
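To filter models programmatically, for example to find those that can run without a GPU, the table can be transcribed into records. ModelInfo and TABLE are hypothetical names; the values are copied from the table above.

```python
from collections import Counter
from typing import NamedTuple


class ModelInfo(NamedTuple):
    name: str
    dimension: str      # "1D" (e.g. SMILES/SELFIES), "2D" (graph), "3D", or "-"
    generated: int      # number of molecules generated in the benchmark
    requires_gpu: bool


# Rows transcribed from the table above.
TABLE = [
    ModelInfo("3DSBDD", "3D", 771, True),
    ModelInfo("AutoGrow4", "2D", 1233, True),
    ModelInfo("Pocket2mol", "3D", 928, True),
    ModelInfo("PocketFlow", "3D", 1000, True),
    ModelInfo("ResGen", "3D", 631, True),
    ModelInfo("DST", "2D", 1001, False),
    ModelInfo("Graph GA", "2D", 643, False),
    ModelInfo("MIMOSA", "2D", 1001, True),
    ModelInfo("MolDQN", "2D", 501, True),
    ModelInfo("Pasithea", "1D", 914, True),
    ModelInfo("REINVENT", "1D", 100, True),
    ModelInfo("SCREENING", "-", 1000, False),
    ModelInfo("SELFIES-VAE-BO", "1D", 200, True),
    ModelInfo("SMILES-GA", "1D", 584, False),
    ModelInfo("SMILES-LSTM-HC", "1D", 501, False),
    ModelInfo("SMILES-VAE-BO", "1D", 200, True),
]

# Models that can run on CPU only, in table order.
cpu_only = [m.name for m in TABLE if not m.requires_gpu]
# How many models operate at each dimensionality.
by_dimension = Counter(m.dimension for m in TABLE)

print(cpu_only)  # ['DST', 'Graph GA', 'SCREENING', 'SMILES-GA', 'SMILES-LSTM-HC']
print(by_dimension)
```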

PDB information

All PDB files can be downloaded from the RCSB Protein Data Bank. The binding sites are as follows:

PDB ID	Center (x, y, z)	Box size (Å)
1iep	15.6138918, 53.38013513, 15.454837	15
3eml	-9.06363, -7.1446, 55.86259999	15
3ny8	2.2488, 4.68495, 51.3982	15 (23 for Pocket2mol)
4rlu	-0.73599, 22.75547, -31.23689	15
4unn	5.684346153, 18.1917, -7.3715	15
5mo4	-44.901, 20.490354, 8.48335	15
7l11	-21.81481, -4.21606, -27.98378	15 (23 for Pocket2mol)
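Each row defines a cubic search box that spans center ± size/2 along every axis. The sketch below computes the box corners and applies the 23 Å Pocket2mol override for 3ny8 and 7l11. SITES, box_size, and box_bounds are illustrative names, not part of the repository.

```python
# Binding-site centers and cubic box edge lengths (Å), copied from the table.
SITES = {
    "1iep": ((15.6138918, 53.38013513, 15.454837), 15.0),
    "3eml": ((-9.06363, -7.1446, 55.86259999), 15.0),
    "3ny8": ((2.2488, 4.68495, 51.3982), 15.0),
    "4rlu": ((-0.73599, 22.75547, -31.23689), 15.0),
    "4unn": ((5.684346153, 18.1917, -7.3715), 15.0),
    "5mo4": ((-44.901, 20.490354, 8.48335), 15.0),
    "7l11": ((-21.81481, -4.21606, -27.98378), 15.0),
}
# Pocket2mol uses a 23 Å box for these targets instead of the default 15 Å.
POCKET2MOL_BOX = {"3ny8": 23.0, "7l11": 23.0}


def box_size(pdb: str, model: str = "") -> float:
    """Edge length of the search box for a target, honoring the override."""
    _, size = SITES[pdb]
    if model.lower() == "pocket2mol":
        return POCKET2MOL_BOX.get(pdb, size)
    return size


def box_bounds(pdb: str, model: str = ""):
    """Return (min_corner, max_corner); the box spans center +/- size/2."""
    center, _ = SITES[pdb]
    half = box_size(pdb, model) / 2
    low = tuple(c - half for c in center)
    high = tuple(c + half for c in center)
    return low, high


low, high = box_bounds("7l11", "Pocket2mol")
print(low, high)
```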

Sampling and evaluating

To generate molecules with 3DSBDD and Pocket2mol, we use:

python sample_for_pdb.py --pdb_path [your pdb] --center=[centers] --bbox_size [box size] --outdir [your outdir]

You also need to set num_samples in sample_for_pdb.yml to the number of molecules you want to generate.
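When sweeping several targets, it can help to assemble the sample_for_pdb.py call programmatically. A minimal sketch follows; sample_for_pdb_cmd is a hypothetical helper, and formatting --center as a comma-separated x,y,z string is an assumption, so check the script's argument parser before relying on it.

```python
import shlex


def sample_for_pdb_cmd(pdb_path, center, bbox_size, outdir):
    """Build the argv list for sample_for_pdb.py (3DSBDD / Pocket2mol).

    Assumption: --center accepts a comma-separated "x,y,z" string.
    """
    center_str = ",".join(str(c) for c in center)
    return [
        "python", "sample_for_pdb.py",
        "--pdb_path", pdb_path,
        f"--center={center_str}",
        "--bbox_size", str(bbox_size),
        "--outdir", outdir,
    ]


# 3ny8 with Pocket2mol's 23 Å box (paths are placeholders).
cmd = sample_for_pdb_cmd("3ny8.pdb", (2.2488, 4.68495, 51.3982), 23,
                         "outputs/pocket2mol_3ny8")
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True); num_samples still has to be set in sample_for_pdb.yml.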

To generate molecules with PocketFlow, we use:

python main_generate.py -pkt [your pdb] --ckpt ckpt/ZINC-pretrained-255000.pt -n 1000 -d cuda:0 --root_path [your outdir] --name [pdb name] -at 1.0 -bt 1.0 --max_atom_num 35 -ft 0.5 -cm True --with_print True

For ResGen, we first convert the PDB file to an SDF file and then generate molecules with:

python gen.py --pdb_file [your pdb] --sdf_file [correspond sdf] --outdir [your outdir]

For AutoGrow4, we recommend working through the official AutoGrow4 tutorial before running the generation command:

python RunAutogrow.py \
    --filename_of_receptor [your pdb] \
    --center_x [center x] --center_y  [center y] --center_z [center z] \
    --size_x [box size] --size_y [box size] --size_z [box size] \
    --source_compound_file /autogrow4/autogrow/source_compounds/naphthalene_smiles.smi \
    --root_output_folder /PATH_TO/output_directory/ \
    --number_of_mutants_first_generation 50 \
    --number_of_crossovers_first_generation 50 \
    --number_of_mutants 50 \
    --number_of_crossovers 50 \
    --top_mols_to_seed_next_generation 50 \
    --number_elitism_advance_from_previous_gen 50 \
    --number_elitism_advance_from_previous_gen_first_generation 10 \
    --diversity_mols_to_seed_first_generation 10 \
    --diversity_seed_depreciation_per_gen 10 \
    --num_generations 5 \
    --mgltools_directory /PATH_TO/mgltools_x86_64Linux2_1.5.6/ \
    --number_of_processors -1 \
    --scoring_choice VINA \
    --LipinskiLenientFilter \
    --start_a_new_run \
    --rxn_library ClickChem \
    --selector_choice Rank_Selector \
    --dock_choice VinaDocking \
    --max_variants_per_compound 5 \
    --redock_elite_from_previous_gen False \
    --generate_plot True \
    --reduce_files_sizes True \
    --use_docked_source_compounds True \
    >  /PATH_TO/OUTPUT/text_file.txt 2>  /PATH_TO/OUTPUT/text_errormessage_file.txt
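The --center_* and --size_* flags in the command above come straight from a binding-site row. The sketch below (autogrow_box_args is a hypothetical helper) expands a center and a cubic box size into those six flags, which can then be spliced into a RunAutogrow.py argument list.

```python
def autogrow_box_args(center, size):
    """Expand (x, y, z) and a cubic edge length into AutoGrow4 box flags."""
    if len(center) != 3:
        raise ValueError("center must be (x, y, z)")
    args = []
    # One --center_<axis> flag per coordinate.
    for axis, value in zip("xyz", center):
        args += [f"--center_{axis}", str(value)]
    # The benchmark uses cubic boxes, so all three sizes are equal.
    for axis in "xyz":
        args += [f"--size_{axis}", str(size)]
    return args


# 1iep binding site from the PDB table, 15 Å box.
print(autogrow_box_args((15.6138918, 53.38013513, 15.454837), 15))
```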

The models above only generate molecules. To evaluate the generated molecules with docking and heuristic oracles, run:

python evaluation.py --smiles_path [your path] --pdb [your pdb] --model [model name]

The remaining models are run through the PMO (Practical Molecular Optimization) framework. Generate molecules with the following loop, and note that it must be run in the TDC Env:

oracle_array=('1iep_docking' '3eml_docking' '3ny8_docking' '4rlu_docking' '4unn_docking' '5mo4_docking' '7l11_docking')

for oracle in "${oracle_array[@]}"
do
    python -u run.py [model name] --task production --n_runs 1 --max_oracle_calls 1000 --oracles "${oracle}"
done
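The same loop can be driven from Python, which makes a dry run (printing the commands) easy before launching real jobs. This is a sketch that assumes it is run from the directory containing run.py inside the TDC Env; pmo_commands and run_all are hypothetical names, and the flags are copied from the loop above.

```python
import shlex
import subprocess

# The seven benchmark targets; each PMO oracle is named "<pdb>_docking".
PDBS = ["1iep", "3eml", "3ny8", "4rlu", "4unn", "5mo4", "7l11"]


def pmo_commands(model, max_oracle_calls=1000, n_runs=1):
    """Return one run.py argv list per docking oracle."""
    return [
        ["python", "-u", "run.py", model,
         "--task", "production",
         "--n_runs", str(n_runs),
         "--max_oracle_calls", str(max_oracle_calls),
         "--oracles", f"{pdb}_docking"]
        for pdb in PDBS
    ]


def run_all(model, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in pmo_commands(model):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


run_all("graph_ga")  # dry run: prints the seven commands
```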

After generation, use mol_opt_process.ipynb to convert the generated YAML files to CSV files and evaluate the heuristic oracles.
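As a rough stand-in for the conversion step, the sketch below assumes each PMO result YAML maps a SMILES string to a [score, oracle_call_index] pair (check this against your own output files) and that the file has already been loaded, for example with PyYAML's yaml.safe_load. buffer_to_csv is a hypothetical helper, not the notebook's code.

```python
import csv
import io
from typing import TextIO


def buffer_to_csv(buffer: dict, out: TextIO) -> None:
    """Write an assumed {smiles: [score, call_index]} mapping as CSV.

    Rows are ordered by oracle-call index so the CSV follows generation order.
    """
    writer = csv.writer(out)
    writer.writerow(["smiles", "score", "call_index"])
    for smiles, (score, idx) in sorted(buffer.items(), key=lambda kv: kv[1][1]):
        writer.writerow([smiles, score, idx])


# Tiny hand-made buffer for illustration, not real benchmark output.
buffer = {"CCO": [0.5, 2], "c1ccccc1": [0.7, 1]}
out = io.StringIO()
buffer_to_csv(buffer, out)
print(out.getvalue())
```

For a real file, open the destination with open(path, "w", newline="") so the csv module controls line endings.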

To compute summary statistics of the docking or property scores, run:

python results_compare.py --eval_folder_path [your generated result] --pdb_list [your pdb list] --file_type [docking or property] --output_folder [your outdir]
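The exact metrics reported by results_compare.py are not documented here, so the following is only a sketch of typical summaries. It assumes Vina-style docking scores where lower (more negative) is better; summarize is a hypothetical helper, and for property scores where higher is better you would pass lower_is_better=False.

```python
import statistics


def summarize(scores, k=10, lower_is_better=True):
    """Summary statistics for a list of scores.

    For docking scores lower is better, so "top-k" means the k most
    negative values; set lower_is_better=False for higher-is-better scores.
    """
    if not scores:
        raise ValueError("no scores")
    ranked = sorted(scores, reverse=not lower_is_better)
    top = ranked[:k]
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "best": ranked[0],
        f"top{k}_mean": statistics.mean(top),
    }


# Toy docking scores for illustration, not benchmark results.
print(summarize([-7.2, -9.1, -8.4, -6.0], k=2))
```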

Repository: zkysfls/2024-sbdd-benchmark

Folders and files

Latest commit

History

Repository files navigation

Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?

Installation

16 Methods

PDB information

Sampling and evaluating

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages