Linking QSAR-Based Drug Target Prediction with AlphaFold

Drug-target interactions (DTIs) are the interactions between chemical compounds and biological targets (proteins, in our case) inside the human body. They play a crucial role in drug discovery and pharmacology; however, determining them experimentally is time-consuming and constrained by funding and by the difficulty of purifying proteins.

Unwanted or unexpected DTIs can cause severe side effects. High-throughput in silico machine learning models that can quickly and confidently predict whether thousands of drugs and proteins bind, and how strongly, could therefore be crucial for medicinal chemistry and drug development, acting as a supplement to biological experiments.

Original Aims: The project aimed to gather publicly available data on known DTIs into a new curated dataset, and then to use that dataset to train multiple machine learning models that predict whether a drug and a protein bind. The models were to use simple QSAR descriptors derived from a drug's chemical properties and a protein's sequence, together with 3D structural information extracted from AlphaFold.
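As a rough illustration of what a simple sequence-derived protein descriptor can look like, the sketch below computes the amino acid composition of a sequence. It is not taken from the project's code: the function name `amino_acid_composition` and the example sequence are hypothetical, and the descriptors actually used by the project may differ.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a sequence.

    Hypothetical helper for illustration only; non-standard residue
    codes are ignored.
    """
    sequence = sequence.upper()
    counts = Counter(residue for residue in sequence if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Made-up example sequence, not a real target from the dataset.
composition = amino_acid_composition("MKTAYIAKQR")
```

The result is a fixed-length vector of 20 fractions regardless of sequence length, which is what makes this kind of descriptor convenient as model input.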

Actual Achievements: A dataset of 163,080 DTIs was gathered from a variety of databases, libraries and biochemical APIs. Subsets of it were used to train our classification and regression models, which were compared against dummy models and evaluated using holdout test sets and model interpretability tools. The classification models predict whether a drug-protein pair binds; the regression models predict the logKd value.
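To make the idea of a dummy model concrete, here is a minimal, dependency-free sketch of two common dummy baselines scored on a holdout set: always predicting the majority class (classification) and always predicting the training mean (regression). The function names and the toy numbers are assumptions for illustration; they are not the project's evaluation code or data.

```python
import math
from collections import Counter
from statistics import mean


def majority_class_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for label in test_labels if label == majority)
    return hits / len(test_labels)


def mean_predictor_rmse(train_targets, test_targets):
    """RMSE of always predicting the mean of the training targets."""
    prediction = mean(train_targets)
    return math.sqrt(mean((y - prediction) ** 2 for y in test_targets))


# Toy values only: 1 = binds, 0 = does not bind.
dummy_accuracy = majority_class_accuracy([1, 1, 1, 0, 0], [1, 0, 1, 1])

# Toy logKd-like targets, chosen only to keep the arithmetic easy.
dummy_rmse = mean_predictor_rmse([1.0, 2.0, 3.0], [2.0, 4.0])
```

A trained model is only informative if it clearly beats these numbers on the same holdout set.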

The models were further split into "Baseline" and "Enhanced": the former use only the QSAR descriptors of the drugs and proteins, while the latter use the 3D structural embeddings in addition to the QSAR descriptors. This makes it possible to compare the effect, positive or negative, of the structural embeddings against a baseline.

Unfortunately, our embeddings seemed to add little over our baseline models, which can reasonably be attributed to our embedding creation process. Even so, our high-throughput models could still be used to uncover interesting relationships between drugs and proteins that could later be confirmed or rejected by molecular docking simulations and experimental trials.
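The difference between the two feature sets can be sketched as a simple concatenation. This is an assumption about how the inputs might be assembled, not the project's implementation; `build_features` and all values below are hypothetical.

```python
def build_features(drug_descriptors, protein_descriptors, structural_embedding=None):
    """Assemble one input vector for a drug-protein pair.

    Without an embedding this gives a "Baseline"-style vector; with one,
    the embedding is appended to give an "Enhanced"-style vector.
    """
    features = list(drug_descriptors) + list(protein_descriptors)
    if structural_embedding is not None:
        features += list(structural_embedding)
    return features


# Placeholder numbers, not real descriptors or embeddings.
drug = [180.16, 1.2, 3.0]
protein = [0.08, 0.05]
embedding = [0.1, -0.4, 0.7, 0.0]

baseline_vector = build_features(drug, protein)
enhanced_vector = build_features(drug, protein, embedding)
```

Because the enhanced vector contains the baseline vector as a prefix, any difference in model performance can be attributed to the appended embedding.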

Important Links

Dissertation discussing the project's life-cycle (Dissertation Link)
Google Drive holding our models and datasets (Google Drive Link)
Streamlit web application created to showcase all the different models and our work (Web App Link)

Folders and files

Diagrams
Dissertation
Interim_Report
Metrics
Molecular_Functions_Embedding_Model_&_Files
Presentation
R_Scripts
Streamlit_App
contactmaps
.gitignore
Classification_Baseline_Models.ipynb
Classification_Enhanced_Models.ipynb
DTIs_Classification_NN.ipynb
DTIs_Regression_NN.ipynb
Dataset_Creation_&_Exploration.ipynb
Manual.md
README.md
Regression_Baseline_Models.ipynb
Regression_Enhanced_Models.ipynb
amino_acid_features.py
drug_features.py
extract_dtis.py
models_utils.py
protein_features.py
requirements.txt
utils.py

GeorgeIniatis/AlphaFold_Dataset_Drug_Binding_Prediction