Comparing-Pretrained-Language-Models-for-Molecular-Activity-Prediction

I predicted four properties of molecules: the pChEMBL value, AlogP, Molecular Weight and the number of Lipinski's Rule of Five violations. To do this, I trained multiple pre-trained language models end-to-end on compounds active against the dopamine D2 receptor, sourced from the ChEMBL database.

• pChEMBL is the negative base-10 logarithm of the molar activity value (for example IC50, EC50, Ki or Kd). Putting potencies that span many orders of magnitude onto one log scale makes them more balanced and directly comparable. It is a standardized version of the Standard Value and measures the molecule's bioactivity: a higher pChEMBL means a more potent compound.
• AlogP is a calculated estimate of logP, the octanol/water partition coefficient. It measures a molecule's lipophilicity, that is, its preference for a fat-like environment over water. This property strongly influences a drug's pharmacokinetics: its absorption, distribution, metabolism and excretion within the body. Compounds with balanced AlogP values are more likely to be absorbed efficiently and to show favourable pharmacological characteristics.
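• Molecular Weight is a key property in drug discovery and biopharma. It is also one of the criteria in Lipinski's Rule of Five, which flags compounds heavier than 500 Da.
• RO5 Violations: the number of Lipinski's Rule of Five violations. Lipinski's Rule of Five is a widely used rule of thumb in medicinal chemistry for evaluating drug-likeness, i.e. whether a compound is likely to work as an oral drug. A compound incurs one violation for each of the following: molecular weight above 500 Da, logP above 5, more than 5 hydrogen-bond donors, and more than 10 hydrogen-bond acceptors.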
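The two derived targets above can be illustrated with a minimal Python sketch. It assumes that activity values are given in nanomolar (nM) and that the four Rule of Five properties are already available as numbers. The function names `pchembl_from_nanomolar` and `ro5_violations` are hypothetical and do not come from this repository's notebooks.

```python
import math


def pchembl_from_nanomolar(standard_value_nm: float) -> float:
    """Convert an activity value in nM (e.g. IC50, Ki) to pChEMBL = -log10(molar value)."""
    if standard_value_nm <= 0:
        raise ValueError("activity value must be positive")
    molar = standard_value_nm * 1e-9  # nM -> M
    return -math.log10(molar)


def ro5_violations(mol_weight: float, logp: float, h_donors: int, h_acceptors: int) -> int:
    """Count Lipinski Rule of Five violations (MW > 500, logP > 5, HBD > 5, HBA > 10)."""
    # Each comparison yields a bool; summing them counts the violated criteria.
    return sum([
        mol_weight > 500,
        logp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ])


# Hypothetical example values, not taken from the dataset.
print(round(pchembl_from_nanomolar(10.0), 2))  # 8.0
print(ro5_violations(550.0, 5.5, 2, 8))        # 2
```

A 10 nM compound therefore has a pChEMBL of 8. Each tenfold increase in potency raises the value by one unit.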

The implemented models are:
• RoBERTa randomly initialized, 125 million parameters
• RoBERTa pre-trained, 125 million parameters
• ChemBERTa pre-trained on PubChem 1M, 85 million parameters
• ChemBERTa pre-trained on the ZINC 10M dataset, 3.5 million parameters
• ChemGPT pre-trained on PubChem10M SMILES strings, 1.2 billion parameters

Use main.ipynb for end-to-end training, where all model weights are fine-tuned. Use use_pretrained.ipynb to freeze the pre-trained language model and train only the final linear layers for regression. The models to run can be changed in the second cell.
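The frozen-encoder setting can be illustrated with a dependency-free Python sketch. Here the embeddings are treated as fixed inputs, and only a linear regression head (weights `w`, bias `b`) is fitted by gradient descent on mean squared error. The toy embedding vectors, the function names and the hyperparameters are hypothetical. The actual notebooks presumably use a deep-learning framework, and end-to-end training would also update the encoder's own weights.

```python
def train_linear_head(features, targets, lr=0.05, epochs=2000):
    """Fit y ~ w.x + b by full-batch gradient descent on mean squared error.

    `features` stand in for frozen language-model embeddings (one vector per
    molecule); only w and b are updated, as with a frozen encoder plus a
    trainable regression head.
    """
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, targets):
            err = predict(w, b, x) - y  # residual for one molecule
            for j in range(dim):
                grad_w[j] += 2 * err * x[j] / n
            grad_b += 2 * err / n
        # Gradient step on the head parameters only; the features never change.
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Linear regression head applied to one embedding vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical toy "frozen embeddings" with targets generated from y = 2*x0 - x1 + 0.5.
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2], [0.3, 0.9]]
targets = [2 * x0 - x1 + 0.5 for x0, x1 in embeddings]
w, b = train_linear_head(embeddings, targets)
print([round(v, 2) for v in w], round(b, 2))  # [2.0, -1.0] 0.5
```

Because the encoder is fixed, its embeddings could be computed once and cached. Training then only touches a handful of parameters, which makes it much cheaper than end-to-end fine-tuning.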

Repository contents:
• Results
• EDA.ipynb
• README.md
• main.ipynb
• parkinsons.csv
• use_pretrained.ipynb
Repository: rishi2002/Comparing-Pretrained-Language-Models-for-Molecular-Activity-Prediction