by Sumit Tarafder and Debswapna Bhattacharya
Codebase for our locality-aware invariant Point Attention-based RNA ScorEr (lociPARSE).
- Use conda virtual environment to install dependencies for lociPARSE. The following command will create a virtual environment named 'lociPARSE'.
conda env create -f lociPARSE_environment.yml
- Activate the virtual environment
conda activate lociPARSE
A typical installation on a "normal" desktop computer should take a few minutes on a 64-bit Linux system.
Instructions for running lociPARSE:
- Put the desired PDB file(s) inside the 'Input/RNA_pdbs' folder.
- Put the list of PDB IDs in 'input.txt' inside the 'Input' folder. See the example in the 'Input' folder.
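For instance, input.txt can be prepared directly on the command line. A minimal sketch, assuming one PDB ID per line (the IDs below are placeholders; check the example shipped in the 'Input' folder for the exact expected format):

```shell
# Create the input folder layout and list two placeholder PDB IDs,
# one per line (assumed format; see the example in 'Input')
mkdir -p Input/RNA_pdbs
cat > Input/input.txt <<'EOF'
rna_model_1
rna_model_2
EOF
```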
- Run
chmod a+x lociPARSE_predict.sh && ./lociPARSE_predict.sh Model/QAmodel_lociPARSE.pt
- The script takes the model path as an argument. It generates features for every ID listed in Input/input.txt and stores them in individual folders inside the 'Feature' folder. It then runs inference and stores the predicted molecular-level lDDT (pMoL) and predicted nucleotide-wise lDDT (pNuL) in "score.txt" in individual folders inside the 'Prediction' folder.
- The first line of the output "score.txt" shows the pMoL score. Each subsequent line specifies two columns: column 1 is the nucleotide index in the PDB and column 2 is the pNuL score.
Inference on a typical RNA structure (~70 nucleotides) should take a few seconds.
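Because score.txt is plain text, it can be post-processed with standard command-line tools. A minimal sketch, assuming the two-column layout described above (the scores below are made up for illustration; this awk snippet is not part of the lociPARSE scripts):

```shell
# Build a toy score.txt matching the described format: first line = pMoL,
# remaining lines = "<nucleotide index> <pNuL>" (values are invented)
cat > score.txt <<'EOF'
0.82
1 0.85
2 0.79
3 0.88
EOF

# Extract the pMoL score (first line)
head -n 1 score.txt

# Average pNuL over all nucleotides (skip the first line)
awk 'NR > 1 { sum += $2; n++ } END { printf "%.4f\n", sum / n }' score.txt
```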
- The lists of IDs in our training set, our test sets, and the validation set used in the ablation study are available here.
- The training set and the test set of 30 independent RNAs were taken from trRosettaRNA.
- CASP15 experimental structures and all submitted predictions were downloaded from CASP15.
- The set of 60 non-redundant RNA targets (TS60) used for hyperparameter optimization was curated in-house. See https://doi.org/10.1093/biomethods/bpae047 for more details.
If you wish to train lociPARSE from scratch on our training set, please follow these steps:
- Download our training dataset Train.tar.gz from here and place it inside the Input/Dataset folder.
- Extract the training dataset
tar -xzvf Train.tar.gz
- Run the following command to train our architecture
chmod a+x lociPARSE_train.sh && ./lociPARSE_train.sh > log.txt
Feature generation and 50 epochs of training take approximately 16 hours on a single A100 GPU.
- The best model, selected by validation loss, will be saved inside the Model folder as "QAmodel_retrained.pt". You can use this model for prediction as instructed in the Usage section.
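The retrained checkpoint can then be fed back into the prediction script from the Usage section. A minimal sketch (the existence check is only illustrative; the invocation itself is the one shown above):

```shell
# Reuse the prediction pipeline with the retrained checkpoint.
# The model path is the one the training script writes, per the note above.
MODEL=Model/QAmodel_retrained.pt
if [ -f "$MODEL" ]; then
    chmod a+x lociPARSE_predict.sh && ./lociPARSE_predict.sh "$MODEL"
else
    echo "Retrained model not found at $MODEL; run lociPARSE_train.sh first."
fi
```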
If you want to generate our reported results in the paper from the provided predictions, follow these steps:
- Extract the provided Evaluation folder, which contains all the predictions and ground truths.
tar -xzvf Evaluation.tar.gz
- To generate Tables 1-6, please run the following commands one by one.
cd Evaluate
python3 QA_eval.py Test30_CASP15 0
python3 QA_eval.py ARES_benchmark2 0
- You will find the corresponding results inside the Evaluation/Results folder.
- To generate Supplementary Figures S1-S2, please run the following commands.
cd Evaluate
python3 draw.py
- The generated figures will be inside the Evaluation/Figures folder.
If you want to predict the scores by lociPARSE from scratch and re-evaluate, follow these steps:
- Download our test datasets Test.tar.gz and Ares_set.tar.gz from here and place them inside the Input/Dataset folder.
- Extract the folders
tar -xzvf Test.tar.gz
tar -xzvf Ares_set.tar.gz
- To predict and evaluate results on our two test sets, Test30 and CASP15 (Tables 1-5), please run the following command.
chmod a+x evaluate.sh && ./evaluate.sh Test30_CASP15 Model/QAmodel_lociPARSE.pt
- To predict and evaluate results on ARES benchmark set 2 (Table 6), please run the following command. [This will be slow due to the ~76k models in this test set.]
chmod a+x evaluate.sh && ./evaluate.sh ARES_benchmark2 Model/QAmodel_Ares_set.pt
- You will find the corresponding results inside the Evaluation/Results folder.