by Sumit Tarafder and Debswapna Bhattacharya
Codebase for our locality-aware invariant Point Attention-based RNA ScorEr (lociPARSE).
- Use conda virtual environment to install dependencies for lociPARSE. The following command will create a virtual environment named 'lociPARSE'.
conda env create -f lociPARSE_environment.yml
- Activate the virtual environment
conda activate lociPARSE
A typical installation on a "normal" desktop computer should take a few minutes on a 64-bit Linux system.
Instructions for running lociPARSE:
- Put the desired PDB file(s) inside the 'Input/RNA_pdbs' folder.
- Put the list of PDB IDs in 'input.txt' inside the 'Input' folder. See the example in the 'Input' folder.
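For instance, input.txt can be prepared directly on the command line. A minimal sketch, assuming one PDB ID per line (the IDs below are placeholders; check the example shipped in the 'Input' folder for the exact expected format):

```shell
# Create the input folder layout and list two placeholder PDB IDs,
# one per line (assumed format; see the example in 'Input')
mkdir -p Input/RNA_pdbs
cat > Input/input.txt <<'EOF'
rna_model_1
rna_model_2
EOF
```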
- Run
chmod a+x lociPARSE_predict.sh && ./lociPARSE_predict.sh Model/QAmodel_lociPARSE.pt
- The script takes the model path as an argument. It generates features for every ID listed in Input/input.txt and stores them in individual folders inside the 'Feature' folder. It then runs inference and stores the predicted molecular-level lDDT (pMoL) and predicted nucleotide-wise lDDT (pNuL) in "score.txt" in individual folders inside the 'Prediction' folder.
- The first line of the output "score.txt" shows the pMoL score. Each subsequent line specifies two columns: column 1 is the nucleotide index in the PDB and column 2 is the pNuL score.
Inference on a typical RNA structure (~70 nucleotides) should take a few seconds.
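Because score.txt is plain text, it can be post-processed with standard command-line tools. A minimal sketch, assuming the two-column layout described above (the scores below are made up for illustration; this awk snippet is not part of the lociPARSE scripts):

```shell
# Build a toy score.txt matching the described format: first line = pMoL,
# remaining lines = "<nucleotide index> <pNuL>" (values are invented)
cat > score.txt <<'EOF'
0.82
1 0.85
2 0.79
3 0.88
EOF

# Extract the pMoL score (first line)
head -n 1 score.txt

# Average pNuL over all nucleotides (skip the first line)
awk 'NR > 1 { sum += $2; n++ } END { printf "%.4f\n", sum / n }' score.txt
```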
- The lists of IDs in our training set, our test sets, and the validation set used in the ablation study are available here.
- The training set and the test set of 30 independent RNAs were taken from trRosettaRNA.
- CASP15 experimental structures and all submitted predictions were downloaded from CASP15.
- The set of 60 non-redundant RNA targets (TS60) used for hyperparameter optimization was curated in-house. See https://doi.org/10.1093/biomethods/bpae047 for more details.
If you wish to train lociPARSE from scratch on our training set, please follow these steps:
- Download our training dataset Train.tar.gz from here and place it inside the Input/Dataset folder.
- Extract the training dataset
tar -xzvf Train.tar.gz
- Run the following command to train our architecture
chmod a+x lociPARSE_train.sh && ./lociPARSE_train.sh > log.txt
Feature generation and 50 epochs of training take approximately 16 hours on a single A100 GPU.
- The best model, selected by validation loss, will be saved inside the Model folder as "QAmodel_retrained.pt". You can use this model for prediction as instructed in the Usage section.
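The retrained checkpoint can then be fed back into the prediction script from the Usage section. A minimal sketch (the existence check is only illustrative; the invocation itself is the one shown above):

```shell
# Reuse the prediction pipeline with the retrained checkpoint.
# The model path is the one the training script writes, per the note above.
MODEL=Model/QAmodel_retrained.pt
if [ -f "$MODEL" ]; then
    chmod a+x lociPARSE_predict.sh && ./lociPARSE_predict.sh "$MODEL"
else
    echo "Retrained model not found at $MODEL; run lociPARSE_train.sh first."
fi
```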
If you want to generate our reported results in the paper from the provided predictions, follow these steps:
- Extract the provided Evaluation folder, which contains all the predictions and ground truths.
tar -xzvf Evaluation.tar.gz
- To generate Tables 1-6, please run the following commands one by one.
cd Evaluate
python3 QA_eval.py Test30_CASP15 0
python3 QA_eval.py ARES_benchmark2 0
- You will find the corresponding results inside the Evaluation/Results folder.
- To generate Supplementary Figures S1-S2, please run the following commands.
cd Evaluate
python3 draw.py
- The generated figures will be inside the Evaluation/Figures folder.
If you want to predict the scores by lociPARSE from scratch and re-evaluate, follow these steps:
- Download our test datasets Test.tar.gz and Ares_set.tar.gz from here and place them inside the Input/Dataset folder.
- Extract the folders
tar -xzvf Test.tar.gz
tar -xzvf Ares_set.tar.gz
- To predict and evaluate results on our two test sets, Test30 and CASP15 (Tables 1-5), please run the following command.
chmod a+x evaluate.sh && ./evaluate.sh Test30_CASP15 Model/QAmodel_lociPARSE.pt
- To predict and evaluate results on ARES benchmark set 2 (Table 6), please run the following command. [This will be slow due to the ~76k models in this test set.]
chmod a+x evaluate.sh && ./evaluate.sh ARES_benchmark2 Model/QAmodel_Ares_set.pt
- You will find the corresponding results inside the Evaluation/Results folder.