PASTEC-singularity

Description

Singularity container for the transposable elements classification tool PASTEC(from the package REPET).

Dependancies

singularity (tested with v.3.9.5)

Installation

Development installation

Clone this repo:

git clone https://github.com/TommasoBarberis/PASTEC-singularity.git

Move to the PASTEC directory:

cd PASTEC-singularity

To build the .sif image from the definition file:

sudo singularity build pastec_latest.sif PASTEC.def

Only user installation

Pull directly the .sif image from the sylabs.io registry:

singularity pull --arch amd64 library://tommasobarberis98/pastec/pastec:latest

Utilisation

Store your project directory path in a variable:

PROJECT_DIR=path/to/project/dir

This directory has to contain:

the fasta file with consensi sequences:

From the PASTEC documentation, each sequence must have 60 bp (or less) by line, so if you have a "one-line" FASTA file, you can transform it with the following command:
```
sed '/^>/!s/.\{60\}/&\n/g' oneline.fasta > multiline.fasta
```
About the sequence headers, it is highly advised to write them like this : ">XX_i" with XX standing for letters and i standing for numbers.
So, if you another type of header, you can try:
```
sed 's/>.\+$/>TE/g' old_file.fa | awk '/^>/{$0=$0"_"(++i)}1' > new_file.fa
```
the PASTEClassifier.cfg file (a template is provided with the github repo at the root folder) to update some parameters such as the path to a known database of transposable elements (see more here: PASTEClassfier-tuto). Options that you are suggested to update:
- project_name: whatever you want.
- TE_nucl_bank: /mnt/nucl_bank.fa such as repbase20.05_ntSeq_cleaned_TE.fa (from RepBase20.05_REPET.embl require subscription to girinst), but you are free to use any other database. Then you can set:
  - TE_BLRtx: yes ➡️ homology with known TEs using tblastx.
  - TE_BLRn: yes ➡️ detection of helitron extremities.
- TE_prot_bank: /mnt/prot_bank.fa such as repbase20.05_aaSeq_cleaned_TE.fa (from RepBase20.05_REPET.embl require subscription to girinst), but you are free to use any other database. Then you can set:
  - TE_BLRx: yes ➡️ homology with known TEs using blastx.
- HG_nucl_bank: /mnt/host_genes.fa for the cDNA database of the host genes. Then you can set:
  - HG_BLRn: yes ➡️ homology with host genes.
- rDNA_bank: /mnt/rdna.fa for the rDNA database. Then you can set:
  - rDNA_BLRn: yes ➡️ homology with rDNA.
- TE_HMM_profiles: is already set to the ProfilesBankForREPET_Pfam32.0.hmm database, but your are free to use an another database.
- Adjustable parameters in [classif_consensus] section (see PASTEClassfier-tuto).
the various databases that you can define in the PASTEClassifier.cfg file.

Container initialization

The PASTEC program use the MySQL database that need to be run as a server, for that you need to initialize the container with this service by creating a singularity instance:

bash path/to/init.sh -s pastec_latest.sif -d $PROJECT_DIR

Interactive mode

Start a shell through the container:

cd $PROJECT_DIR
singularity shell instance://pastec

Once in the container, run:

python2.7 /opt/PASTEC_linux-x64-2.0/bin/PASTEClassifier.py -i /mnt/consensi.fasta -C /mnt/PASTEClassifier.cfg

NOTE: the /mnt refer to your PROJECT_DIR, so you have to conserve it and you have only to append the file names.

Batch mode

Create a file pastec_cmd.sh (a template is provided in the test folder) that will contain the PASTEC command in your project directory ($PROJECT_DIR):

#! /bin/bash 

cd /mnt
python2.7 /opt/PASTEC_linux-x64-2.0/bin/PASTEClassifier.py -i /mnt/consensi.fasta -C /mnt/PASTEClassifier.cfg

Then you can run PASTEC using the container instance as follow:

cd $PROJECT_DIR
singularity exec instance://pastec /mnt/pastec_cmd.sh

NOTE: the /mnt refer to your PROJECT_DIR, so you have to conserve it and you have only to append the file names.

Other `PASTEC` options

classification rules file name (e.g. PASTEClassifierRules.yaml) [optional]

-D DECISIONRULESFILENAME, --decisionRules=DECISIONRULESFILENAME

step (0/1/2): default: 0 for all steps, step 1 for detect features, step 2 for classification run tool in parallel

-S STEP, --step=STEP

clean temporary files [optional] [default: False]

-c, --clean

verbosity [optional] [default: 3, from 1 to 4]

-v VERBOSITY, --verbosity=VERBOSITY

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
test		test
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
PASTEC.def		PASTEC.def
PASTEClassifier.cfg		PASTEClassifier.cfg
README.md		README.md
init.sh		init.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PASTEC-singularity

Description

Dependancies

Installation

Development installation

Only user installation

Utilisation

Container initialization

Interactive mode

Batch mode

Other `PASTEC` options

About

Releases

Packages

Languages

License

TommasoBarberis/PASTEC-singularity

Folders and files

Latest commit

History

Repository files navigation

PASTEC-singularity

Description

Dependancies

Installation

Development installation

Only user installation

Utilisation

Container initialization

Interactive mode

Batch mode

Other PASTEC options

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Other `PASTEC` options

Packages