Update trackhubs #119

Open
AntonPetrov opened this issue Apr 22, 2022 · 0 comments

The Rfam track hubs are currently updated only at major releases and the procedure is not automated.

We need to develop a Nextflow pipeline that takes GCA/GCF accessions as input and does the following:

  • download FASTA files using the NCBI Datasets CLI tool, as discussed in #118 (Download sequences using the NCBI Datasets CLI tool)
  • compute the cmsearch -Z value (search space size in megabases) for each genome as explained in the Rfam docs
  • randomly partition Rfam.cm into sets of 100 models
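    # List all model accessions (column 3 of cmstat output) in random order
    cmstat "$RFAMCM" | grep -v '^#' | awk '{ print $3 }' | shuf > all.list
    # Split the list into chunks of 100 accessions: cm.aa, cm.ab, ...
    split -l 100 all.list cm.
    # Fetch each chunk into its own CM file: rand.0.cm, rand.1.cm, ...
    count=0
    for filename in cm.*; do cmfetch -f "$RFAMCM" "$filename" > "rand.$count.cm"; count=$((count + 1)); done
    rm cm.*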
    
  • run cmsearch with each cm set against each fasta file
    bsub -n 8 -M 12000 "cmsearch -o <name.out> --cpu 8 -Z <genome-score> --tblout <name.tblout> --cut_ga --rfam --nohmmonly rand.1.cm chr10.fasta"
    
  • concatenate tblout files and remove overlaps using cmsearch_tblout_deoverlap
  • the de-overlapped .tblout files and the original cmsearch .out files can be gzipped and stored on the FTP site
  • de-overlapped .tblout files can be used to generate a trackhub using tblout2bigBed.pl or tblout2bigBedGenomes.pl
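
For the -Z step, here is a minimal sketch. It assumes the convention from the Rfam docs: total number of nucleotides, times 2 for both strands, divided by 1,000,000. The Rfam docs get the residue count from esl-seqstat; this sketch counts residues with awk instead so that it has no dependencies. The function name compute_z is hypothetical.

```shell
# Hypothetical helper: print the -Z value (search space in Mb, both strands)
# for one FASTA file. Residues are counted with awk rather than esl-seqstat.
compute_z() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }   # skip headers, sum sequence lengths
         END   { printf "%.6f\n", total * 2 / 1000000 }' "$1"
}
```

Usage would look like `Z=$(compute_z genome.fasta)`, with the result passed to `cmsearch -Z "$Z"`. If a genome is split across several FASTA files, the residue counts would need to be summed over all of them before the conversion.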

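The "each cm set against each fasta file" step could be sketched as a dry run that only prints the commands. The cmsearch flags are copied from the example above; the function name print_cmsearch_jobs, the directory layout, and the output naming scheme are assumptions for illustration.

```shell
# Hypothetical helper: print one cmsearch command per (CM set, FASTA file) pair.
# $1 = -Z value for the genome, $2 = directory with rand.*.cm files,
# $3 = directory with the *.fasta files of that genome.
print_cmsearch_jobs() {
    z=$1; cm_dir=$2; fasta_dir=$3
    for cm in "$cm_dir"/rand.*.cm; do
        for fasta in "$fasta_dir"/*.fasta; do
            # Skip unmatched glob patterns when a directory is empty
            [ -e "$cm" ] && [ -e "$fasta" ] || continue
            # Output prefix, e.g. chr10.rand.1 (naming scheme is an assumption)
            name="$(basename "$fasta" .fasta).$(basename "$cm" .cm)"
            printf 'cmsearch -o %s.out --cpu 8 -Z %s --tblout %s.tblout --cut_ga --rfam --nohmmonly %s %s\n' \
                "$name" "$z" "$name" "$cm" "$fasta"
        done
    done
}
```

Each printed line can then be wrapped in bsub as in the example above, or turned into one Nextflow task per pair.
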
Bonus point: add secondary structure directly into the BED file as an additional field using the BED detail format. See RT90465 for background. Unfortunately I could not track down the code that was used for that ticket.
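
As a starting point for the BED detail idea, here is a rough sketch that turns cmsearch --tblout rows into BED6 plus the two extra bedDetail columns (ID and description). This is not the code from RT90465. It assumes the default tblout column order (target name, query name, query accession, seq from, seq to, strand, and bit score in columns 1, 3, 4, 8, 9, 10, and 15). The function name is hypothetical, and the description column only holds the model name and bit score here; the secondary structure string would have to be parsed from the .out files, which is not shown.

```shell
# Hypothetical helper: convert cmsearch --tblout rows to BED6 + ID + description.
tblout_to_bed_detail() {
    awk 'BEGIN { OFS = "\t" }
         /^#/ || NF < 17 { next }                          # skip comments and short lines
         {
             from = $8 + 0; to = $9 + 0
             start = (from < to ? from : to) - 1           # BED starts are 0-based
             end   = (from < to ? to : from)               # BED ends are exclusive
             score = int($15)                              # BED score must be an integer in 0..1000
             if (score < 0) score = 0
             if (score > 1000) score = 1000
             print $1, start, end, $3, score, $10, $4, $3 " bit score " $15
         }' "$1"
}
```

The secondary structure could replace or extend the last column once it is available.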
