merged data input #1

phnghia99 · 2022-11-05T18:47:02Z

"NOVOplasty can't use merged reads", so mitoflow cannot take merged reads as an input file. However, NOVOplasty was recently updated to accept combined reads. How can I get around this and use merged reads as input?
I see a file "reads_merged.tsv" in the example folder. If that is the solution, please explain in detail how to use it.
Another question: I also use NOVOplasty in my mitogenome assembly project. Is there any difference between an assembly made with mitoflow and one made with NOVOplasty directly?
Thanks :))

darcyabjones · 2022-11-06T02:49:57Z

Hi @phnghia99 ,

Unfortunately the reads_merged.tsv isn't a solution to your issue.

I wrote the pipeline both to assemble the reads and to split reads aligning to the nuclear and mito genomes.
Reads to assemble and reads to split/filter are specified as two separate tables (--asm_table and --filter_table). The merged reads only go into the filter table.

It wouldn't be too hard to modify the pipeline to accept merged reads.
You'd just need to add another column to the assembly read channel here:

mitoflow/main.nf

Lines 181 to 188 in af42920

    if ( params.asm_table ) {
        asmTable = Channel.fromPath(params.asm_table)
            .splitCsv(by: 1, sep: '\t', header: true)
            .map { [it.sample, file(it.read1_file), file(it.read2_file), it.read_length, it.insert_size] }
    } else {
        log.info "Hey I need some reads to assemble into a mitochondrial genome."
        exit 1
    }
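As a rough, untested sketch of that first change: assuming the assembly table gains one extra column for merged reads, the channel could carry it as a fourth tuple element. The column name `merged_file` is a placeholder of mine and does not exist in mitoflow. The code keeps the DSL1 style of the snippet above and assumes every row fills that column.

```groovy
// Sketch only. `merged_file` is a hypothetical column name, not part of
// mitoflow; every row of the table is assumed to provide it.
if ( params.asm_table ) {
    asmTable = Channel.fromPath(params.asm_table)
        .splitCsv(by: 1, sep: '\t', header: true)
        .map { row ->
            [
                row.sample,
                file(row.read1_file),
                file(row.read2_file),
                file(row.merged_file),  // hypothetical extra column
                row.read_length,
                row.insert_size
            ]
        }
} else {
    log.info "Hey I need some reads to assemble into a mitochondrial genome."
    exit 1
}
```

The extra element changes the shape of the tuple, so the `set ... from asmTable` input of `assembleMito` would need a matching `file(...)` entry in the same position.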

And then update the inputs and Novoplasty config file in here:

mitoflow/main.nf

Lines 249 to 340 in af42920

    process assembleMito {
        label "novoplasty"
        label "medium_task"
        publishDir "${params.outdir}/assemblies"

        tag { name }

        input:
        file "ref.fasta" from reference
        file "seed.fasta" from seed
        set val(name), file("*R1.fastq.gz"), file("*R2.fastq.gz"),
            val(read_length), val(insert_size) from asmTable
                .groupTuple(by: 0)

        output:
        set val(name), file("${name}_mitochondrial.fasta"),
            file("${name}_log.txt") into mitoAssemblies

        script:
        if ( read_length.any { it != null } ) {
            read_length = (read_length - null)[0]
        } else if ( params.read_length ) {
            read_length = params.read_length
        } else {
            log.error "ERROR processing assembly for sample: ${name}."
            log.error "The `read_length` parameter is not set or provided in the `asm_table`."
            log.error "Please provide one of those and re-run :)."
            exit 1
        }

        if ( insert_size.any { it != null } ) {
            insert_size = (insert_size - null)[0]
        } else if ( params.insert_size ) {
            insert_size = params.insert_size
        } else {
            log.error "ERROR processing assembly for sample: ${name}."
            log.error "The `insert_size` parameter is not set or provided in the `asm_table`."
            log.error "Please provide one of those and re-run :)."
            exit 1
        }

        """
        cat *R1.fastq.gz > forward.fastq.gz
        cat *R2.fastq.gz > reverse.fastq.gz
        cat << EOF > config.txt
    Project:
    -----------------------
    Project name          = ${name}
    Type                  = mito
    Genome Range          = ${params.min_size}-${params.max_size}
    K-mer                 = ${params.kmer}
    Extended log          = 0
    Save assembled reads  = no
    Seed Input            = seed.fasta
    Reference sequence    = ref.fasta
    Variance detection    = no
    Dataset 1:
    -----------------------
    Read Length           = ${read_length}
    Insert size           = ${insert_size}
    Platform              = illumina
    Single/Paired         = PE
    Forward reads         = forward.fastq.gz
    Reverse reads         = reverse.fastq.gz
    Optional:
    -----------------------
    Insert size auto      = yes
    Insert Range          = 1.6
    Insert Range strict   = 1.2
    Use Quality Scores    = no
    EOF
        NOVOPlasty.pl -c config.txt
        # Rename for better sorting and consistency.
        if [[ -f  Circularized_assembly_1_${name}.fasta ]]; then
            mv Circularized_assembly_1_${name}.fasta ${name}_mitochondrial.fasta
        elif [[ -f  Uncircularized_assemblies_1_${name}.fasta ]]; then
            mv Uncircularized_assemblies_1_${name}.fasta ${name}_mitochondrial.fasta
        fi
        mv log_${name}.txt ${name}_log.txt
        rm forward.fastq.gz
        rm reverse.fastq.gz
        """
    }

RE "Another question, I also use NOVOplasty":
I really only wrote this because I had a few hundred Illumina genomes to process, and it was just easier to write it as a Nextflow pipeline.
We don't actually do a lot with mitogenomes, so mainly it was to separate the mitochondrial reads from the nuclear reads.
In the end it turned out not to matter.

If you're already using NOVOplasty, there's probably not much benefit to using mitoflow beyond the conveniences you get from Nextflow job scheduling and parallelisation.

Hope that clears things up :)

Cheers,
Darcy
