merged data input #1

Open
phnghia99 opened this issue Nov 5, 2022 · 1 comment

@phnghia99

"NOVOplasty can't use merged reads", so mitoflow can't take merged reads as an input file, but NOVOplasty was recently updated to support combined reads. How can I get around this and input merged reads?
I do see a file "reads_merged.tsv" in the example folder. If this is the solution, could you explain in detail how to use it?
Another question: I also use NOVOplasty in my mitogenome assembly project. Is there any difference between assembling with mitoflow and assembling with NOVOplasty directly?
Thanks :))

@darcyabjones
Owner

Hi @phnghia99 ,

Unfortunately the reads_merged.tsv isn't a solution to your issue.

I wrote the pipeline both to assemble the reads and to split reads aligning to the nuclear and mito genomes.
The reads to assemble and the reads to split/filter are specified in two separate tables (--asm_table and --filter_table). The merged reads only go into the filter table.

It wouldn't be too hard to modify the pipeline to accept merged reads.
You'd just need to add another column to the assembly read channel here:

mitoflow/main.nf, lines 181 to 188 in af42920:

```nextflow
if ( params.asm_table ) {
    asmTable = Channel.fromPath(params.asm_table)
        .splitCsv(by: 1, sep: '\t', header: true)
        .map { [it.sample, file(it.read1_file), file(it.read2_file), it.read_length, it.insert_size] }
} else {
    log.info "Hey I need some reads to assemble into a mitochondrial genome."
    exit 1
}
```

Then update the inputs and the NOVOplasty config file here:

mitoflow/main.nf, lines 249 to 340 in af42920:

```nextflow
process assembleMito {
    label "novoplasty"
    label "medium_task"
    publishDir "${params.outdir}/assemblies"
    tag { name }

    input:
    file "ref.fasta" from reference
    file "seed.fasta" from seed
    set val(name), file("*R1.fastq.gz"), file("*R2.fastq.gz"),
        val(read_length), val(insert_size) from asmTable
        .groupTuple(by: 0)

    output:
    set val(name), file("${name}_mitochondrial.fasta"),
        file("${name}_log.txt") into mitoAssemblies

    script:
    if ( read_length.any { it != null } ) {
        read_length = (read_length - null)[0]
    } else if ( params.read_length ) {
        read_length = params.read_length
    } else {
        log.error "ERROR processing assembly for sample: ${name}."
        log.error "The `read_length` parameter is not set or provided in the `asm_table`."
        log.error "Please provide one of those and re-run :)."
        exit 1
    }

    if ( insert_size.any { it != null } ) {
        insert_size = (insert_size - null)[0]
    } else if ( params.insert_size ) {
        insert_size = params.insert_size
    } else {
        log.error "ERROR processing assembly for sample: ${name}."
        log.error "The `insert_size` parameter is not set or provided in the `asm_table`."
        log.error "Please provide one of those and re-run :)."
        exit 1
    }

    """
    cat *R1.fastq.gz > forward.fastq.gz
    cat *R2.fastq.gz > reverse.fastq.gz

    cat << EOF > config.txt
    Project:
    -----------------------
    Project name = ${name}
    Type = mito
    Genome Range = ${params.min_size}-${params.max_size}
    K-mer = ${params.kmer}
    Extended log = 0
    Save assembled reads = no
    Seed Input = seed.fasta
    Reference sequence = ref.fasta
    Variance detection = no
    Dataset 1:
    -----------------------
    Read Length = ${read_length}
    Insert size = ${insert_size}
    Platform = illumina
    Single/Paired = PE
    Forward reads = forward.fastq.gz
    Reverse reads = reverse.fastq.gz
    Optional:
    -----------------------
    Insert size auto = yes
    Insert Range = 1.6
    Insert Range strict = 1.2
    Use Quality Scores = no
    EOF

    NOVOPlasty.pl -c config.txt

    # Rename for better sorting and consistency.
    if [[ -f Circularized_assembly_1_${name}.fasta ]]; then
        mv Circularized_assembly_1_${name}.fasta ${name}_mitochondrial.fasta
    elif [[ -f Uncircularized_assemblies_1_${name}.fasta ]]; then
        mv Uncircularized_assemblies_1_${name}.fasta ${name}_mitochondrial.fasta
    fi

    mv log_${name}.txt ${name}_log.txt
    rm forward.fastq.gz
    rm reverse.fastq.gz
    """
}
```
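As a rough sketch of the config side of that change, the bash below could stand in for the concatenation and config-writing part of the `assembleMito` script. It assumes a hypothetical extra merged-reads input whose files are staged as `*merged.fastq.gz`, uses hypothetical helper names (`concat_merged`, `write_novoplasty_config`) that are not part of mitoflow, and assumes NOVOplasty picks the file up from a `Combined reads` config line. That key name is an assumption: check it, and what kind of reads it expects, against the config template shipped with your NOVOplasty version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: bash for a merged-reads variant of assembleMito's script block.
# ASSUMPTIONS (none of this is part of mitoflow):
#   - a sample's merged reads are staged as files matching *merged.fastq.gz
#   - NOVOplasty takes them from a "Combined reads" config line; check that key,
#     and what it expects, against the config template of your NOVOplasty version
#   - concat_merged and write_novoplasty_config are hypothetical helper names
set -euo pipefail

# Concatenate any staged merged-read files into combined.fastq.gz and print
# that file name; print nothing if no merged files were staged.
# The output name deliberately does not match the input glob.
concat_merged() {
  local files
  shopt -s nullglob
  files=( *merged.fastq.gz )
  shopt -u nullglob
  if (( ${#files[@]} > 0 )); then
    cat "${files[@]}" > combined.fastq.gz
    echo combined.fastq.gz
  fi
}

# Write config.txt with the same fields as the original process, adding a
# "Combined reads" line only when a merged-reads file name is passed.
# Arguments: name read_length insert_size min_size max_size kmer merged_file
write_novoplasty_config() {
  local name="$1" read_length="$2" insert_size="$3"
  local min_size="$4" max_size="$5" kmer="$6" merged="$7"
  {
    cat << EOF
Project:
-----------------------
Project name = ${name}
Type = mito
Genome Range = ${min_size}-${max_size}
K-mer = ${kmer}
Extended log = 0
Save assembled reads = no
Seed Input = seed.fasta
Reference sequence = ref.fasta
Variance detection = no
Dataset 1:
-----------------------
Read Length = ${read_length}
Insert size = ${insert_size}
Platform = illumina
Single/Paired = PE
EOF
    if [[ -n "${merged}" ]]; then
      echo "Combined reads = ${merged}"
    fi
    cat << EOF
Forward reads = forward.fastq.gz
Reverse reads = reverse.fastq.gz
Optional:
-----------------------
Insert size auto = yes
Insert Range = 1.6
Insert Range strict = 1.2
Use Quality Scores = no
EOF
  } > config.txt
}
```

Pasting this into the Nextflow triple-quoted `script:` block would need every bash-side `$` written as `\$`, since Nextflow interpolates `${...}` there. The extra channel column and `input:` entry for the merged files are also still needed on the Nextflow side.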

Re: "Another question, I also use NOVOplasty..."
Honestly, I only wrote this because I had a few hundred Illumina genomes to process, and it was just easier to write it as a Nextflow pipeline.
We don't actually do much with mitogenomes, so it was mainly to separate the mitochondrial reads from the nuclear reads.
In the end it turned out not to matter.

If you're already using NOVOplasty, there's probably not much benefit to using mitoflow beyond the conveniences you get from Nextflow's job scheduling/parallelisation.

Hope that clears things up :)

Cheers,
Darcy
