Why are bwa and BWA-MEME results inconsistent? #27

Closed
yukaiquan opened this issue Jan 31, 2024 · 4 comments

@yukaiquan

Dear developer:

bwa: Version: 0.7.17-r1188
BWA-MEME:v1.0.5

bwa stat of bam:
338883556 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
448166 + 0 supplementary
0 + 0 duplicates
330144737 + 0 mapped (97.42% : N/A)
338435390 + 0 paired in sequencing
169217695 + 0 read1
169217695 + 0 read2
322879394 + 0 properly paired (95.40% : N/A)
329460362 + 0 with itself and mate mapped
236209 + 0 singletons (0.07% : N/A)
5641738 + 0 with mate mapped to a different chr
2394586 + 0 with mate mapped to a different chr (mapQ>=5)
BWA-MEME stat of bam:
338883548 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
448158 + 0 supplementary
0 + 0 duplicates
330144743 + 0 mapped (97.42% : N/A)
338435390 + 0 paired in sequencing
169217695 + 0 read1
169217695 + 0 read2
322879718 + 0 properly paired (95.40% : N/A)
329460388 + 0 with itself and mate mapped
236197 + 0 singletons (0.07% : N/A)
5641548 + 0 with mate mapped to a different chr
2394572 + 0 with mate mapped to a different chr (mapQ>=5)

@quito418
Collaborator

quito418 commented Jan 31, 2024

Hi yukaiquan,

Thank you for trying out BWA-MEME and reporting the issue.

There is some randomness in BWA, BWA-MEM2, and BWA-MEME because the chunk size changes with the number of threads.
For example, per-chunk (batch) statistics are used for paired-end mapping, so a different chunk size can change how some read pairs are aligned.
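
To make the thread dependence concrete: in bwa, the number of bases read per batch is the per-thread chunk size multiplied by the thread count, unless -K fixes it. Below is a minimal sketch of that arithmetic. It assumes bwa's default of 10,000,000 bases per thread and assumes BWA-MEM2 and BWA-MEME behave the same way; the function name is hypothetical.

```shell
# Hypothetical helper: bases read per batch for a given thread count.
# Assumes a default of 10,000,000 bases per thread (bwa's default);
# a positive second argument stands in for the -K value.
effective_chunk_size() {
    threads=$1
    fixed=${2:-0}
    if [ "$fixed" -gt 0 ]; then
        echo "$fixed"
    else
        echo $((10000000 * threads))
    fi
}

effective_chunk_size 8             # 80000000
effective_chunk_size 32            # 320000000
effective_chunk_size 32 100000000  # 100000000, independent of threads
```

Two runs with different thread counts therefore split the reads into different batches, which is why a fixed -K value is needed for a like-for-like comparison.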

Have you tried comparing the output using a fixed chunk size?

  • You can set a fixed chunk size with the -K option:
# Perform alignment with BWA-MEME, add -7 option
bwa-meme mem -7 -Y -K 100000000 -t <num_threads> <input.fasta> <input_1.fastq> -o <output_meme.sam>

# Below runs alignment with BWA-MEM2, without -7 option
bwa-meme mem -Y -K 100000000 -t <num_threads> <input.fasta> <input_1.fastq> -o <output_mem2.sam>

# Compare output SAM files
diff <output_mem2.sam> <output_meme.sam>

# To diff large SAM files use https://github.com/unhammer/diff-large-files
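
A plain diff of the two SAM files may report a difference in the header even when every alignment record matches, because the @PG line usually records the command line (which differs here by -7 and the output name). Below is a minimal sketch that skips those lines before comparing; the function name is hypothetical, and the assumption that only @PG lines differ in the header should be checked against your own output.

```shell
# Hypothetical helper: compare two SAM files while ignoring @PG header
# lines, which are assumed to differ because they record the command line.
# Exit status follows diff: 0 if identical, 1 if different.
sam_diff_no_pg() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    grep -v '^@PG' "$1" > "$tmp_a"
    grep -v '^@PG' "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

Calling it with the two output files from the commands above prints nothing and returns 0 when all remaining lines are identical.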

Thanks!

@yukaiquan
Author

Hi quito418:
Thank you very much for your patient explanation. The results are consistent after adding -K.

Can the index be loaded only once when comparing thousands of samples in batches? Reading the index takes a lot of time.

Thanks!

@quito418
Collaborator

quito418 commented Feb 1, 2024

Glad to hear it worked :)

At the moment, we have not developed a method for loading the index once and reusing it across runs.

Below are my suggestions that can be applied now:

  1. Use the Linux disk cache. By default, when you read or write a file, it is cached in RAM. Hence, if you run BWA-MEME sequentially on the same Linux machine, the next time the index is read it will be loaded from memory (at about 3-5 GB/s).
  2. Use a RAM disk. For example, you can put the index files in /dev/shm (~40 GB is required for the indexes at runtime). This is similar to the first method.
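
The two suggestions above could be scripted roughly as follows. This is only a sketch: the function names and the default destination directory are hypothetical, and it assumes that every index file shares the reference prefix (for example `ref.fa.*`), which should be verified against the files your index build actually produced.

```shell
# Hypothetical helper for suggestion 2: copy every file that shares the
# reference prefix (assumed to cover all index files) into a RAM-backed
# directory and print the new prefix to pass to the aligner.
stage_index() {
    ref_prefix=$1
    dest_dir=${2:-/dev/shm/bwa_meme_index}
    mkdir -p "$dest_dir" || return 1
    cp "$ref_prefix"* "$dest_dir"/ || return 1
    echo "$dest_dir/$(basename "$ref_prefix")"
}

# Hypothetical helper for suggestion 1: read the index files once so that
# they are likely to be in the Linux page cache for later runs.
warm_index_cache() {
    cat "$1"* > /dev/null
}
```

A batch job might call `stage_index` once, keep the printed prefix in a variable, and pass it as the reference to each `bwa-meme mem` run. Files in /dev/shm occupy RAM until they are deleted, so remove the staged copy when the batch finishes.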

@yukaiquan
Author

Thanks!
Best wishes!
