Number of unique reads fastq mode #20

Open
lisagrigoreva opened this issue Nov 2, 2022 · 3 comments

Comments

@lisagrigoreva

Hi,
I was wondering what exactly the 'Number of unique reads' output represents when collapsing reads in fastq mode. For example:
Number of input read 6609696
Number of unique reads 3885326
Number of reads after deduplicating 3028828

I ask because it seems like the number of unique reads should be similar to the number of reads after deduplicating.

@Daniel-Liu-c0deb0t
Owner

Unique reads in this case represents the number of reads with UMIs that are not exactly identical to any other read's UMI. This does not account for errors in the UMIs, which is why the count is greater than the number of reads after deduplicating. The deduplication process allows similar (but not exactly identical) UMIs to be grouped together.
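
To illustrate the difference between the two counts, here is a minimal Python sketch on made-up UMIs. It is only a toy: the greedy grouping, the function names `count_unique` and `count_after_dedup`, and the example data are assumptions for illustration and are not the tool's actual algorithm or code. The parameter `k` plays the role of the number of tolerated mismatches.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_unique(umis):
    """Count distinct UMIs using exact string comparison only."""
    return len(set(umis))


def count_after_dedup(umis, k):
    """Toy greedy grouping (hypothetical, for illustration only).

    Distinct UMIs are visited from most to least frequent. A UMI that is
    within k mismatches of an existing group representative is absorbed
    into that group; otherwise it starts a new group.
    """
    counts = Counter(umis)
    representatives = []
    # Sort by descending count, then alphabetically to keep the order deterministic.
    for umi, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if not any(hamming(umi, rep) <= k for rep in representatives):
            representatives.append(umi)
    return len(representatives)


# Made-up example: AAAT and CCCG look like single-base errors of AAAA and CCCC.
umis = ["AAAA", "AAAA", "AAAA", "AAAT", "CCCC", "CCCC", "CCCG", "GGGG"]

print(len(umis))                   # 8 input reads
print(count_unique(umis))          # 5 distinct UMIs (exact matching)
print(count_after_dedup(umis, 1))  # 3 groups when one mismatch is tolerated
print(count_after_dedup(umis, 0))  # 5 groups when no mismatch is tolerated
```

In this sketch the exact-match count (5) is larger than the error-tolerant count (3) because near-identical UMIs are merged, and the two counts coincide once `k` is 0.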

@lisagrigoreva
Author

Thank you! Is it possible to deduplicate only reads with identical UMIs? I suppose by setting p=1?

@Daniel-Liu-c0deb0t
Owner

If you want to deduplicate reads only when they have exactly the same UMI, you should pass in -k 0 to indicate that zero errors are tolerated.
