Misleading labels in species sunburst #7

Open
AntonPetrov opened this issue Jan 25, 2017 · 0 comments

Example

Species sunburst for Clostridia in RF01315 shows that there are 64 sequences:

[Screenshot: species sunburst for Clostridia in RF01315, taken 2016-02-19]

An example SQL query confirming the number of sequences:

SELECT CONCAT(t1.rfamseq_acc, '/', seq_start, '-', seq_end)
FROM full_region t1, rfamseq t2, taxonomy t3
WHERE t1.rfam_acc = 'RF01315'
AND t1.rfamseq_acc = t2.rfamseq_acc
AND t2.ncbi_id = t3.ncbi_id
AND t3.tax_string LIKE '%Clostridia;%'
AND is_significant = 1
GROUP BY t1.rfamseq_acc;

This returns 64 rows, matching the sunburst UI. Note the GROUP BY clause, which collapses the results to one row per sequence accession.

However, there are many more annotated regions:

SELECT CONCAT(t1.rfamseq_acc, '/', seq_start, '-', seq_end)
FROM full_region t1, rfamseq t2, taxonomy t3
WHERE t1.rfam_acc = 'RF01315'
AND t1.rfamseq_acc = t2.rfamseq_acc
AND t2.ncbi_id = t3.ncbi_id
AND t3.tax_string LIKE '%Clostridia;%'
AND is_significant = 1;

This returns 6222 rows: without the GROUP BY clause, every annotated region is listed separately.

So the sunburst label counts sequences, while the resulting FASTA file has one entry per annotated region, which makes the number of FASTA entries inconsistent with the sunburst UI.
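
To make the difference between the two counts concrete, here is a minimal sketch that reports both numbers in a single query. It runs against an in-memory SQLite table with made-up rows; the accessions `SEQ_A.1` and `SEQ_B.1` are hypothetical, and the joins to `rfamseq` and `taxonomy` are left out for brevity, so this only illustrates the counting logic and does not reproduce the Rfam data.

```python
import sqlite3

# Toy stand-in for full_region; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE full_region (
    rfam_acc TEXT,
    rfamseq_acc TEXT,
    seq_start INTEGER,
    seq_end INTEGER,
    is_significant INTEGER
);
INSERT INTO full_region VALUES
    ('RF01315', 'SEQ_A.1',  10,  90, 1),
    ('RF01315', 'SEQ_A.1', 200, 280, 1),
    ('RF01315', 'SEQ_A.1', 500, 580, 1),
    ('RF01315', 'SEQ_B.1',   5,  85, 1),
    ('RF01315', 'SEQ_B.1', 300, 380, 0);
""")

# COUNT(DISTINCT ...) gives the number of sequences (what GROUP BY collapses to),
# COUNT(*) gives the number of annotated regions (one FASTA entry each).
n_sequences, n_regions = conn.execute("""
    SELECT COUNT(DISTINCT rfamseq_acc), COUNT(*)
    FROM full_region
    WHERE rfam_acc = 'RF01315'
      AND is_significant = 1
""").fetchone()

print(f"sequences: {n_sequences}, regions: {n_regions}")
```

With the toy rows above, the two sequences carry four significant regions between them, which is the same kind of gap as 64 versus 6222. A label could show both numbers, or name explicitly which one it counts.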

@AntonPetrov AntonPetrov self-assigned this Aug 2, 2017
@AntonPetrov AntonPetrov added this to the Rfam 13.1 milestone Sep 27, 2017