Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve taxonomy rank indexing #805

Merged
merged 5 commits into from
Mar 19, 2020
Merged

improve taxonomy rank indexing #805

merged 5 commits into from
Mar 19, 2020

Conversation

trvinh
Copy link
Contributor

@trvinh trvinh commented Mar 14, 2020

Description

The previous rank indexing function did not work correctly while sorting the noranks (undefined ranks). This has been fixed now.

Related Issue

Example

@trvinh
Copy link
Contributor Author

trvinh commented Mar 14, 2020

Hi @sckott
sorry, but I don't know how to solve the issue with vcr. I tested the function class2tree using the test file test-class2tree.R, it worked fine by commenting out the vcr::use_cassette function.

@sckott sckott added this to the v0.9.93 milestone Mar 16, 2020
Copy link
Contributor

@sckott sckott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain the problem in more detail? I'm not sure what problem this is solving.

For the vcr thing, delete the cassette file class2tree_classification_call.yml, from the command line e.g, like rm tests/fixtures/class2tree_classification_call.yml or just in rstudio or however. Then re-run tests again, and it should work. (p.s. we're thinking about how to make this easier for contributors to packages that use vcr in tests - ropensci/vcr#145)

R/class2tree.R Outdated Show resolved Hide resolved
R/class2tree.R Show resolved Hide resolved
@trvinh
Copy link
Contributor Author

trvinh commented Mar 17, 2020

@sckott removing the cassette files solved the issue, the test for class2tree passed. But the travis-ci has another error: The command "sudo add-apt-repository -y "ppa:marutter/rrutter3.5"" failed and exited with 1 during . . I think it has nothing to do with the taxize package, right?

@sckott
Copy link
Contributor

sckott commented Mar 17, 2020

thanks for fixing cassettes

for the travis failure, i restarted it, its a server system dependency problem - probably intermittent

please answer the above too: Can you please explain the problem in more detail? I'm not sure what problem this is solving.

@trvinh
Copy link
Contributor Author

trvinh commented Mar 18, 2020

I will use this small example to reproduce the issue

library("data.table")
library("ape")
library("taxize")
library("zoo")

spnames <- c("Streptomyces coelicolor", "Mus musculus", "Rattus norvegicus")
input <- classification(spnames, db='ncbi')

# prepare rank list
rankList <- dt2df(lapply(input, get_rank), idcol = FALSE)
# index all ranks of input species
rank_indexing(rankList)

This is the result:

# using the OLD function
> rank_indexing(rankList)
   index           rank
1      5        species
2      6  species group
3      8       subgenus
4      9          genus
5     12      subfamily
6     13         family
8     14  norank_337687
10    17       suborder
7     18          order
13    19  norank_314147
15    20     superorder
17    21 norank_1437010
18    22    norank_9347
19    23   norank_32525
9     24          class
20    25   norank_32524
12    26 norank_1783272
21    26   norank_32523
22    27 norank_1338369
23    28     superclass
24    29  norank_117571
25    30  norank_117570
26    31    norank_7776
27    32    norank_7742
28    33      subphylum
11    34         phylum
29    35   norank_33511
30    36   norank_33213
31    37    norank_6072
32    38        kingdom
33    39   norank_33154
14    40   superkingdom
16    41  norank_131567

# using the NEW function
> rank_indexing(rankList)
   index           rank
1      5        species
2      6  species group
3      8       subgenus
4      9          genus
5     12      subfamily
6     13         family
8     14  norank_337687
10    17       suborder
7     18          order
13    19  norank_314147
15    20     superorder
17    21 norank_1437010
18    22    norank_9347
19    23   norank_32525
9     24          class
20    25   norank_32524
21    26   norank_32523
22    27 norank_1338369
23    28     superclass
24    29  norank_117571
25    30  norank_117570
26    31    norank_7776
27    32    norank_7742
28    33      subphylum
11    34         phylum
12    35 norank_1783272
29    35   norank_33511
30    36   norank_33213
31    37    norank_6072
32    38        kingdom
33    39   norank_33154
14    40   superkingdom
16    41  norank_131567

So, please notice the position of the norank_1783272. According to the order of the NCBI taxonomy ranks for the first species, this rank must be placed between phylum and superkingdom:

> input[1]
$`Streptomyces coelicolor`
                              name          rank      id
1               cellular organisms       no rank  131567
2                         Bacteria  superkingdom       2
3              Terrabacteria group       no rank 1783272
4                   Actinobacteria        phylum  201174
5                   Actinobacteria         class    1760
6                 Streptomycetales         order   85011
7                Streptomycetaceae        family    2062
8                     Streptomyces         genus    1883
9  Streptomyces albidoflavus group species group 1477431
10         Streptomyces coelicolor       species    1902

However, the old rank_indexing places it way before phylum (pos 26 vs 34). With the new function, norank_1783272 (pos 35) can be placed correctly between phylum and superkingdom (pos 34 and 40).

P.S.: the changes in other functions (get_rank, get_name and taxonomy_table_creator) are just code formatting :)

P.P.S.: the travis-ci still failed because of other cassettes, should I also removed them and run the test agains for the whole package?

@sckott
Copy link
Contributor

sckott commented Mar 19, 2020

  • okay, thanks for the explanation of the issue. makes sense now.
  • okay on formatting changes
  • don't worry about tests for other fxns

Looks good to me, thanks for the fix!

@sckott sckott merged commit 3c28e08 into ropensci:master Mar 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants