Record the number of families in each release in a standard way #108

Open
AntonPetrov opened this issue Jan 31, 2022 · 1 comment
@AntonPetrov
Member

AntonPetrov commented Jan 31, 2022

  • Update section 5 of the Rfam FTP README file with historical information for releases 14.1-14.7. The old numbers can be recovered by parsing the families table in the MySQL dump for each release. Note: I've also added "Update section 5 of the FTP README" to the 14.8 release checklist.
  • Consider creating an easily parseable file (TSV or CSV) with release stats.
  • Bonus points: include the number of sequences, alignment size, mean percent identity (PID), minimum PID, number of base pairs, C+G content, covariation, and presence of pseudoknots. Many of these metrics can be obtained by running esl-alistat on the seed files.
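
As a rough illustration of the last two points, here is a minimal Python sketch that turns a "Key: value" style summary (the general shape of an esl-alistat report) into one row of a TSV file. The function names, the choice of columns, and the sample text are assumptions made for this example; the sample is not real output for any Rfam family, and the exact labels esl-alistat prints should be checked against the installed version.

```python
import csv
import io


def parse_key_value_report(text):
    """Parse 'Key: value' lines into a dict; lines without a colon are skipped."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            stats[key.strip()] = value.strip()
    return stats


def write_stats_tsv(rows, fieldnames, handle):
    """Write dict rows as tab-separated values, ignoring keys not in fieldnames."""
    writer = csv.DictWriter(
        handle,
        fieldnames=fieldnames,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(rows)


# Illustrative input only (hypothetical values, not a real seed alignment).
SAMPLE_REPORT = """\
Number of sequences: 5
Alignment length:    120
Average identity:    67%
"""

# Hypothetical column selection for the stats file.
COLUMNS = ["Number of sequences", "Alignment length", "Average identity"]

stats = parse_key_value_report(SAMPLE_REPORT)
buffer = io.StringIO()
write_stats_tsv([stats], COLUMNS, buffer)
print(buffer.getvalue(), end="")
```

In practice the report text would come from running esl-alistat on each seed file, with one row written per family or per release.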

For more background and to cite a tweet when done: https://twitter.com/YannPonty/status/1488136823771705344

@emmaco
Contributor

emmaco commented Aug 11, 2022

A file with the version, date, and number of families for each release is now available: https://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/release_stats.tsv
It will be updated with each release.
Hopefully it can be expanded to include other useful info, as suggested.
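
For anyone wanting to consume the file, a minimal Python sketch of reading a tab-separated stats file follows. The header names (`version`, `date`, `number_of_families`) and the sample rows are assumptions for illustration, not the real contents of release_stats.tsv; check the actual header before relying on them.

```python
import csv
import io


def read_release_stats(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical header and made-up rows, for illustration only.
SAMPLE_TSV = (
    "version\tdate\tnumber_of_families\n"
    "1.0\t2000-01-01\t100\n"
    "1.1\t2000-06-01\t120\n"
)

rows = read_release_stats(io.StringIO(SAMPLE_TSV))

# Change in family count between the first and last rows of the sample.
growth = int(rows[-1]["number_of_families"]) - int(rows[0]["number_of_families"])
print(growth)
```

The same function works on a downloaded copy of the file opened with `open(path, newline="")`.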
