From 4bd8c87b408d11bcf6394ec09af46832be20dadf Mon Sep 17 00:00:00 2001
From: peterjc
Date: Fri, 29 Apr 2016 16:11:54 +0100
Subject: [PATCH] Print functions and back-tick markup for AlignIO page etc
See #47.
---
wiki/AlignIO.md | 157 ++++++++++++++++++++++++------------------------
1 file changed, 80 insertions(+), 77 deletions(-)
diff --git a/wiki/AlignIO.md b/wiki/AlignIO.md
index 009fd9c96..e510e28b9 100644
--- a/wiki/AlignIO.md
+++ b/wiki/AlignIO.md
@@ -6,15 +6,15 @@ tags:
- Wiki Documentation
---
-This page describes Bio.AlignIO, a new multiple sequence Alignment
+This page describes `Bio.AlignIO`, a new multiple sequence Alignment
Input/Output interface for BioPython 1.46 and later.
In addition to the built in API documentation, there is a whole chapter
in the [Tutorial](http://biopython.org/DIST/docs/tutorial/Tutorial.html)
on Bio.AlignIO, and although there is some overlap it is well worth
-reading in addition to this WIKI page. There is also the [API
+reading in addition to this page. There is also the [API
documentation](http://biopython.org/DIST/docs/api/Bio.AlignIO-module.html)
-(which you can read online, or from within Python with the help
+(which you can read online, or from within Python with the `help()`
command).
Aims
@@ -23,21 +23,21 @@ Aims
You may already be familiar with the [Bio.SeqIO](SeqIO "wikilink")
module which deals with files containing one or more sequences
represented as [SeqRecord](SeqRecord "wikilink") objects. The purpose of
-the SeqIO module is to provide a simple uniform interface to assorted
+the `SeqIO` module is to provide a simple uniform interface to assorted
sequence file formats.
-Similarly, Bio.AlignIO deals with files containing one or more sequence
-alignments represented as Alignment objects. Bio.AlignIO uses the same
-set of functions for input and output as in Bio.SeqIO, and the same
+Similarly, `Bio.AlignIO` deals with files containing one or more sequence
+alignments represented as Alignment objects. `Bio.AlignIO` uses the same
+set of functions for input and output as in `Bio.SeqIO`, and the same
names for the file formats supported.
-Note that the inclusion of Bio.AlignIO does lead to some duplication or
-choice in how to deal with some file formats. For example, Bio.AlignIO
-and Bio.Nexus will both read alignments from NEXUS files - but Bio.NEXUS
-allows more control and the use of trees.
+Note that the inclusion of `Bio.AlignIO` does lead to some duplication or
+choice in how to deal with some file formats. For example, `Bio.AlignIO`
+and `Bio.Nexus` will both read alignments from NEXUS files - but
+`Bio.Nexus` allows more control and the use of trees.
My vision is that for reading or writing sequence alignments you should
-try Bio.AlignIO as your first choice. In some cases you may only care
+try `Bio.AlignIO` as your first choice. In some cases you may only care
about the sequences themselves, in which case try using
[Bio.SeqIO](SeqIO "wikilink") on the alignment file directly. Unless you
have some very specific requirements, I hope this should suffice.
@@ -98,48 +98,50 @@ Fib\_gamma](http://pfam.sanger.ac.uk/family?acc=PF09395). At the time of
writing, this contained 14 sequences with an alignment length of 77
amino acids, and is shown below in the PFAM or Stockholm format:
- # STOCKHOLM 1.0
- #=GS Q7ZVG7_BRARE/37-110 AC Q7ZVG7.1
- #=GS Q6X871_SCAAQ/1-77 AC Q6X871.1
- #=GS O02676_CROCR/1-77 AC O02676.1
- #=GS Q6X869_TENEC/1-77 AC Q6X869.1
- #=GS FIBG_HUMAN/40-116 AC P02679.3
- #=GS O02689_TAPIN/1-77 AC O02689.1
- #=GS O02688_PIG/1-77 AC O02688.1
- #=GS O02672_9CETA/1-77 AC O02672.1
- #=GS O02682_EQUPR/1-77 AC O02682.1
- #=GS Q6X870_CYNVO/1-77 AC Q6X870.1
- #=GS FIBG_RAT/40-116 AC P02680.3
- #=GS Q6X866_DROAU/1-76 AC Q6X866.1
- #=GS O93568_CHICK/40-116 AC O93568.1
- #=GS FIBG_XENLA/38-114 AC P17634.1
- Q7ZVG7_BRARE/37-110 GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML
- Q6X871_SCAAQ/1-77 RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM
- O02676_CROCR/1-77 RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM
- Q6X869_TENEC/1-77 RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML
- FIBG_HUMAN/40-116 RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML
- #=GS FIBG_HUMAN/40-116 DR PDB; 1qvh L;14-45
- #=GS FIBG_HUMAN/40-116 DR PDB; 1fza C;88-90
- #=GS FIBG_HUMAN/40-116 DR PDB; 1fzb C;88-90
- #=GS FIBG_HUMAN/40-116 DR PDB; 1fzb F;88-90
- #=GS FIBG_HUMAN/40-116 DR PDB; 1qvh I;14-45
- #=GS FIBG_HUMAN/40-116 DR PDB; 1fza F;88-90
- #=GR FIBG_HUMAN/40-116 SS CCXCXBXXHHHHHHHHHHHHHHHHHHHHHHHXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-CC
- O02689_TAPIN/1-77 RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML
- O02688_PIG/1-77 RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML
- O02672_9CETA/1-77 RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM
- O02682_EQUPR/1-77 RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM
- Q6X870_CYNVO/1-77 RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV
- FIBG_RAT/40-116 RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV
- Q6X866_DROAU/1-76 RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI
- O93568_CHICK/40-116 RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII
- #=GS O93568_CHICK/40-116 DR PDB; 1m1j F;14-90
- #=GS O93568_CHICK/40-116 DR PDB; 1m1j C;14-90
- #=GR O93568_CHICK/40-116 SS CCEEEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHHH
- FIBG_XENLA/38-114 RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW
- #=GC SS_cons CCECEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHCC
- #=GC seq_cons RFGSYCPTTCGIADFLSsYQssVDcDLQsLEsILpplEN+ToEAc-LIKuIQlsYsP--ss+PstI-uATpcSKKMl
- //
+```
+# STOCKHOLM 1.0
+#=GS Q7ZVG7_BRARE/37-110 AC Q7ZVG7.1
+#=GS Q6X871_SCAAQ/1-77 AC Q6X871.1
+#=GS O02676_CROCR/1-77 AC O02676.1
+#=GS Q6X869_TENEC/1-77 AC Q6X869.1
+#=GS FIBG_HUMAN/40-116 AC P02679.3
+#=GS O02689_TAPIN/1-77 AC O02689.1
+#=GS O02688_PIG/1-77 AC O02688.1
+#=GS O02672_9CETA/1-77 AC O02672.1
+#=GS O02682_EQUPR/1-77 AC O02682.1
+#=GS Q6X870_CYNVO/1-77 AC Q6X870.1
+#=GS FIBG_RAT/40-116 AC P02680.3
+#=GS Q6X866_DROAU/1-76 AC Q6X866.1
+#=GS O93568_CHICK/40-116 AC O93568.1
+#=GS FIBG_XENLA/38-114 AC P17634.1
+Q7ZVG7_BRARE/37-110 GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML
+Q6X871_SCAAQ/1-77 RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM
+O02676_CROCR/1-77 RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM
+Q6X869_TENEC/1-77 RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML
+FIBG_HUMAN/40-116 RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML
+#=GS FIBG_HUMAN/40-116 DR PDB; 1qvh L;14-45
+#=GS FIBG_HUMAN/40-116 DR PDB; 1fza C;88-90
+#=GS FIBG_HUMAN/40-116 DR PDB; 1fzb C;88-90
+#=GS FIBG_HUMAN/40-116 DR PDB; 1fzb F;88-90
+#=GS FIBG_HUMAN/40-116 DR PDB; 1qvh I;14-45
+#=GS FIBG_HUMAN/40-116 DR PDB; 1fza F;88-90
+#=GR FIBG_HUMAN/40-116 SS CCXCXBXXHHHHHHHHHHHHHHHHHHHHHHHXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-CC
+O02689_TAPIN/1-77 RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML
+O02688_PIG/1-77 RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML
+O02672_9CETA/1-77 RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM
+O02682_EQUPR/1-77 RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM
+Q6X870_CYNVO/1-77 RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV
+FIBG_RAT/40-116 RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV
+Q6X866_DROAU/1-76 RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI
+O93568_CHICK/40-116 RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII
+#=GS O93568_CHICK/40-116 DR PDB; 1m1j F;14-90
+#=GS O93568_CHICK/40-116 DR PDB; 1m1j C;14-90
+#=GR O93568_CHICK/40-116 SS CCEEEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHHH
+FIBG_XENLA/38-114 RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW
+#=GC SS_cons CCECEEE-CCCCCCCCCCCCCHHHCCCCCHHHHHHHHHHHHHHHCCCCCCHHHHS-SSTT--SS-HHHHHHHHHHCC
+#=GC seq_cons RFGSYCPTTCGIADFLSsYQssVDcDLQsLEsILpplEN+ToEAc-LIKuIQlsYsP--ss+PstI-uATpcSKKMl
+//
+```
You will notice that there is plenty of annotation information here,
including accession numbers for each sequence and also some PDB database
@@ -149,53 +151,54 @@ chick fibrinogen proteins.
This file contains a single alignment, so we can use the
`Bio.AlignIO.read()` function to load it in Biopython. Let's assume
you have downloaded this alignment from Sanger, or have copy and pasted
-the text above, and saved this as a file called `PF09395\_seed.sth` on
+the text above, and saved this as a file called `PF09395_seed.sth` on
your computer. Then in python:
``` python
from Bio import AlignIO
alignment = AlignIO.read(open("PF09395_seed.sth"), "stockholm")
-print "Alignment length %i" % alignment.get_alignment_length()
+print("Alignment length %i" % alignment.get_alignment_length())
for record in alignment :
- print record.seq, record.id
+ print(record.seq + " " + record.id)
```
That should give:
- Alignment length 77
- GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML Q7ZVG7_BRARE/37-110
- RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM Q6X871_SCAAQ/1-77
- RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM O02676_CROCR/1-77
- RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML Q6X869_TENEC/1-77
- RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML FIBG_HUMAN/40-116
- RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML O02689_TAPIN/1-77
- RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML O02688_PIG/1-77
- RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM O02672_9CETA/1-77
- RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM O02682_EQUPR/1-77
- RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV Q6X870_CYNVO/1-77
- RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV FIBG_RAT/40-116
- RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI Q6X866_DROAU/1-76
- RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII O93568_CHICK/40-116
- RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW FIBG_XENLA/38-114
+```
+Alignment length 77
+GFGTYCPTTCGVADYLQRYKPDMDKKLDDMEQDLEEIANLTRGAQDKVVYLK---DSEAQAQKQSPDTYIKKSSNML Q7ZVG7_BRARE/37-110
+RFGSYCPTTCGIADFLSTYQATVDKDLQTLEDILSQAENKTMEAKELVKAIQVSYLPEDPARPNRVELATKDSKKMM Q6X871_SCAAQ/1-77
+RFGSYCPTTCGIADFLSTYQTGVXNDLRTLEDLLSGIENKTSEAKELIKSIQVSYNPNEPPKPNTIVSATKDSKKMM O02676_CROCR/1-77
+RFGSYCPTTCGIADFLSTYQGSIDKDLQTLEDILNQVENKTXEASELIKSIQVSYNPDEPPRPNMIEGATQKSKKML Q6X869_TENEC/1-77
+RFGSYCPTTCGIADFLSTYQTKVDKDLQSLEDILHQVENKTSEVKQLIKAIQLTYNPDESSKPNMIDAATLKSRKML FIBG_HUMAN/40-116
+RFGSYCPTTCGIADFLSTYQTXVDKDLQVLEDILNQAENKTSEAKELIKAIQVRYKPDEPTKPGGIDSATRESKKML O02689_TAPIN/1-77
+RFGSYCPTMCGIAGFLSTYQNTVEKDLQNLEGILHQVENKTSEARELIKAIQISYNPEDLSKPDRIQSATKESKKML O02688_PIG/1-77
+RFGSYCPTTCGVADFLSNYQTSVDKDLQNLEGILYQVENKTSEARELVKAIQISYNPDEPSKPNNIESATKNSKRMM O02672_9CETA/1-77
+RFGSYCPTTCGIADFLSNYQTSVDKDLQDFEDILHRAENQTSEAEQLIQAIRTSYNPDEPPKTGRIDAATRESKKMM O02682_EQUPR/1-77
+RFGSYCPTTCGIADFLSTYQTKVDEDLQNLEDILYRVENRTSEAKELIKAIQVDYNPGEPPKQSVTEGATQNAKKMV Q6X870_CYNVO/1-77
+RFGSYCPTTCGISDFLNSYQTDVDTDLQTLENILQRAENRTTEAKELIKAIQVYYNPDQPPKPGMIEGATQKSKKMV FIBG_RAT/40-116
+RFGSYCPTTCGIADFLNKYQTTIDQDLRHMEETLRDIDNKTAESTLLIQKIQIGQTPDPRPQ-NVIGDVTQKSRKMI Q6X866_DROAU/1-76
+RFGSYCPTTCGIADFFNKYRLTTDGELLEIEGLLQQATNSTGSIEYLIQHIKTIYPSEKQTLPQSIEQLTQKSKKII O93568_CHICK/40-116
+RFGEYCPTTCGISDFLNRYQENVDTDLQYLENLLTQISNSTSGTTIIVEHLIDSGKKPATSPQTAIDPMTQKSKTCW FIBG_XENLA/38-114
+```
Alignment Output
----------------
As in [Bio.SeqIO](SeqIO "wikilink"), there is a single output function
-**Bio.AlignIO.write()**. This takes three arguments: some alignments, a
+`Bio.AlignIO.write()`. This takes three arguments: some alignments, a
file handle to write to, and the format to use.
-As of Biopython 1.48, the alignment object acquired a **format()**
+As of Biopython 1.48, the alignment object acquired a `.format()`
method to give a string containing the alignment in the specified file
format, e.g.
``` python
alignment = AlignIO.read(open("PF09395_seed.sth"), "stockholm")
-print alignment.format("fasta")
+print(alignment.format("fasta"))
```
-This wiki section needs to be filled out, so in the short term please
-refer to the Bio.AlignIO chapter in the Tutorial.
+Please refer to the Bio.AlignIO chapter in the Tutorial for more details.
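As a rough sketch of the three-argument `Bio.AlignIO.write()` call described above (this example is not part of the original page; the two-sequence Stockholm text and the names `STOCKHOLM_TEXT`, `out_handle` and `fasta_text` are invented for illustration, and a reasonably recent Biopython is assumed to be installed):

``` python
from io import StringIO

from Bio import AlignIO

# A tiny two-sequence Stockholm alignment, made up for this sketch
STOCKHOLM_TEXT = """# STOCKHOLM 1.0
seq_a ACDEFGHIK
seq_b ACDE-GHIK
//
"""

# Parse the single alignment from an in-memory handle
alignment = AlignIO.read(StringIO(STOCKHOLM_TEXT), "stockholm")

# write() takes the alignment(s), a handle to write to, and the format name;
# it returns the number of alignments written
out_handle = StringIO()
count = AlignIO.write(alignment, out_handle, "fasta")

fasta_text = out_handle.getvalue()
print(fasta_text)
```

Using in-memory handles keeps the sketch self-contained; with real data you would pass file handles opened for reading and writing instead.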
File Format Conversion
----------------------