Print functions and back-tick markup for SeqRecord page

peterjc committed Apr 29, 2016 (commit ceef80e, parent 169db6a).
One changed file, `wiki/SeqRecord.md`: 42 additions and 42 deletions.

This page describes the `SeqRecord` object used in Biopython to hold a
sequence (as a [Seq](Seq "wikilink") object) with identifiers (ID and
name), description and optionally annotation and sub-features.

Most of the sequence file format parsers in Biopython can return
`SeqRecord` objects (and may offer a format-specific record object
too, see for example `Bio.SwissProt`). The [SeqIO](SeqIO "wikilink")
system will *only* return `SeqRecord` objects.
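
To see that last point in action, here is a minimal sketch (assuming Biopython is installed) that parses a small in-memory FASTA string with `SeqIO.parse()` and checks that each item it yields is a `SeqRecord`. The two entries are made-up example data, and `io.StringIO` simply stands in for a real file handle.

``` python
from io import StringIO

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# Two made-up FASTA entries, used only for illustration
fasta_text = """>seq1 first example
ACGTACGT
>seq2 second example
GGCC
"""

records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

# SeqIO hands back SeqRecord objects, whatever the input format
for rec in records:
    assert isinstance(rec, SeqRecord)

print([rec.id for rec in records])
```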

In addition to the `SeqRecord` object's [API
documentation](http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html),
there is a whole chapter in the
[Tutorial](http://biopython.org/DIST/docs/tutorial/Tutorial.html)
Extracting information from a SeqRecord
---------------------------------------

Let's look in detail at the well-annotated `SeqRecord` objects
Biopython creates from a GenBank file, such as
[`ls_orchid.gbk`](https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.gbk),
which we'll load using the [SeqIO](SeqIO "wikilink") module. This file
contains 94 records:

``` python
from Bio import SeqIO

# Loop over every record in the GenBank file, numbering them from zero
for index, record in enumerate(SeqIO.parse("ls_orchid.gbk", "genbank")):
    print("index %i, ID = %s, length %i, with %i features"
          % (index, record.id, len(record.seq), len(record.features)))
```

And this is some of the output. Remember Python likes to count from
Let's look in a little more detail at the final record:

``` python
print(record)
```

That should give you a hint of the sort of information held in this
```
...
Number of features: 5
...
Seq('CATTGTTGAGATCACATAATAATTGATCGAGTTAATCTGGAGGATCTGTTTACTTTGGTC ...', IUPACAmbiguousDNA())
```

Let's look a little more closely, and use Python's `dir()` function
to find out more about the `SeqRecord` object and what it does:

``` python
>>> dir(record)
[..., 'annotations', 'dbxrefs', 'description', 'features', 'format', 'id', 'letter_annotations', 'name', 'seq']
```
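
`dir()` also returns the special, underscore-prefixed names. A minimal sketch that keeps only the public ones, using a small hand-built record (hypothetical ID and sequence) rather than the GenBank record above:

``` python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("ACGT"), id="example.1")  # hypothetical record

# Drop the "special" names that start with an underscore
public = [name for name in dir(record) if not name.startswith("_")]
print(public)
```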

If you didn't already know, the `dir()` function returns a list of all
the methods and properties of an object (as strings). Those starting
with underscores in their name are "special" and we'll be ignoring them
in this discussion. We'll start with the `.seq` property:

``` python
>>> print(record.seq)
Seq('CATTGTTGAGATCACATAATAATTGATCGAGTTAATCTGGAGGATCTGTTTACTTTGGTC ...', IUPACAmbiguousDNA())
>>> print(record.seq.__class__)
Bio.Seq.Seq
```
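
A short sketch of how `.seq` relates to the record itself, again using a hypothetical hand-built record: the attribute holds a `Seq` object, `str()` turns it into a plain Python string, and calling `len()` on the record gives the length of its sequence.

``` python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("CATTGTTGAG"), id="demo")  # hypothetical record

print(type(record.seq))                # the Seq class
print(str(record.seq))                 # plain Python string: CATTGTTGAG
print(len(record) == len(record.seq))  # True: len() on a SeqRecord uses its sequence
```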

The following three properties are all simple strings:

``` python
>>> print(record.id)
Z78439.1
>>> print(record.name)
Z78439
>>> print(record.description)
P.barbatum 5.8S rRNA gene and ITS1 and ITS2 DNA.
```

Have a look at the raw GenBank file to see where these came from.
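
For GenBank input, Biopython takes `.name` from the LOCUS line, `.id` from the VERSION line (accession plus version) and `.description` from the DEFINITION line. Since they are plain strings you can also set them yourself; a minimal sketch reusing the values shown above with a shortened, illustrative sequence:

``` python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Values copied from the Z78439.1 record above; the sequence is shortened
record = SeqRecord(Seq("CATTGTTGAG"), id="Z78439.1", name="Z78439",
                   description="P.barbatum 5.8S rRNA gene and ITS1 and ITS2 DNA.")

# All three are ordinary Python strings
for attr in ("id", "name", "description"):
    print(attr, repr(getattr(record, attr)))

# ...so they can be reassigned like any other attribute
record.description = "edited description (illustrative)"
```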

Next, we'll check the `.dbxrefs` property, which holds any database
cross references:

``` python
>>> print(record.dbxrefs)
[]
>>> print(record.dbxrefs.__class__)
<type 'list'>
```

An empty list? Disappointing... if we'd used a more recent GenBank file,
the genome sequencing project reference would show up here.
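
`.dbxrefs` is an ordinary Python list of strings, normally written as `database:identifier`. A sketch that adds one entry to a hypothetical record and splits it back apart; the `BioProject:PRJNA000000` value is a made-up placeholder, not a real project:

``` python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("ACGT"), id="example.1")  # hypothetical record
print(record.dbxrefs)  # starts out as an empty list

# Made-up placeholder cross reference in "database:identifier" form
record.dbxrefs.append("BioProject:PRJNA000000")

# Split each entry into its database name and identifier
for xref in record.dbxrefs:
    db, _, identifier = xref.partition(":")
    print(db, identifier)
```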

How about the `.annotations` property? This is a Python dictionary...

``` python
>>> print(record.annotations)
{'source': 'Paphiopedilum barbatum', 'taxonomy': ...}
>>> print(record.annotations.__class__)
<type 'dict'>
>>> print(record.annotations["source"])
Paphiopedilum barbatum
```
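
Because `.annotations` is a plain dictionary, the usual `dict` methods work on it. A minimal sketch on a hypothetical hand-built record, using `.get()` with a default for a key that a particular file format may not provide:

``` python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("ACGT"), id="example.1")  # hypothetical record

# annotations is an ordinary dict, so normal item assignment works
record.annotations["source"] = "Paphiopedilum barbatum"
print(record.annotations["source"])

# .get() avoids a KeyError when a key is missing
print(record.annotations.get("taxonomy", []))
print(sorted(record.annotations))
```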

In this case, most of the values in the dictionary are simple strings,
but this isn't always the case. Have a look at the references entry for
this example; it's a list of `Reference` objects:

``` python
>>> print(record.annotations["references"].__class__)
<type 'list'>
>>> print(len(record.annotations["references"]))
2
>>> for ref in record.annotations["references"]: print(ref.authors)
Cox,A.V., Pridgeon,A.M., Albert,V.A. and Chase,M.W.
Cox,A.V.
```
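
The entries in that list are `Reference` objects from `Bio.SeqFeature`, and you can build them yourself too. A sketch that recreates the two author strings shown above on a hypothetical record; all other `Reference` fields are left at their defaults:

``` python
from Bio.Seq import Seq
from Bio.SeqFeature import Reference
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("ACGT"), id="example.1")  # hypothetical record

# Author strings copied from the output above; other fields left at defaults
ref1 = Reference()
ref1.authors = "Cox,A.V., Pridgeon,A.M., Albert,V.A. and Chase,M.W."
ref2 = Reference()
ref2.authors = "Cox,A.V."

# Store them the same way as in the parsed record: a plain list
record.annotations["references"] = [ref1, ref2]

for ref in record.annotations["references"]:
    print(ref.authors)
```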

Next is `.features`, which is another list property, and it contains
`SeqFeature` objects:

``` python
>>> print(record.features.__class__)
<type 'list'>
>>> print(len(record.features))
5
```

`SeqFeature` objects are complicated enough to warrant their own wiki
page; for now, please refer to the Tutorial.
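
As a taste of what the Tutorial covers, a minimal sketch that adds one `SeqFeature` to a hypothetical record and uses its `.extract()` method to pull out the sub-sequence it covers. It assumes a Biopython release that provides `Bio.SeqFeature.FeatureLocation`; the sequence and the `misc_feature` type are illustrative only. Locations use Python-style zero-based, end-exclusive coordinates.

``` python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("ATGAAACCCGGGTTTTAA"), id="example.1")  # hypothetical

# Bases 4 to 9 (1-based) on the forward strand, i.e. the Python slice [3:9]
feature = SeqFeature(FeatureLocation(3, 9, strand=1), type="misc_feature")
record.features.append(feature)

print(len(record.features))
print(feature.extract(record.seq))  # the sub-sequence AAACCC
```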

If you are using Biopython 1.48 or later, there will be a `.format()`
method. This lets you convert the `SeqRecord` into a string using one
of the output formats supported by [`Bio.SeqIO`](SeqIO "wikilink"), for
example:

``` python
>>> print(record.format("fasta"))
```

This should give:
```
...
TGCTACAACAAAATTGTTGTGCAAATGCCCCGGTTGGCCGTTTAGTTGGGCC
```
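
`.format()` gives the same text you would get by writing the record with `Bio.SeqIO.write()`. A minimal sketch on a hypothetical record that writes to an in-memory `io.StringIO` handle and compares the two:

``` python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("CATTGTTGAG"), id="demo.1",
                   description="made-up example")  # hypothetical record

text = record.format("fasta")
print(text)

# Writing the record through SeqIO gives the same text
handle = StringIO()
SeqIO.write(record, handle, "fasta")
assert handle.getvalue() == text
```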

If you are using Biopython 1.50 or later, there will also be a
`.letter_annotations` property. Again this is a dictionary, but for
per-letter annotation such as sequence quality scores or secondary
structure predictions. This kind of information isn't found in GenBank
files, so in this case the dictionary is empty:

``` python
>>> print(record.letter_annotations)
{}
```
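
A minimal sketch of per-letter annotation on a hypothetical four-base read: the quality list must match the sequence length, slicing the record slices the annotation with it, and once `phred_quality` is present the record can be written as FASTQ. The read ID and scores are made up.

``` python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("ACGT"), id="read1", description="demo read")  # hypothetical
# One quality score per letter; a list of the wrong length is rejected
record.letter_annotations["phred_quality"] = [40, 30, 20, 10]

# Slicing the record slices the per-letter annotation along with the sequence
sub = record[1:3]
print(sub.seq, sub.letter_annotations["phred_quality"])  # CG [30, 20]

# With phred_quality present the record can be written as FASTQ
print(record.format("fastq"))
```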

Creating a SeqRecord object
---------------------------

Most of the time you'll create `SeqRecord` objects by parsing a
sequence file with [`Bio.SeqIO`](SeqIO "wikilink"). However, it is useful
to know how to create a `SeqRecord` directly. For example,

``` python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC  # Bio.Alphabet only exists in Biopython 1.77 and earlier

record = SeqRecord(Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF",
                       IUPAC.protein),
                   id="YP_025292.1", name="HokC",
                   description="toxic membrane protein, small")
print(record)
```

This would give the following output:
```
...
Seq('MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF', IUPACProtein())
```

You could then pass this new record to
[`Bio.SeqIO.write(...)`](SeqIO "wikilink") to save it to disk.
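
The `IUPAC` alphabets used in the example above come from `Bio.Alphabet`, which was removed in Biopython 1.78; newer releases drop the alphabet argument and record the molecule type as an annotation instead (it is needed for GenBank output). A minimal sketch, assuming Biopython 1.78 or later, that builds the same record and saves it with `Bio.SeqIO.write()` to a temporary FASTA file (the file name is hypothetical), then reads it back:

``` python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF"),
                   id="YP_025292.1", name="HokC",
                   description="toxic membrane protein, small")
# Takes the place of the old IUPAC.protein alphabet (Biopython 1.78+)
record.annotations["molecule_type"] = "protein"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "HokC.fasta")  # hypothetical file name
    count = SeqIO.write(record, path, "fasta")  # number of records written
    copy = SeqIO.read(path, "fasta")

print(count, copy.id, str(copy.seq) == str(record.seq))
```

The return value of `SeqIO.write()` is the number of records written, which makes a cheap sanity check.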
