My thesis is mostly focused on proteins, so you can be sure that I have become really familiar with UniProt. UniProt comes with a solid user interface that suits almost any approach: biochemists can easily find what they need, while bioinformaticians can set up scripts to obtain information directly from the database. In fact, UniProt has a very good programmatic interface, usable from all the main programming languages and very well explained in a detailed official tutorial. To find it, you can click this link, or you can search Google with the query “uniprot programmatically”. I found it quite hard to reach this page by browsing the website, since the UniProt documentation is huge (and I have no patience for this).

For instance, I have to retrieve a bunch of helix-turn-helix transcription factors in FASTA format. I’ve got a text file with one ID per line and must load the sequences as SeqRecord objects (each one wrapping a Seq) using Biopython’s Bio.SeqIO module. Quite easy indeed: the whole trick is in the following function:

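import urllib, urllib2  # Python 2 standard-library modules

def getseq(ID, extention):
    # Build the entry address: base URL + ID + "." + file type
    base_url = "http://www.uniprot.org/uniprot/"
    url = base_url + ID + "." + extention
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    # The body of the response comes back as a plain string
    return response.read()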

Please note that I still can’t figure out how to display ‘\t’ tabs in WordPress, so fix the indentation if you ever use this. The value returned by response.read() can be handled as a string. You can loop over your IDs and write each result to a text file, which can then be parsed with the SeqIO.parse() method. As shown, I import both urllib and urllib2, although this function only calls urllib2. The great thing about UniProt is that programmatic access to the website is made easy by the very simple organization of the database. If you know the ID, you just have to append the file type you need as an extension (for example, .fasta) to build the web address.
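
For anyone on Python 3, where urllib2 no longer exists (its functions moved into urllib.request), the same idea might look like the sketch below. It is not the code I used. The helper names (build_url, read_ids, fetch_text, parse_fasta, load_records) are made up for illustration, the base address is simply the one from the function above and may have changed since, and parse_fasta assumes Biopython is installed.

```python
import io
import urllib.request

# Address pattern taken from the function above (assumption: it may have
# changed since the post was written).
BASE_URL = "http://www.uniprot.org/uniprot/"


def build_url(accession, extension, base_url=BASE_URL):
    """Return the entry address: base URL + ID + '.' + file type."""
    return base_url + accession + "." + extension


def read_ids(handle):
    """Read one ID per line from an open text file, skipping blank lines."""
    return [line.strip() for line in handle if line.strip()]


def fetch_text(accession, extension="fasta"):
    """Download one entry and decode it to text (needs network access)."""
    with urllib.request.urlopen(build_url(accession, extension)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Turn FASTA text into a list of Biopython SeqRecord objects."""
    # Imported here so the helpers above still work without Biopython.
    from Bio import SeqIO
    return list(SeqIO.parse(io.StringIO(text), "fasta"))


def load_records(id_path):
    """Fetch every ID listed in id_path and return all parsed records."""
    with open(id_path) as handle:
        ids = read_ids(handle)
    records = []
    for accession in ids:
        records.extend(parse_fasta(fetch_text(accession)))
    return records
```

With a hypothetical file called ids.txt holding one ID per line, load_records("ids.txt") would return the list of records. Parsing from an in-memory io.StringIO skips the intermediate text file, but writing the downloaded text to disk first works just as well if you want to keep a local copy.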

4 thoughts on “Dealing with Uniprot- Python programming interface”

  1. Thanks for your post. I understand that it is aimed at explaining what UniProt is and how to fetch the URL to retrieve the sequence.

    For your information, the uniprot web service can be accessed from BioServices (https://pypi.python.org/pypi/bioservices) and the Python code would be

    >>> from bioservices import UniProt
    >>> u = UniProt()
    >>> u.get_fasta_sequence("P43403")
    'MPDPAAHLPFF…..'

    Best
    Thomas
