My thesis is mostly focused on proteins, so rest assured that I got really familiar with UniProt. UniProt comes with a solid user interface that makes any approach easy: biochemists can quickly find what they need, while a bioinformatician can set up scripts to pull information directly from the database. In fact, UniProt has a very good programmatic interface that works from all the main programming languages and is well explained in a detailed official tutorial. To find it, you can click this link, or search Google for “uniprot programmatically”. I found it quite hard to reach this page by browsing the website, since the UniProt documentation is huge (and I have no patience for this).
For instance, I have to retrieve a bunch of helix-turn-helix transcription factors in FASTA format. I’ve got a text file with one ID per line, and I need to load the sequences as sequence objects through Biopython’s Bio.SeqIO module. It is quite easy indeed; the whole trick is in the following function:
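import urllib2  # Python 2 only; in Python 3 the same calls live in urllib.request
def getseq(ID, extention):
    url = "https://rest.uniprot.org/uniprotkb/" + ID + "." + extention  # base address is an assumption
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    return response.read()  # the whole entry, as a string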
Please note that WordPress swallows ‘\t’ tab characters, so check the indentation of the function body if you ever use this. The value returned by response.read() can be handled as a string. One can repeat the call for every ID and write everything to a text file, which is then parsed with the SeqIO.parse() method. The function needs urllib2 to be imported; urllib itself is not required for this call. The great thing about UniProt is that programmatic access to the website is made easy by the very simple organization of the database: if you know the ID, you just append the file type you need as an extension and build a web address from it.
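For anyone on Python 3, where urllib2 no longer exists, here is a minimal sketch of the same idea, not a definitive implementation. It builds the address from the ID and the file type, downloads the entry, and hands the text to SeqIO.parse() in memory instead of going through a file. The names BASE_URL, build_url, read_ids, fetch_text and fetch_records are mine and purely illustrative, and the base address is an assumption that you should check against the official UniProt documentation.

```python
import io
import urllib.request

# Assumed base address of the UniProtKB REST endpoint; check the official docs.
BASE_URL = "https://rest.uniprot.org/uniprotkb/"


def build_url(entry_id, extension="fasta"):
    """Build the entry address: base URL + ID + '.' + file type."""
    return "{}{}.{}".format(BASE_URL, entry_id.strip(), extension)


def read_ids(path):
    """Read a text file with one ID per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def fetch_text(entry_id, extension="fasta"):
    """Download one entry and return it as a string (needs network access)."""
    with urllib.request.urlopen(build_url(entry_id, extension)) as response:
        return response.read().decode("utf-8")


def fetch_records(ids):
    """Return a list of Biopython SeqRecord objects, one per downloaded entry."""
    # Imported here so the helpers above also work without Biopython installed.
    from Bio import SeqIO

    records = []
    for entry_id in ids:
        fasta_text = fetch_text(entry_id, "fasta")
        # SeqIO.parse() wants a file-like handle, so wrap the string in StringIO.
        records.extend(SeqIO.parse(io.StringIO(fasta_text), "fasta"))
    return records
```

With a hypothetical file ids.txt holding one ID per line, calling fetch_records(read_ids("ids.txt")) should give back one record per entry. Keeping build_url separate from the download makes it easy to ask for another file type by changing only the extension argument.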