A world map of High-Throughput Sequencers.

Just a quick post to report a small but really interesting project I found while sifting through the Internet. For a few years now, the web has been filling up with maps of all kinds. In the language of the contemporary web, maps are the favorite way to describe global-sized phenomena, and the development of new interactive and customizable map software is boosting this trend. I would say that this “map mania” is affecting genomics too, but it wouldn’t make much sense, since genetics and genomics were already really familiar with maps years before any web trend. Anyway, if you could already find a map of the best places in the world to pick up a girl, or a map showing the highest-paid job in every single US state, now you can also check the worldwide distribution of NGS technologies. Omicsmaps.com shows a very detailed world map of high-throughput sequencers. You can search the institutes hosting an NGS machine by sequencer category (5, HiSeq, Illumina GA2…), jump directly to a chosen country, and report the ones you know that are missing.

Just a couple of updates. I am writing my thesis, so this blog will slow down until the winter holidays. I would really like to mention how depressing the NGS map of Italy looks to me, but I will spare you my complaints. Anyway, I am working on a post to explain the situation of research and knowledge politics in Italy. I am also working on the podcast, and it will be ready in a couple of weeks. Stay tuned. Or better: PLEASE, try to stay as tuned as you can.

Dealing with Uniprot: Python programming interface.

My thesis is mostly focused on proteins, so you can be sure that I got really familiar with Uniprot. Uniprot comes with a strong user interface that eases any approach. Biochemists can easily find what they need, just as a bioinformatician can set up scripts to obtain information directly from the database. In fact, Uniprot has a very good programming interface, usable from all the main programming languages, and very well explained in a detailed official tutorial. To find it, you can click this link, or you can search Google with the query “uniprot programmatically”. For me, it was quite complicated to find this page by browsing the website, since the Uniprot documentation is huge (and I have no patience for this).

For instance, I have to retrieve a bunch of helix-turn-helix transcription factors in FASTA format. I’ve got a text file with one ID per line and must load them as sequence records with the Bio.SeqIO Biopython module. Quite easy indeed; the whole game is in the following function:

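import urllib, urllib2

# Assumption: the snippet as posted never defined `url`; this base address
# follows the "ID plus file type as extension" scheme described in the text.
UNIPROT_URL = "http://www.uniprot.org/uniprot/"

def getseq(ID, extention):
    # Build the entry address as <base><ID>.<extention>
    url = UNIPROT_URL + ID + "." + extention
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    return response.read()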

Please consider that I still can’t figure out how to display tabs (‘\t’) in WordPress, so fix the indentation if you ever use this. The value returned by response.read() can be handled as a string. One can write it out to a text file, which can then be parsed with the SeqIO.parse() method. As shown, both urllib and urllib2 are imported, although urllib2 is the one doing the actual request. The amazing thing about Uniprot is that programmatic access to the website is made easy by the very simple organization of the database. If you know the ID, you just have to build a web address from it, with the file type you need as the extension.
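For anyone on Python 3, where urllib2 no longer exists, here is a minimal sketch of the same idea: read one ID per line, build the address from the ID plus the file type, and parse the result with SeqIO.parse(). The base address (UNIPROT_BASE) and the helper names build_url, read_ids, fetch_text and fetch_records are my own assumptions for illustration, not something taken from the Uniprot tutorial, so check the address against the official documentation before relying on it.

```python
import io
import urllib.request

# Assumed base address of the Uniprot REST service; adjust it if it changes.
UNIPROT_BASE = "https://rest.uniprot.org/uniprotkb/"


def build_url(entry_id, extension="fasta"):
    """Return the address of one entry: base + ID + '.' + file type."""
    return "{}{}.{}".format(UNIPROT_BASE, entry_id, extension)


def read_ids(handle):
    """Yield one ID per non-empty line, ignoring surrounding whitespace."""
    for line in handle:
        entry_id = line.strip()
        if entry_id:
            yield entry_id


def fetch_text(entry_id, extension="fasta"):
    """Download one entry and return it as text (needs network access)."""
    with urllib.request.urlopen(build_url(entry_id, extension)) as response:
        return response.read().decode("utf-8")


def fetch_records(ids):
    """Fetch each ID as FASTA and parse it into Biopython SeqRecord objects."""
    # Imported here so the helpers above still work without Biopython installed.
    from Bio import SeqIO

    records = []
    for entry_id in ids:
        fasta_text = fetch_text(entry_id, "fasta")
        # SeqIO.parse() wants a file-like handle, so wrap the string.
        records.extend(SeqIO.parse(io.StringIO(fasta_text), "fasta"))
    return records
```

With a hypothetical file called ids.txt, something like `with open("ids.txt") as handle: records = fetch_records(read_ids(handle))` would give a list of records without writing an intermediate text file. Wrapping the downloaded string in io.StringIO is just a shortcut; saving it to disk first and parsing the file works equally well.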

The 6 books you must consider for your very first steps into databases.

Basically, one would really like to avoid this. Biology is already quite hard to keep in mind, and you don’t really need informatics to keep your brain busy. But, by definition, a bioinformatician is someone who matches the two subjects, and the best thing to do is to do it the best you can. Learning a powerful programming language such as Python, Ruby or Perl, getting used to markup languages (XML and derivatives) and learning databases are the three things a biologist must do to call himself a bioinformatician.

Yes, ok, but how to do it? The standard procedure a computer scientist would enthusiastically suggest is to find all the information for free on the web. And he would be right. After all, the hacker philosophy is pretty clear about that: take all the free information you can find, even if this can be quite hard. But, since many of us are romantically devoted to books, and since not everybody is willing to spend time in a fight to the death against information entropy, sometimes spending a little money on a book is not such a bad idea.

Here are some books that can be useful to dig deeper into the world of databases. Building a database is a really boring thing that you might really need. Handling large amounts of information is needed for the majority of genomics and structural studies. Building a database, being able to write queries, or developing scripts to work with the major genomic or protein databases out there could be really useful and time-saving.

Introduction to Database and Knowledge-base Systems
By S. Krishna

With this book you’ll learn the basics of database theory. Very easy to read and exhaustive.

SQL in a Nutshell: A Desktop Quick Reference
By K. Kline

Maybe the best guide to understand Structured Query Language.

Sams Teach Yourself SQL in 10 Minutes
By B. Forta

A concise guide to SQL. The text is organized into lessons. Easy to read and exhaustive.

MySQL Cookbook
By P. DuBois

900 pages giving a very complete overview of the open source DBMS MySQL. Maybe the best MySQL book around.

Instant PostgreSQL Starter
By D. K. Lyons

Take your first steps in the world of the open source DBMS PostgreSQL.

Database Annotation in Molecular Biology: Principles and Practice
By A. M. Lesk

A very exhaustive guide to biological databases. Useful for database curators and users.

If you know more, please leave a comment.