A world map of High-Throughput Sequencers.

Just a quick post to report a small but really interesting project I found while sifting through the Internet. Over the past few years, the web has been filling up with maps of every kind. In contemporary web language, maps are the favorite way to describe global-scale phenomena, and new interactive, customizable mapping software is reinforcing this trend. I would say that this “map mania” is affecting genomics too, but that wouldn’t make much sense, since genetics and genomics were already very familiar with maps years before any web trend. Anyway, if you could already find a map of the best places in the world to pick up a girl, or a map of the best-paid job in every US state, now you can also check the worldwide distribution of NGS technologies. Omicsmaps.com shows a very detailed world map of high-throughput sequencers. You can search the institutes hosting a sequencer by sequencer category (5, HiSeq, Illumina GA2…), jump directly to a chosen country, and report any you know of that are missing.

Just a couple of updates. I am writing my thesis, so this blog will slow down until the winter holidays. I would really like to say how depressing the NGS map of Italy looks to me, but I will spare you my complaints. Anyway, I am working on a post explaining the state of research and knowledge policy in Italy. I am also working on the podcast, and it will be ready in a couple of weeks. Stay tuned. Or better: PLEASE, try to stay as tuned as you can.


Illumina explains why patents are killing culture and research.

I should dig up an old press release from the 90s announcing the intention of some Japanese businessmen to patent pizza. I am not joking. Since no one had thought of it before, they considered patenting pizza and claiming royalties from every single restaurant in the world. They obviously failed after a rather lively protest from Italian public opinion. The story I am going to tell today looks quite similar, since the attempt to patent biological cloud computing is no less insane. Illumina, the “large and in charge” biotechnology firm that brought mass sequencing into common use, claims to have invented cloud bioinformatics, as you can see in this Google Patents entry. In a few words, the company claims to have invented a way to collect and analyze biological data in the cloud. I think Richard Holland hits the spot in his article on EagleGenomics: the definition of what they claim to have invented is really broad, and it could cover any kind of bioinformatics-oriented application or method. I wonder if they will sue T-Coffee for letting users save alignment results on Dropbox, or the major databases for including data storage. How many future applications will be charged? How many projects will fail for legal reasons?

The real problem with patents is mostly cultural and philosophical. Patents are killing culture, science and innovation. In a typically neoliberal mentality, control, exclusion and corporate privilege are the main ways of making profits. The first aim is not investing in new ideas, but being the first and only one to do something. And there are two ways to achieve that: be faster, or slow the others down to the point of stopping them altogether. The latter is always the easier choice. Research is thus no longer a matter of innovation, ideas and hard work, but a question for lawyers and legal technicalities. This is the transposition of the principles of financial speculation to Science.

And, as with the economy, what we really need is democratic governance: a set of laws and practices that guarantees public access to knowledge and scientific production and prevents exclusionary policies. First of all, we must admit that Science needs democracy. The more people can access and modify information, the faster advances and innovation come, and everyone can easily see the advantages of this. Kinda obvious, to me.

Dealing with the Uniprot programming interface in Python.

My thesis is mostly focused on proteins, so I have become really familiar with Uniprot. Uniprot comes with a strong user interface that makes any approach easy: biochemists can easily find what they need, while a bioinformatician can set up scripts to get information directly from the database. In fact, Uniprot has a very good programming interface, usable from all the main programming languages and very well explained in a detailed official tutorial. To find it, you can click this link, or search Google for “uniprot programmatically”. Personally, I found it quite hard to reach this page by browsing the website, since the Uniprot documentation is huge (and I have no patience for this).

For instance, I need to retrieve a bunch of helix-turn-helix transcription factors in FASTA format. I have a text file with one ID per line, and I must load the entries as SeqRecord objects with Biopython’s Bio.SeqIO module. It’s quite easy indeed; the whole job is done by the following function:

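```python
from urllib.request import Request, urlopen  # urllib2 in Python 2

def getseq(ID, extention):
    # The address is just the accession ID plus the file type as extension
    # (e.g. ID.fasta); rest.uniprot.org is UniProt's current REST endpoint
    url = "https://rest.uniprot.org/uniprotkb/%s.%s" % (ID, extention)
    req = Request(url)
    response = urlopen(req)
    # In Python 3 read() returns bytes, so decode it to get a string
    return response.read().decode("utf-8")
```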

Please note that I still can’t figure out how to display tabs in WordPress, so check the indentation if you ever use this. The value returned by getseq() can be handled as a string. One can loop over the IDs and write everything to a text file to be parsed with the SeqIO.parse() method. Only urllib2 is actually needed for the request (its Python 3 counterpart is urllib.request); the plain urllib module is not used. The amazing thing about Uniprot is that programmatic access to the website is made easy by the very simple organization of the database. If you know the ID, you just add the file type you need as the extension of the web address.
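Scaling this up to the whole ID list could look like the sketch below. The helper names parse_ids, join_fasta and save_fasta are hypothetical; fetch is any function with the same (ID, extension) signature as getseq(), which keeps the batching logic independent of the network.

```python
def parse_ids(text):
    """Return the accession IDs from a one-ID-per-line text, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def join_fasta(ids, fetch):
    """Fetch every ID in FASTA format and concatenate the records into one multi-FASTA string.

    `fetch` is any callable taking (ID, extension) and returning text, e.g. getseq().
    """
    chunks = []
    for accession in ids:
        record = fetch(accession, "fasta")
        # Make sure each record ends with a newline so the next header starts on its own line
        chunks.append(record if record.endswith("\n") else record + "\n")
    return "".join(chunks)

def save_fasta(ids_path, out_path, fetch):
    """Read IDs from ids_path, download them and write a single FASTA file to out_path."""
    with open(ids_path) as handle:
        ids = parse_ids(handle.read())
    with open(out_path, "w") as out:
        out.write(join_fasta(ids, fetch))
    return len(ids)
```

For example, save_fasta("hth_ids.txt", "hth.fasta", getseq) would write one multi-FASTA file (both file names are placeholders), which Biopython can then read with SeqIO.parse("hth.fasta", "fasta"), yielding one SeqRecord per entry.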

The crisis of Academia between DIY culture, Science advances and Welfare cuts.

If I were to fully analyze the crisis of academia, I would end up consuming every single megabit available for the contents of this site. Such a wide problem could be dissected from several points of view and, for the purposes of this blog, pointing readers to the main aspects I have found will be enough.

To understand the crisis of academia, we should consider two questions. The first is what a student would ask himself before planning his post-school life: which educational path should I choose to get the best training for the work I want to do? The second could be asked by any entrepreneur designing a business plan based on innovation: where should I find the best developers for a new product? Fifteen years ago, the answer to both questions would definitely have been “University”. Nowadays things have changed, and this is a good definition of what we call “the crisis of Academia”. This month, I will try to explore it further here. Basically, I have found three main reasons why universities are facing a crisis.

First reason. Alternatives are quite good anyway. God bless DIY.

We must consider that the alternatives to academia, in the age of the Internet, are getting more and more effective. Many computer scientists gain more knowledge and practice from the web and from sharing than from universities, and many amazing things are no longer developed in labs. Computer Science is obviously the best example. For instance, one could choose to become a database administrator by relying on one’s own efforts and obtain a certification that will surely be valued in the labor market. In fact, the role of major corporations in this should also be investigated further. In many fields of knowledge, from computer science to the arts, the mentality of “you gotta make it in this world alone” is rapidly spreading.

Second reason. Universities are failing to be up-to-date.

For a biologist, the most familiar phenomenon is definitely the difficulty universities have in keeping up with scientific advances. Seen from an Italian university, as I see it, this tends to be dramatic. Considering that my university, the Sapienza University of Rome, is the best-ranked university for science teaching in Italy, the fact that its biology teaching and academic offer haven’t changed in the last 10 years says a lot. The major advances in theoretical, computational, synthetic and genomic research have been ignored. If you need good training, the best idea is to supplement what your professors teach you. No one can seriously say that you don’t need to study biology to get a good job in the field, but if you want to be competitive, you should consider supplementing what you learned. University is very often what a mathematician would call a “necessary but not sufficient condition” for a good professional profile.

Third reason. Funding cuts and war on public education.

In times of crisis (and I consider “crisis” just a buzzword used to justify the shameful welfare cuts that many European governments are implementing), the decrease in funding for academia cannot be ignored. We could consider it both a cause and a consequence of the crisis of academia. Funding cuts to universities mostly occur in two broad settings. They are widespread in Latin European countries such as Spain, Portugal and Italy, as part of the general decrease in welfare investment triggered by international constraints, but they are also present in Anglo-Saxon countries. In both the UK and the United States, the rise in student fees is really sizable and represents a big problem in terms of social segregation. The cuts, the tightening of access conditions and the general impoverishment of academic resources lead many students to choose alternative training.

Anyway, from a different point of view, one of the best-analyzed and most important processes going on is that governments are steadily devolving decision-making power over many aspects of our society to major corporations and private groups. This theory has been very well explained by Noam Chomsky in an article he wrote two years ago. Basically, governments are losing control of crucial parts of our society, including education. Citizens increasingly find themselves facing major corporations directly, without the government as an intermediary. This could explain the efforts made by private groups to invest in and drive advanced education. From the companies’ perspective, the goal is therefore to choose the educational system that best serves their own interests. And this system may not always be the university.

We can conclude with an optimistic and “evolutionary” consideration. The word “crisis” derives from a Greek word (that I don’t even dare to write) indicating both “destruction” and “innovation”. A crisis is a disruptive event that can threaten the very existence of a system, but it also represents a big push for change. From an evolutionary perspective, we can say that this crisis puts strong pressure on the university system, which will be urged to evolve. And the open-access courses from the most prestigious universities that we can find on Coursera or iTunes U are probably a good sign of innovation.

Interactions between proteins and DNA in an ab initio approach.

I have been into this issue since I started my master’s. I will take some time to explain my current work on pyridoxal 5′-phosphate (PLP)-related proteins and genomic regions in bacteria on this blog as soon as possible. Let’s just say we’re figuring out a way to determine, with reasonable confidence, the location of possible promoters next to genes involved in PLP synthesis. Finding a promoter with bioinformatics methods usually means demonstrating that a given region is a plausible promoter by comparing it to other sequences with an algorithm based on a substitution matrix. PhyloGibbs and Pro-Coffee are good examples of this.
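As a rough illustration of scoring a candidate region against a known motif with a substitution matrix, here is an ungapped sliding-window scan. The matrix values (match +5, transition -1, transversion -4) are hypothetical, and using the E. coli sigma70 -10 box consensus TATAAT as the query is just an example; tools like PhyloGibbs and Pro-Coffee work very differently and far more carefully.

```python
# Hypothetical substitution scores for DNA: identical bases score best,
# transitions (A<->G, C<->T) are penalized less than transversions.
PURINES = frozenset("AG")

def sub_score(a, b, match=5, transition=-1, transversion=-4):
    """Score one aligned pair of bases (uppercase A, C, G, T assumed)."""
    if a == b:
        return match
    # Both purines or both pyrimidines: a transition
    if (a in PURINES) == (b in PURINES):
        return transition
    return transversion

def best_window(region, motif):
    """Slide the motif along the region without gaps; return (score, start) of the best window.

    Returns None if the region is shorter than the motif.
    """
    best = None
    for start in range(len(region) - len(motif) + 1):
        window = region[start:start + len(motif)]
        score = sum(sub_score(x, y) for x, y in zip(window, motif))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Look for the -10 box consensus in a made-up upstream region
print(best_window("GGCTATAATGG", "TATAAT"))  # → (30, 3)
```

A real promoter search would also consider the reverse strand, gaps and a statistical threshold for calling a hit; this sketch only shows the matrix-based comparison step.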

The language DNA and proteins use to talk to each other

If we try to generalize this problem, taking into account all the cases in which proteins and nucleic acids form complexes, we rapidly end up in a paradox. Although we cannot predict which sequence a protein will bind by analyzing its sequence alone, we can expect that the bound sequence will be under selective pressure, because the binding activity may be essential for the organism. Moreover, in many cases the binding is very specific: a transcription factor can recognize and bind one sequence a few base pairs long (and only that one) among a huge number of similar sequences. Proteins and nucleic acids talk somehow. I always wonder whether it will be possible to decrypt the language they use.
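To put a number on “a huge number of similar sequences”, here is a back-of-the-envelope estimate. Assuming a random genome with equal base frequencies (a strong simplification), an exact k-bp word is expected by chance about 2(L - k + 1)/4^k times in a genome of length L, counting both strands. For a genome of about 4.6 Mb, roughly the size of E. coli’s, that gives around 2,250 chance matches for a 6-bp site and fewer than 9 for a 10-bp site.

```python
def expected_random_sites(genome_length, site_length):
    """Expected number of chance occurrences of one exact DNA word.

    Simplifying assumptions: random genome, each base with probability 1/4,
    both strands counted (palindromic sites would be double-counted).
    """
    starts_per_strand = genome_length - site_length + 1
    # Two strands, and 4**k equally likely words of length k
    return 2 * starts_per_strand / 4 ** site_length

# A genome of about 4.6 Mb, roughly the size of E. coli's
for k in (6, 8, 10, 12):
    print(k, round(expected_random_sites(4_600_000, k), 1))
```

The numbers make the paradox concrete: a short site is found thousands of times by chance, so specificity must come from more than the bare sequence match.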

Protein-DNA Docking

A starting point could be protein-DNA docking. Docking is a set of algorithms developed to determine the possible ways a low-molecular-weight ligand can bind a protein. It is the pivotal technique in drug design. More recently, the same principles of docking have started to be applied to protein-protein and protein-DNA complexes. With nucleic acids, the real problem is their intrinsic flexibility. It’s a well-known fact that if you run a rigid-body simulation on a very flexible structure, you may run into more than one problem.
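To make the rigid-body idea concrete, here is a toy exhaustive search: the ligand is moved as a single rigid object (rotated about the z axis, then translated), and each pose is scored by counting close atom contacts and penalizing clashes. The scoring function, the 4 Å contact and 2.5 Å clash cutoffs and the -10 penalty are all hypothetical; real docking programs use much richer energy functions and search strategies.

```python
import math

def rotate_z(point, angle):
    """Rotate one (x, y, z) point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def place(ligand, angle, shift):
    """Rigid-body move: rotate every atom, then translate; internal geometry never changes."""
    return [tuple(a + d for a, d in zip(rotate_z(p, angle), shift)) for p in ligand]

def contact_score(receptor, ligand, contact=4.0, clash=2.5):
    """Toy score: +1 per atom pair closer than `contact`, -10 per pair closer than `clash`."""
    score = 0
    for r in receptor:
        for l in ligand:
            d = math.dist(r, l)
            if d < clash:
                score -= 10
            elif d < contact:
                score += 1
    return score

def rigid_search(receptor, ligand, angles, shifts):
    """Try every (angle, shift) pose and return the best (score, angle, shift)."""
    best = None
    for angle in angles:
        for shift in shifts:
            s = contact_score(receptor, place(ligand, angle, shift))
            if best is None or s > best[0]:
                best = (s, angle, shift)
    return best
```

Nothing in place() can bend the ligand, which is exactly why a very flexible molecule such as DNA is a problem for purely rigid-body docking.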

One of the most widely used protein-protein docking algorithms is HADDOCK (High Ambiguity Driven biomolecular DOCKing). Developed by Alexandre Bonvin at Utrecht University, it is based on the concept of Ambiguous Interaction Restraints (AIRs), which are derived by combining structural and sequence information obtained experimentally. The key idea is an algorithm able to compute a plausible arrangement of the complex even when the interaction is very complex.

You can understand this algorithm in detail from the original paper.
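The core of an AIR, as described in the HADDOCK literature, is an effective distance: the inverse sixth powers of all atom-pair distances between one active residue and the partner’s active and passive residues are summed, and d_eff = (Σ d^-6)^(-1/6). The restraint is satisfied when d_eff falls below an upper bound. A minimal sketch, assuming atoms are plain (x, y, z) tuples and an upper bound of 2 Å (treat that value as an assumption to check against the HADDOCK documentation):

```python
import math

def effective_distance(residue_atoms, partner_atoms):
    """AIR-style effective distance: (sum over all atom pairs of d**-6) ** (-1/6).

    residue_atoms: atoms of one active residue on molecule A.
    partner_atoms: atoms of all active and passive residues on partner B.
    Atoms are assumed not to overlap (a zero distance would raise ZeroDivisionError).
    """
    total = 0.0
    for a in residue_atoms:
        for b in partner_atoms:
            total += math.dist(a, b) ** -6
    return total ** (-1.0 / 6.0)

def air_violation(residue_atoms, partner_atoms, upper_bound=2.0):
    """How far (in Å) the effective distance exceeds the bound; 0.0 if satisfied.

    The 2.0 Å default is an assumption for this sketch.
    """
    return max(0.0, effective_distance(residue_atoms, partner_atoms) - upper_bound)
```

Adding more partner atoms can only shorten d_eff, so the restraint is satisfied as soon as any one of the candidate residues comes close enough: this is what makes it “ambiguous”.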

An effort to adapt HADDOCK to protein-DNA complexes has been carried out by Marc van Dijk’s group at Utrecht University in the Netherlands. In their paper published in Nucleic Acids Research, the Dutch researchers set up a new docking approach starting from data on the monomeric repressor-DNA complexes formed by bacteriophage 434 Cro, the Escherichia coli Lac headpiece and bacteriophage P22 Arc.

The day when we’ll be able to download a PDB file and predict the exact site where a protein will bind DNA is still far off, but the very first steps into the world of protein-DNA interactions are being taken in bioinformatics too.

Learn the Ensembl API on EBI's website. An open workshop.

The Ensembl database comes with a strong and well-documented Application Programming Interface. I must say that Ensembl is much better in this respect than other DBs such as Uniprot or NCBI, which look more focused on providing good graphical interfaces than on making programmers’ lives easier: they become very good tools for wet-lab biologists who need a little bioinformatics, but a slippery slope for bioinformaticians. But this is just my own preference.

If you land on Biostar’s homepage, you will notice a link at the top directing you to an open workshop on the basics of the Ensembl API. It takes you to a very detailed (and rather redundant) description of the course. There’s no need to apply or register. After several introductory pages, the website takes you to a bunch of video lessons you’ll surely enjoy.

The Ensembl API is written in Perl. And this is despicable to me, since I am a proud Python fanboy. Note that this course won’t teach you the basics of Perl, so a little Perl knowledge is necessary.

So, if you want to learn the Ensembl API, the links provided here will definitely help.

Was America discovered by the Romans? DNA sequencing provides evidence to substantiate this hypothesis.

As a child, I was taught that the first European to set foot on the American continent was Cristoforo Colombo. Or rather, the Vikings, but the Italian education system never gave much importance to those northern horned freaks. Then, when I moved to Barcelona, people argued that Cristoforo Colombo’s name was actually Cristobal Colon, and that he was Catalan. I must admit that this theory makes a lot of sense, since one of the very first islands he discovered took the name of a mountain just a stone’s throw from Barcelona, Montserrat. Anyway, national pride and scarce historical evidence make the history of the discovery of America quite cryptic.

Recently, another fact has come to light, making this story more complicated and much more amazing. The Italian science writer Elio Cadelo reported something really striking in the Italian newspaper La Stampa.

In a Roman shipwreck dating back to the Republican Age and found off the coast of Tuscany, the remains of a Roman doctor have been found in very good condition. Archaeologists have unearthed phials, bandages, surgical instruments and closed boxes containing very well-preserved tablets. A DNA analysis revealed that the tablets were made with hibiscus and sunflower seeds. This is the point: hibiscus only grows in Southeastern Africa and India, and the sunflower is an American plant. Official history says that the very first person to describe a sunflower to Europeans was the Spanish conquistador Pizarro, who mentioned how the Incas worshipped it as a sun-related divinity.

How did the Romans get sunflower seeds? We can make two hypotheses. On one hand, one could argue that the sunflower might have existed on this side of the Ocean at the time and then gone extinct. On the other, we could say that the Romans actually discovered America and started exploiting its resources, or perhaps trading with indigenous populations. Both are quite weak, honestly. We don’t have any proof that Helianthus annuus, the sunflower, existed outside America, and thinking that it existed and then went extinct makes even less sense, given the economic importance of this plant, which would surely have attracted the interest of European farmers. But we also have to wonder why the Romans, proud and fierce conquerors as well as great historians, never tried to conquer the American territories and never reported them in their historical chronicles.

Anyway, those sunflower tablets speak for themselves. Furthermore, in his book (unfortunately available in Italian only), Elio Cadelo supports his fascinating theory with more evidence. A small body of Roman literature talks about “brand new lands in the west”, and there are artifacts proving that exchanges between the two coasts of the Ocean really occurred.

So, was America discovered by the Romans? Well, I have to be honest. As I said before, everyone tries to claim this discovery for their own country, and I suppose I am doing just that, but this fact is fascinating enough to be reported. Where’s the biology? Come on, they used DNA sequencing, and I guess DNA barcoding by PCR was involved.