The role of mathematical modelling in evolutionary biology.

The role of mathematical modelling in evolutionary biology is often questioned, despite being integral to the study of evolution. Unlike other scientific disciplines, such as Physics or Chemistry, Biology was born as a descriptive science, and mathematics has yet to be fully established as an effective and indispensable part of its investigations. In evolutionary research, an important role of mathematics is to provide a “proof-of-concept” test of verbal explanations, paralleling the way in which empirical data are used to test hypotheses.

Whereas the connection between empirical analyses and theoretical modelling is straightforward in some cases, such as the construction of likelihood functions for parameter inference and model choice, empiricists may not appreciate the importance of highly abstract models, which might not provide immediately testable predictions. Probably, skepticism stems from some misconceptions and misunderstandings about mathematical modelling, and a clarification about its role may ease the communication between experimentalists and theoretical biologists.

A group of evolutionary biologists from the USA make this point in a very clear paper, published a few days ago in PLOS Biology, with Maria Servedio from the University of North Carolina as first author. The parallels between empirical experimental techniques and proof-of-concept modelling in the scientific process are explained in the following flowchart.

As shown, proof-of-concept models are best suited to testing the logical correctness of verbal hypotheses, that is, whether certain assumptions really lead to certain predictions. Which assumptions are most commonly met in Nature is instead argued to be a question that only empirical approaches can address.

The discussion of the most common misunderstandings centres on three main points.

The authors first argue that the main misunderstandings about mathematical modelling arise when theoreticians are asked how they might test their proof-of-concept models empirically. The models, they argue, are themselves tests of the validity of verbal assumptions, and their outcome can thus determine whether a verbal model is valid or flawed.

Second, this does not mean that proof-of-concept models need not interact with empirical work. In most cases, quite the contrary is true. Many vital links between theory and natural systems can be found at the assumption stage, at the prediction stage and even at the discussion stage, when empirical results are woven into a broader conceptual framework.

Third, the authors point out that a mismatch between theoretical predictions and empirical data may be of great interest, giving both theoreticians and experimentalists the opportunity to appreciate underrated phenomena, or to reconsider their assumptions and empirical procedures.

Although this paper discusses the role of theoretical modelling in evolutionary biology specifically, we should take the time to reflect, in general terms, on the relationship between experimental work and mathematical modelling. I am about to write about the criticism being levelled at some of the most common algorithms for NGS data analysis, because I have the feeling that the search for proper mathematical models and algorithms will be one of bioinformaticians’ main occupations in the coming years. This article thus serves as a fair example, even if not directly applicable to every field of the life sciences, of how the relationship between empirical and mathematical work should be understood.


Merda, FICA, Stronzo Bestiale and Maggie Simpson. Peer-reviewed epic fails and Easter eggs.

A peer-reviewed journal is supposed to accept a paper only after a proper review. Over the years, however, many episodes have shown that not all journals pay due attention to the review stage, so epic fails, trolling and Easter eggs may slip through. In this, Italians play a rather prominent role. Since the age of Caligula, the Roman emperor who is said to have made a horse a senator to show how little he thought of the Senate, Italy has built a solid tradition of trolling, stupid jokes and other annoying antisocial behaviours. After all, trolling a journal is pretty easy if your native language is relatively unknown to the rest of the world, and it’s quite straightforward to hide foul language in acronyms.

The “Stronzo Bestiale” case, for instance, went down in history. In 1987 the American physicists Bill Moran and William G. Hoover published a paper about the application of fractal theory to gas diffusion. The paper was submitted as authored by Moran, Hoover and the Italian researcher Stronzo Bestiale, affiliated with the Institute for Advanced Studies at Palermo, Sicily, Italy. The problem is that this institute is entirely fictional, and the expression Stronzo Bestiale can be translated as “Full Bastard” in English. According to many rumours, and to the version Bill Moran gave a few weeks ago, the idea of adding a troll author was suggested by Giovanni Ciccotti, professor of Computational Physics at the Sapienza University of Rome. The paper was scientifically relevant all the same: it has been cited 165 times, and it is still available on Springer’s website. Of course, it is not open access.

This was not the only case in which trolling was made easier by the Italian language. In 2003, Norell and Wheeler from the American Museum of Natural History in New York proposed the Missing Entry Replacement Data Analysis value as a replacement approach to deal with missing data in paleontological and total evidence data sets. Turned into an acronym, this becomes the MERDA value. In Italian, the term merda means shit. Of course, you can find the paper proposing this shit value in the Journal of Vertebrate Paleontology. I am still trying to work out whether this is a real Easter egg or just a coincidence. For sure, Keith Bradnam could find this relevant to his noble mission against the ongoing threat to humanity from the bogus use of bioinformatics acronyms, even though this work is far from bioinformatics.

Another funny case turns our attention to engineering. In 2005 the Italian communication engineer C. Di Nallo presented his folded inverted conformal antenna (FICA) for multi-band cellular phones at a conference. The Italian term fica can be translated as pussy or cunt.

Of course, we could add dozens of examples. My favourite is still the idea of those guys who submitted to a journal a very clear message asking it to quit sending emails. Just about two hours ago, a headline on vox.com reported the publication of a paper authored by Maggie Simpson and Edna Krabappel. For the few who don’t know them, they are two characters from the cartoon series The Simpsons.

Beyond the fun, we could try to draw a couple of conclusions from these stories. As the law of the jungle took hold in Science, a deadly competition followed, pushing many scholars into a “publish-or-perish” mentality. On the other hand, journals are also constrained by the draconian laws of the market. The need to publish a good number of papers while keeping costs low may lead small publishing groups to skip some reviewing steps.

Can we do something to fight this and improve the peer-review system? Well, I am just submitting my very first paper, I don’t have enough experience, and I can only share my very, very humble opinion. Still, if you consider the publishing mechanism as a complex machine, a small percentage of errors is quite natural. Even if any effort to clear junk papers out of the literature is desirable, I fear that we won’t be able to prevent every mistake. And, after all, who really needs that? Without a bit of irony, things could get unbearably boring. Even Science.

Boobs for Science is ready to expatriate. English version to come the next week.

UPDATE 12/8: Boobs for Science is online and available at boobsforscience.com

I have already described the “Tette per la scienza” project (Italian for “Boobs for Science”) in a previous post, and I was most likely one of the first to write about this story in English. The international press has paid a lot of attention to this provocation from Italy, in which women appear in a bra or topless while holding a sheet of paper with a Science fact on it. On Reddit, the story was quite a success, and very soon the major newspapers turned their attention to this bizarre blog.

As the DailyMail and the New York Daily News published their comments on this story, the viral blog authored by Lara Tait crossed the borders of Italy to become yet another curious, brainy and funny idea spreading worldwide. The blog keepers were already planning to run an English blog alongside their original Italian one, and the flood of visits from abroad encouraged them to speed up its creation.

On September 30th, the blog admins asked their users to help translate the website, and received hundreds of messages from people willing to help export this amazing project abroad. Within a few hours, the admins announced that they had enough collaborators to translate “Tette per la Scienza” into its English counterpart, “Boobs for Science”.

Although this strays a bit from the scope of my blog, I keep an interest in this project because, on the same planet where Sarah Palin lives, anything goes when it comes to Science outreach. And if boobs are effective at getting people to appreciate the wonders of Science, let the boobs talk for us.

I will update this post as soon as I know the address of the Boobs for Science blog.


The City, the Rainbow, the Science. And Mafia. A story from Rome.

A huge, shiny, half-circle double rainbow towers in the cloudy sky above Rome on an early December afternoon. A truly heart-breaking view, which lovers and photographers have surely enjoyed. And the landscape of the Eternal City fits wonderfully with the pastel colours of this autumn sky. But the rainbow is not the only dome covering the City, and the clouds you may see are not the only ones darkening its sky. There is another dome, a real cupola that oppresses everything here, and the clouds we see on the horizon don’t add any poetry to our lives. They are dark, and truly scary. Over the last few days, a disconcerting series of news updates has shown us a city we just couldn’t imagine. A huge, impressively organised and deeply rooted criminal organisation has been found to control the city. Investigators have discovered a disturbing connection between criminals, city authorities and far-right neofascist organisations. A couple of days ago, 37 people were arrested, and the former mayor of Rome, the ultra-conservative and ex-fascist Gianni Alemanno, was charged with mafia conspiracy.

I guess that many foreigners won’t really be surprised by this story, and definitely won’t share my dismay, since the stereotype linking Mafia and Italy is widespread around the world. In Italy, in fact, we had a different view. Criminal organisations have always been thought to be established in Southern Italy only, and massive infiltration of local governments had never been demonstrated north of Naples before. In recent years, many investigations have found Mafia organisations crossing their historical borders, with huge scandals affecting the wealthy and more “Continental” Northern Italy. Moreover, much evidence points to a large presence of mafia gangs all around Europe, along with their tight control of the drug market in Germany and the UK. With this Roman scandal, the Mafia has been shown to be anything but a “Southern Italy thing”, and its exclusive link with Sicily and Naples must now be considered a stereotype in its turn.

But why am I talking about this in a Science blog? I have always held the political implications of Science in high regard. Having been a Free Knowledge, Open Science and Public Education activist while at university, I had many opportunities to reflect on the social importance of an open and democratic Science. Among the many analyses I happened to read during my political activity at university, I found those connecting the city and cognitive production particularly interesting. In the so-called post-modern age, cognitive production takes on a pivotal and leading role in globalised capitalism. The city, with its connections between universities, research centres, companies and individuals, plays a role in cognitive production comparable to the one the Fordist factory plays in industrial production. Science and the city, the city and society, society and Science. A visceral connection, which lets us appreciate how much research depends on city politics, and how far the social implications of science extend. That is why I am taking the time to explain the Roman situation, and I am fairly confident that understanding how things work here may be fruitful even for those who live outside this wonderful and disgraced city.

Synergy vs discord: Rome and Barcelona.

During the last two years, I had the opportunity to work in both Barcelona and Helsinki. I was in Barcelona for a short visit to the CRG, and attended a training period at the University of Helsinki. If we want a model of a good and effective connection between the city and Science production, Barcelona is a perfect example of synergy: three main universities, associated with three neighbouring science parks, in a perfectly integrated bioregion made up of connected universities, research centres and private companies.

In Rome, the overall number of research groups is considerably higher than in Barcelona. The three main universities are led by the Sapienza University, the biggest university in Europe. More than 15 public research institutes are spread over the city, accounting for thousands of employees. Unlike in Barcelona, the research institutions are poorly interconnected. I have been working at the Santa Lucia Institute as a bioinformatician since September, and I don’t have an overall idea of who is working on bioinformatics in Rome, and where. Very often these institutions are in serious competition with one another, because of the scarce funding provided at government level. Moreover, the lack of communication is worsened by the low number of private biotech companies in the Rome area. Private companies usually contribute a great deal to establishing a well-functioning bioregion, because they know they can grow their business thanks to a proper collaboration network. In Italy it is very hard to run a company, even just a small startup, because of the overwhelming bureaucracy and the heavy taxes levied on small companies.

Whereas public institutions should work to ease communication between research centres, they do next to nothing to that end, and they seem mostly bent on doing the contrary, adding discord where synergy would be needed. That is why I can fairly say that a real “Roman bioregion” doesn’t exist.

Mobility, city structure and speculation.

You can only understand how important mobility is by spending a week or two in Rome. In other cities, things like buses and metros may be taken for granted. The capital of Italy, and the biggest city in the European Union in terms of surface area, has only two complete metro lines. The third, the “C Line”, is not complete yet, and its construction is suffering from endless delays. Moreover, the city has grown dramatically in recent decades. An infamous policy, aimed at favouring speculation, has let the city grow uncontrollably. The lack of a proper planning scheme, along with a severe lack of public transport, makes getting around the city a real adventure. The effects on Science are clear if we consider the location of research centres. Spread mostly across the outskirts, they are very far from one another. I still remember the discomfort of a couple of colleagues in my laboratory who needed to collect some mice from the EMBL in Monterotondo. The Santa Lucia Foundation is located in Southern Rome, whereas the EMBL is in the Northern hinterland. This is the route you’d have to take in order to go by public transport. Of course, they chose to go by car. Just 40 minutes for 50 kilometres.

A science park would be needed, well provided with core facilities, so that a PhD student is not forced to transport lab materials and animals by car. Actually, this is what was planned when the building that hosts the Santa Lucia Foundation was built. Unfortunately, right after construction the whole area ran out of funding, and half of the building fell into abandonment. Far away from European standards.

The effects on society: critical thinking and democracy improvement.

Science has the potential to keep society from sliding into racism and corruption. That is why, I believe, it is so opposed in my country. Usually, city authorities tend to promote science outreach and cultural events. Unfortunately, the general cuts that hit public services during the last crisis didn’t spare cultural activities. Unlike in other places I have explored, the city authorities make little effort to promote the outreach of research institutes, and the institutes themselves show little interest in communicating with the population.

The effects are devastating. Italian public opinion is prey to any populist campaign. The mafia of Rome made a big effort to convince people that Roma people were responsible for terrible crimes, while it was exploiting those same communities to appropriate European funds allocated for their integration. Most people in Rome slid into racism, unable to tell the real data describing these phenomena from the claims of a corrupt press that aims to spread xenophobia.

The antagonist, do-it-yourself and hacker environment.

There is one last thing that is crucial to a good Science policy within a city. Far from institutions and public funding, volunteering to spread and improve Science is fundamental. Here, by contrast, Rome is fervently active. The state of abandonment of many buildings, along with the wide spread and establishment of the antagonist movement, favoured the occupation of many “centri sociali”, independent and radical community centres. Acting as independent organisations devoted to cultural promotion, they often organise conferences and courses. The hacker movement has spread widely since the 90s, with Indymedia activists, autistici.org (an independent service provider guaranteeing data privacy), and quite a lot of counter-information websites and community radios. I remember when I joined a Linux course at the community centre “Strike SPA”. I ended up learning the basics of Linux and picking up the principles of hacker culture. I was barely in my 20s and, looking back, that was most likely the moment when I decided to go into bioinformatics. Within the universities, collectives organise self-training courses (autoformazione), and Linux, Mac and Arduino user groups are present on all campuses. It is no exaggeration to say that this independent and underground cultural activity significantly raises the average cultural level of the city. Of course, it demonstrates that things could be far better in Rome, if we managed to get rid of this suffocating cupola of mafia, misgovernment and corruption.

I really don’t want to come across as the typical, endlessly complaining, frustrated Italian. I know that I share my part of the responsibility with all my countrymen, and that complaints must give way to commitment, as we are called to put a huge effort into changing the way we live together.

I’d rather underline one point. A good scientific environment depends a lot on how well scientists are able to interact fruitfully, exchange information, and collaborate. It is closely linked to the cultural vitality of a society, and to the ability of scientists to get people to understand and appreciate their work. This means, in terms of Systems Theory, that the most important thing in science production is the degree of complexity of the academia-city system. And this must be our main and ultimate aim here in Rome: a better City for a better Science, a better Science to make the City better.

I must confess I really enjoyed that rainbow. In some way, it had something of forgiveness about it, and looked pretty comforting. As if the whole universe were telling us to calm down, because not everything is lost. As the police cross the city, people get arrested, and another judicial bloodbath takes place, we all face a choice. We can either fall asleep once again with that awful belief that nothing will change, only to indulge from time to time in our usual, annoying complaints. Or we can take this as a starting point, a new day, where our commitment grows stronger than our difficulties.

Let me conclude this long article with an expression of solidarity and support for our mayor Ignazio Marino, who has been fighting this system since the beginning of his mandate. Marino is a medical doctor who held a professorship at the University of Pittsburgh, where he performed the first organ transplant in history on an HIV patient, and has authored more than 170 papers. He is now the man who is supposed to drag the City out of this mess, and he is living under escort after the death threats he received. A man of Science. Not by chance.

Protocols.io, the online open repository for lab protocols.

Like many others, I have collected my fair share of profiles on professional and Science social networks. LinkedIn, Academia, ResearchGate. The real limitation all these websites share is that they basically provide you with a showcase, in which you can display your achievements, share your professional profile, and look as cool as you can. I have always felt that proper tools for collaboration and information sharing in Science were lacking on the internet. Social networking for scientists is limited to communicating and discussing results, whereas it could be really useful to have platforms for sharing data and protocols.

That is why, when I heard about protocols.io on Twitter, the project caught my attention immediately. Protocols.io is an online community serving as a repository for experimental protocols in the Life Sciences. A free, central, up-to-date and crowdsourced protocol database for life scientists. The project is promoted and maintained by ZappyLab, an organisation of scientists whose goal is to provide tools for sharing protocols and lab methods.

Registration is open and pretty simple. Unlike other science communities, you don’t need to provide an “institutional mail address”: any address will do, which is great for undergrads and graduate students who may not have an official mail address. You sign up with your email, and that is enough to get access to a growing list of lab protocols. You can share your own protocols, deciding whether to make them publicly available or to share them privately with your colleagues only. You can also enjoy the benefits of having a smartphone, as ZappyLab provides an app for Android and iOS, available on the respective marketplaces.

At this very moment I am exploring the website, trying to figure out how to use it, but it seems pretty simple and user-friendly. Of course, the number of available protocols is not very high yet, but this depends on the number of subscribers. The more we are, the more we share, the more protocols will be available.

I cheer for this project, as I think it may make a great contribution. We always talk a lot about “open science”, “reproducibility” and freedom of knowledge. But most of the time we limit ourselves to blaming publishing groups for their copyright policies and calling for greater openness. What are we doing ourselves to help open up Science? Sharing your protocols is a fairly good contribution, and I hope that you will turn your attention, and give your contribution, to this amazing project.

 

MapReduce: using the Google bigdata algorithm in bioinformatics

With this post, I start exploring the possible contributions Google may make to bioinformatics, a topic I have been meaning to look into for a while. The rise of big data in biology is causing Big G to show a growing interest in this field, and very soon the Mountain View giant may increase its investments in the Life Sciences. This is definitely desirable, since Google’s great expertise in big data could significantly boost bioinformatics research.

Today, we’ll try to understand how Google’s most fundamental big data analysis algorithm works. MapReduce is a programming model developed and brought to fame by Google. It is used to simplify the processing of huge amounts of data. The main concern in commercial data analysis is to provide special-purpose solutions for computing on large quantities of data. Companies are interested in analysing a lot of data very quickly to run their campaigns better and to understand markets.

How does MapReduce work?

MapReduce allows computations to be distributed over clusters of workstations and executed in parallel on unstructured data, or within a database. The name “MapReduce” derives from two functions implemented in many languages, map() and reduce(). In Python, for instance, map() applies a function to every item of an iterable and returns the results (as a list in Python 2, as a lazy iterator in Python 3). If additional iterable arguments are passed, the function must take that many arguments and is applied to the items from all iterables in parallel. The reduce() function (found in the functools module in Python 3) instead applies a function of two arguments cumulatively to the items of an iterable, from left to right, so as to reduce the iterable to a single value. Whereas map() computes a value for each object in a list, reduce() combines these values by iteration, collapsing them into a single value. A MapReduce program is thus composed of three fundamental steps: the map step, the shuffle step and the reduce step.
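To make the two Python functions described above concrete, here is a minimal sketch in Python 3. The numbers and variable names (read_lengths, starts, ends) are made up purely for illustration.

```python
from functools import reduce  # in Python 3, reduce() lives in functools

# Hypothetical example data: the lengths of four sequencing reads.
read_lengths = [101, 98, 150, 75]

# map() applies a function to every item. In Python 3 it returns a lazy
# iterator, so we wrap it in list() to materialise the results.
doubled = list(map(lambda n: n * 2, read_lengths))

# With two iterables, the function receives one item from each, in step.
starts = [10, 20, 30]
ends = [15, 28, 45]
spans = list(map(lambda s, e: e - s, starts, ends))

# reduce() folds the items from left to right into a single value:
# ((101 + 98) + 150) + 75
total = reduce(lambda acc, n: acc + n, read_lengths)

print(doubled, spans, total)
```

Note that map() produces one result per input item, while reduce() collapses the whole list into one value. MapReduce chains exactly these two ideas.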

In the Map Step, the master node imports the input data, splits them into small subsets, and distributes the work to the slave nodes. Each slave node produces the intermediate result of a map() function, in the form of [key,value] pairs, which are saved to a distributed file. The location of the output file is reported to the master at the end of the mapping phase.

In the Shuffle Step, the master node collects the answers from the slave nodes, combines the [key,value] pairs into lists of values sharing the same key, and sorts them by key. Sorting can be lexicographical, increasing or user-defined.

In the Reduce Step, the reduction function is applied to the list of values collected for each key, summarising it into a single result.


Although it may appear quite tricky to understand, it is anything but complicated. Data are first labelled with a mapping function, then sorted, and finally summarised by a reducing procedure. The *magic* is in distributing the work over several nodes in order to speed up big data processing. The output file can be processed again in a new MapReduce cycle.
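The three steps can be sketched in a few lines of plain Python. This is only a single-machine toy, not Hadoop or Google's implementation: the function names (map_step, shuffle_step, reduce_step, kmer_count) and the two short reads are assumptions made up for the example, which counts k-mers, a typical bioinformatics flavour of the classic word count.

```python
from collections import defaultdict
from functools import reduce

def map_step(read, k=3):
    """Emit a (k-mer, 1) pair for every k-mer found in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def shuffle_step(pairs):
    """Group the values by key, then sort the keys lexicographically."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_step(key, values):
    """Collapse the list of values of one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)

def kmer_count(reads, k=3):
    # On a real cluster each read (or chunk of reads) would be mapped on a
    # different node; here everything runs sequentially in one process.
    mapped = [pair for read in reads for pair in map_step(read, k)]
    return dict(reduce_step(key, values) for key, values in shuffle_step(mapped))

# Hypothetical toy input: two very short reads.
reads = ["ACGTAC", "ACGT"]
counts = kmer_count(reads)
print(counts)
```

The speed-up in a real deployment comes from running many map_step and reduce_step calls at the same time on different machines, which this sketch deliberately leaves out.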

Surfing around YouTube, I really enjoyed this video by Jesse Anderson, which explains how MapReduce works using playing cards. Simple, effective and brilliant indeed.

How could this help in bioinformatics?

As Lin Dai and collaborators pointed out in Biology Direct in 2012, bioinformatics is moving from in-house computing infrastructure to utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. As a matter of fact, cloud computing is becoming more and more important in genomics, and is expected to become fundamental in the long run.

Some MapReduce implementations dedicated to sequencing analyses are already available in the literature. Michael Schatz’s CloudBurst is a fair example. Published in 2009, CloudBurst is built on the open-source MapReduce package Hadoop and reduces the running time from hours to minutes for typical jobs involving the mapping of millions of short reads to the human genome.

More recently, many methods have been developed to make cloud technology available for bioinformatics analyses. I will just rattle off three examples, but there are plenty of these software solutions out there. Galaxy Cloud, software for NGS data analysis, is most likely the best known. In protein structure and function prediction, the PredictProtein Debian package uses cloud technologies as well, and I found this cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples very interesting indeed.

The list could be far longer, and a more thorough search of the literature would return plenty of cloud-based methods. In most of them, MapReduce is the reference programming model, and understanding its basic functioning could turn out to be as important as understanding classic bioinformatics algorithms.

Some resources to learn and use MapReduce.

Hadoop is a MapReduce package developed by Apache and distributed under the Apache free-software licence. I take it as a reference distribution, but it is not the only one. You can explore the official website and download a very detailed PDF guide. The tutorial is oriented towards the Java language, but you can easily find further documentation on running MapReduce in other languages.

Finally, you may also consider watching a video tutorial. I had a look at this one on Vimeo, and it seems quite complete, but more will turn up if you search around.

Will MapReduce replace SQL databases?

Some may have noticed that MapReduce can sort and process data without requiring them to be structured. Whether this method could make SQL databases obsolete is a fair question, and still quite an open one. As far as I can tell from surfing around the web, this is most likely not the case. In 2009, some researchers from American universities argued that MapReduce lacked many key features, and a certain amount of criticism has accompanied MapReduce over the years. MapReduce is a great method for processing data quickly, but structured languages still provide the best tools for storing them. That is why you will find many solutions for applying MapReduce to SQL databases.
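To see how close the two worlds are, the sketch below computes the same per-key total twice: once with an SQL GROUP BY query (using Python's built-in sqlite3 module on an in-memory database) and once with a hand-written map, shuffle and reduce. The table name hits, its columns and the rows are hypothetical, chosen only for illustration, and this is not how any specific MapReduce-on-SQL product works.

```python
import sqlite3
from collections import defaultdict
from functools import reduce

# Hypothetical data: (chromosome, number of hits) records.
rows = [("chr1", 12), ("chr2", 7), ("chr1", 5), ("chr2", 3), ("chrX", 4)]

# SQL route: store the rows in a table and let GROUP BY do the aggregation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (chrom TEXT, n INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
sql_totals = dict(conn.execute("SELECT chrom, SUM(n) FROM hits GROUP BY chrom"))
conn.close()

# MapReduce route: map each row to a (key, value) pair...
mapped = map(lambda row: (row[0], row[1]), rows)

# ...shuffle the pairs into lists of values sharing the same key...
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# ...and reduce each list to a single total.
mr_totals = {
    key: reduce(lambda a, b: a + b, values)
    for key, values in sorted(grouped.items())
}

print(sql_totals, mr_totals)
```

Both routes give the same totals. The database adds storage, indexing and a query language, while the MapReduce route needs no schema and can be spread over many nodes, which is one way to read the claim that the two approaches complement each other.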

I take this as a further demonstration that competition is often pointless, and the best will come from collaboration. If possible, we’d better take the best of anything we have available.