Posts by F★bio

PhD student in Bioinformatics at the Autonomous University of Barcelona. Webmaster and podcast manager.

Is "how to do bioinformatics" the major topic in bioinformaticians' online reading habits?

Sifting through my website stats, I realised that bioinformaticians read more posts discussing “how to do bioinformatics” than posts with strictly scientific content. Is this a feature of this blog, or does it reflect a common problem with working habits?

Drawing some conclusions after two years of atcgeek

After almost two years of blogging on atcgeek, I dare say a thing or two about this experience. Scrolling through the statistics of this blog, I cannot really complain about the interest it has generated among readers. Even though I won’t become famous by writing here, 38k views since January 2014, with peaks of around 1k views/day, is not a bad result, considering the long pause I had to take while moving to Barcelona and starting my PhD. Nothing really special, but not a disastrous failure either.

The three main topics at atcgeek

Although I usually divide my posts into thematic categories (bioinformatics, biochemistry, structural biology, etc.) and into types of article (news, insights, video, hacks and personal blog), I realised that I basically tend to write on three topics: education and work practices, methods, and reflections. Posts of the first kind are about “how to work in bioinformatics” or “where to learn the basics”. The second kind are those in which I report new methods that have recently been published, and the third category collects the posts that propose scientific insights on the role and nature of computational and theoretical biology.

Most of the interest goes to posts on education and work habits.

The order in which I mentioned these three topics coincides with their ranking in terms of interest generated. Education and work practices come first, methods come second, and the bronze medal goes to the insights. Swiftly and boldly comparing my site statistics with the interest generated on social networks, I dare say that the people who read atcgeek are particularly interested in discussing how to improve their working habits and how to start working in bioinformatics, or in sharing a bit of self-irony with me as I talk about the shit I usually do when I work. Take it as an impression that is barely supported by statistics, but plausible enough to raise a question.

Based on what I see on atcgeek, people are more into discussing how to do bioinformatics or how to learn the basics than bioinformatics itself, and there could be some reasons behind this.

Of course, we should keep in mind that this blog is written by a PhD student who is sharing his experience while taking his first steps in computational biology. This matters, since anyone would be more interested in the opinions of someone more influential than me on anything about the “scientific part”. The main goal of this blog is to share my experience horizontally and to interact productively with my visitors, rather than to claim to be an expert in the field and “coach” the readers. On the other hand, if experience matters, it should matter for both topics, since the thoughts of an experienced scientist are more valuable than mine on work habits as well as on science.

Do we have a problem with how to do our work?

Although the shift I am seeing in readers’ interest may be due to the characteristics of this blog, I still have the feeling that “how to work” is the major “hot topic” in the bioinformatics community, and we may strongly suspect that this reflects a problem. Bioinformatics is basically the domain of non-computer scientists working with computers, the merger of two very rapidly changing sciences, and the development of proven, shared and consolidated work strategies is far from being a reality, especially compared with experimental biology, where lab practices are widely discussed and protocols are consolidated.

There is one last thing to say. In the real ranking of the most visited posts, the most read one is not really about bioinformatics. Let’s say that this discussion should focus on what bioinformaticians usually read online when they are keen to read about science. Including their other interests could be confusing.

BTW, thank you for the interest in this stupid diary.

The four most stupid things I have ever done in bioinformatics.

It was a cold November morning in 2011. Sapienza University has a huge campus next to the city centre of Rome, where the main faculties are housed in huge buildings in the rationalist style. The faculty of Biochemistry, however, has a detached site in San Lorenzo, the neighbourhood flanking the campus. I was crossing the streets of this wonderful ex-industrial alternative hood to reach my new lab. The clock showed 10:30 AM, and I was joining bioinformatics. Professor Stefano Pascarella had agreed to supervise my master thesis, and it was my very first day. Four years have passed, I have graduated and worked in five different labs, and even if my experience is not really long, I think I already have a couple of stories to tell.

Stupidity matters. Although most people link science to intelligence and genius, seeing research as a matter for the “smart guys”, we must admit that the lab routine is often studded with the crap we make, and that researchers can become protagonists of actions of remarkable stupidity. And if we scan the first, faltering steps of a researcher’s career, we may find a couple of funny nerdish stories to tell colleagues in a bar. And since I’d be so sorry to know that some of you may run out of funny anecdotes about grad students’ stupidity, let me report the four most stupid things I have ever done in bioinformatics.

Trying to fetch information from uniprot on 1750 genes without any programming

The first task of my master thesis was simple. My advisor provided me with a list of 250 UniProt IDs of MocR proteins from several bacterial genomes: helix-turn-helix transcription factors with an aminotransferase domain that regulates them allosterically by binding pyridoxal-5’-phosphate. The lab had identified these sequences with HMMER, and we wanted to know something more about the flanking regions. The professor told me to annotate the 3 upstream and 3 downstream coding regions in order to see whether any recurrences could indicate a conserved multigenic region; simple and straightforward.

The next day I was shattered, staring blankly at my screen at 8 pm after ten hours of work. A hard lesson that I have learned over time is that if you did something wrong in designing your bioinformatics workflow, a spreadsheet will show up at some point. I was staring at an OpenOffice Calc window with about 40 rows, and had managed to find a way to scan the flanking regions manually. I don’t remember my glorious strategy exactly, but it must have gone something like this:

  1. Copy and paste the ID into UniProt and search for it.
  2. Scroll all the way down to the cross-link pointing to a graphical genome browser and open it.
  3. Perfect, you are on the spot! Now move the browser back and forth to find the flanking sequences.
  4. Select any flanking gene in the interval and make your way back to UniProt.
  5. Save the information you get (basically the UniProt ID) in a spreadshit and go on.

I was then advised to stop doing this and to get on with learning Python. That was the day when I learned that there is no bioinformatics without programming.
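The core of the manual routine above, picking the three genes on either side of a target, is the kind of step a few lines of Python can take over. What follows is only a sketch under stated assumptions: the gene IDs and their order are made up, the function name `flanking_genes` is hypothetical, and the chromosome is treated as a plain linear list, ignoring strand and circularity. It does not query UniProt or any genome browser.

```python
from typing import Dict, List

# Hypothetical gene order along one chromosome; these IDs are invented.
GENOME_ORDER: List[str] = [
    "G01", "G02", "G03", "G04", "G05", "G06", "G07", "G08", "G09", "G10",
]


def flanking_genes(gene_order: List[str], target: str, n: int = 3) -> Dict[str, List[str]]:
    """Return up to n genes before and up to n genes after the target."""
    i = gene_order.index(target)  # raises ValueError if the target is missing
    return {
        "upstream": gene_order[max(0, i - n):i],
        "downstream": gene_order[i + 1:i + 1 + n],
    }


if __name__ == "__main__":
    # One tab-separated row per target, instead of one spreadsheet row by hand.
    for target in ["G05", "G01"]:
        flanks = flanking_genes(GENOME_ORDER, target)
        print(target, ",".join(flanks["upstream"]), ",".join(flanks["downstream"]), sep="\t")
```

A target near the start or end of the list simply gets fewer neighbours on that side, which is worth keeping in mind when counting recurrences afterwards.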

The protein-DNA docking to fetch promoters.

After the first explorations, the final goal of my M.Sc. thesis work became the identification of a conserved promoter region upstream of the neighbouring genes pdxS and pdxT, coding for the two subunits of the pyridoxal-phosphate synthase holoenzyme in bacteria. This memory tastes a bit sweet, as usual when you end up remembering how naive you were as a newbie. It was early 2012, January or maybe February. During a lab meeting, I argued that a good option to find our promoters was to perform a docking analysis on a set of candidate promoter sequences, docked with the MocR transcription factor that had been found to activate their transcription. After explaining my point, I realised that everyone was looking at me with dismay. Do you know that awful feeling of everyone in the room looking at you like you’re crazy? It was explained to me that the methods developed for protein-DNA docking were still too ineffective to give a reliable result. Protein-DNA docking to infer the binding region of an HTH? Pure science fiction. At least, that day I was introduced to one of my favourite topics in bioinformatics: the communication between DNA and proteins.

Declaring profanities as variables in your code.

Even if I am quite used to threading jokes into my code, taking it as a “nerdish rebellion” against my even more nerdish work routine, what I am going to tell here didn’t actually happen to me. I include this story I heard in my post because it’s really worth reading.

In teamwork, sharing code is fundamental, and the best habit you can adopt is to name variables in human language and to write proper comments so that the people who read your code can understand it (to any possible extent). Anyway, the first thing you should care about before sharing your code is making sure that it won’t worsen the opinion your colleagues have of you.

This story has all the ingredients that a good academic joke needs to succeed: a polite and old-mannered thesis director, a graduate student with a sense of humor that his advisor won’t get, swear words, profanities, and a Perl script to show them up.

Stefano Pascarella is not old at all, but he is still the kind of super-mannered and polite Italian professor. I worked in his lab for two years, and never heard him yelling at anyone or even expressing disappointment harshly. Quite remarkable, since he was my thesis advisor. I never met the student who is the protagonist of this story, though, and I can only picture him as the typical 20-something master student. The only thing I am pretty sure about is that one day he wasn’t at the lab, and his code was needed for some reason.

Professor Pascarella sat down in front of the terminal and rapidly found the file he needed. The people who told me this story just can’t forget the expression on the professor’s face. A calm and bored expression immediately turned serious, and swiftly faded into dismay. Every single variable in the code he was reading was either a bad word or a profanity.

Later on that day, the student received a mail “kindly asking” him “to take his coding routine more seriously”.

Ignoring the find/replace function in a text editor.

Ok, I can guess what you are thinking. “This moron didn’t know that text editors had a find/replace function and corrected a whole code manually to change a single word”. Not so, I did something that is possibly worse. When I started to write code, I did not actually know about the existence of this amazing function in my text editor, but I was still very sure that the process had to be automated. My ignorance of text editors mixed dramatically with my inclination for programming to give rise to one of the most stupid things I have ever done.

When I finished and tested the script named changeword.py, I was totally sure that it was one of the best things I could produce with my short programming experience. I don’t really remember the code, but it must have looked something like this:

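#!/usr/bin/env python3
# changeword.py: print a file with every occurrence of one word replaced by another.
import sys

filein = sys.argv[1]          # file to read
word_to_change = sys.argv[2]  # text to look for
replacement = sys.argv[3]     # text to put in its place

with open(filein) as handle:
    text = handle.read()

# str.replace swaps every literal occurrence, including those inside longer words
print(text.replace(word_to_change, replacement), end="")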

To run it, you just needed to pass the file, the word you wanted to change and its replacement, and everything went to standard output:

$> ./changeword.py my_file.txt first_word second_word > my_corrected_file.txt

Et voilà, the text came out changed. Luckily, at a certain point I realised that my fantastic script didn’t work for every change I might need, and decided to discuss this problem with a postdoc in my lab. He is still laughing about this.
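The post does not say which changes the script got wrong. One common limitation of a plain substring replacement, offered here purely as an assumption, is that it also rewrites the word when it appears inside longer words. The sketch below contrasts the two behaviours using Python's standard `re` module; the function names and the sample text are invented for illustration.

```python
import re


def replace_substring(text: str, old: str, new: str) -> str:
    """Replace every literal occurrence, even inside longer words."""
    return text.replace(old, new)


def replace_whole_word(text: str, old: str, new: str) -> str:
    """Replace only occurrences delimited by word boundaries."""
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function is used as the replacement so that backslashes in `new`
    # are inserted literally rather than interpreted by re.sub.
    return re.sub(pattern, lambda match: new, text)


if __name__ == "__main__":
    sample = "cat catalog concat"
    print(replace_substring(sample, "cat", "dog"))
    print(replace_whole_word(sample, "cat", "dog"))
```

Most text editors expose the same choice as a "whole word" checkbox in their find/replace dialog, which is rather the point of the story.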

Writing the MD5 checksum into the same file from which I computed it.

Fatigue plays tricks, and makes a perfect source of inspiration for stupid actions. When you are tired you can experience severe logical failures, and brilliantly shatter your work in seconds.

This happened a few months ago. Tracking your input, output and script files is very important, and even if you are not used to version control systems, annotating each file with its MD5 code may help, to some extent, in keeping better track of your work.

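The MD5 algorithm computes a fixed-length, 128-bit code from any input. If you feed a file to MD5, the output code works as a fingerprint of that file: collisions between different files are possible in principle, but you will not run into one by accident. Of course, if you modify the file, the resulting MD5 code will change.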

I was finishing a long scripting session and was adding information to my tab-separated output file in a #-commented header. After calculating my MD5 code, I had the brilliant idea of writing it into the same file from which I had computed it. Needless to say, after I pasted the MD5 code into the file, the MD5 code of that new file inexorably changed.
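The self-reference trap is easy to reproduce with Python's standard `hashlib` module. This is only a sketch: the table contents, the `# md5:` header format and the helper names are assumptions made for illustration, not the actual file from that evening. It also shows one possible way out, which is to hash everything except the header line that stores the digest.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of some bytes as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def strip_md5_header(data: bytes) -> bytes:
    """Drop the (hypothetical) '# md5:' header line before hashing."""
    lines = data.splitlines(keepends=True)
    return b"".join(line for line in lines if not line.startswith(b"# md5:"))


# Made-up tab-separated output table.
body = b"gene\tscore\nmocR\t0.93\n"
digest_before = md5_hex(body)

# Paste the digest into the file itself, as a hashed header line.
stamped = f"# md5: {digest_before}\n".encode() + body
digest_after = md5_hex(stamped)

if __name__ == "__main__":
    # The file changed, so its digest no longer matches the one written inside it.
    print("digest unchanged after stamping:", digest_before == digest_after)
    # Hashing the content without the header recovers the recorded digest.
    print("body digest matches header:", md5_hex(strip_md5_header(stamped)) == digest_before)
```

Keeping the digest in a separate sidecar file avoids the problem altogether and is probably the simpler habit.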

It took me a good quarter of an hour to realise it. It was 9 PM, and I thought it was just my brain asking me to go home for some rest.

As I said at the beginning of this article, stupidity matters. And laughing at yourself matters even more. Cognitive work requires the application of all your rationality, and it is thus fundamental to understand its limits, that is, the borders of your intellectual skills that are shaped by stupidity. I think that there is no shame in recognising your own limits, and publicly admitting them is somehow therapeutic.

Quoting an Italian PhD student I have met at my department who recently graduated, “there is no use for a PhD course except in the light of understanding how stupid you are”. I have recently registered for my second year of PhD here at the CRAG, and still have a long way ahead to explore the deepest corners of my stupidity.

After all, the Diesel advertisement shown as the heading image of this post may be right. You are stupid only if you try to explore your limits. And this is exactly what I am up to.

Italian Minister of Health authored the preface of a book supporting homoeopathy.

War on Science, yet another chapter. I think that anyone who works in Science or cares about it, and anyone who wants public opinion to become more aware of scientific issues of global interest, tends to spend some time countering hoaxes, misconceptions and anti-scientific propaganda. Most of the time, you end up pointing those who claim that “official science” is lying to the documents published by health and science officials. If someone affirms that vaccines cause autism, or are potentially harmful to children’s health, you may consider responding with data provided by health institutions. Likewise, if someone is keen to promote homoeopathy as a real cure, documents published by the NIH, FDA or WHO that show its flat inefficacy could turn out really useful. Basically, most of the time, national and international health institutions are on your side, providing you and the whole public with referenced data and clear positions in favour of “official” biomedical science. But what happens if a national health institution changes course, and starts supporting one of the major scientific hoaxes ever, such as homoeopathy?

This disturbing scenario has just become reality in Italy. The current Minister of Health, Beatrice Lorenzin, authored the preface of a book supporting homoeopathy, entitled In Praise Of Homoeopathy (Elogio della omeopatia) and written by Giovanni Gorga, president of an association of enterprises producing and distributing homoeopathic products. Even if the official position of the Ministry on homoeopathy has remained unchanged, and homoeopathic products are sold in Italy as “medicines without any approved therapeutic indication”, this clear stance by Minister Lorenzin raises concerns in the Italian scientific community.

The Italian non-profit organization CICAP, devoted for years to counteracting the spread of anti-scientific information in Italy, has presented an open letter asking the Minister to clarify her position on the real efficacy of homoeopathic products, and to publicly declare that there is no evidence supporting it. The International Association of Italian Researchers (AIRI) is also working to spread this letter and rally the support of Italian researchers.

Being an ecologist and a radical leftist, I am very far from being a supporter of Beatrice Lorenzin. Forty-four years old, serving as Minister of Health since the spring of 2013 and confirmed in the government led by Matteo Renzi, Beatrice Lorenzin built her political career within the right-wing coalition led by Silvio Berlusconi. Still, I have always considered her a very reasonable woman and a politician of rare quality in the awful Italian political landscape (not a big medal, actually). I am in fact pretty surprised by this awkward failure, and I am still confident that everything can be fixed.

I would simply consider this yet another strange thing coming from Italy, or one of the many events showing how difficult the relationship between science and governance is, but I fear that something more serious is on the way. In the neo-liberal West, governments are all about the economy, and the promotion of the private sector has become the only concern of administrations of any political colour. Last April, Bloomberg published an insight pointing out that homoeopathy is a billion-dollar market in the United States. I fear that the only element we don’t consider in this matter is how much money a hoax can generate. Even if it is pretty clear that homoeopathic products have no effect on human health, homoeopathy is still able to generate consensus and to turn it into business and jobs. And a disturbing question comes along: will governments shut down a wealthy sector for ethical reasons?

Applying phylogenetics and bioinformatics to NF-kB studies

To anyone who has anything to do with immunity studies, the nuclear factor kappa-light-chain-enhancer of activated B cells will sound really familiar. NF-kB is a protein complex that initiates the transcriptional response to external stimuli, such as stress, cytokines, antigens, bacteria, free radicals or UV irradiation. Expressed in active B cells, it is the protagonist of the immune response at the molecular level.

For quite a long time, its evolutionary characterization was rather neglected, since no homologous sequences had been found. Actually, I often realize that biomedical studies tend to keep quite far from the evolutionary approach. Biomedicine is about understanding processes happening here and now, and it often aims to quickly find a reliable therapeutic approach for the disease under study. So many factors to study, so little time. This shifts biomedical studies away from the influence of evolutionary biology. A real pity, as Catriona MacCallum pointed out in PLoS Biology in 2007, since the contribution of evolutionary biology to biomedicine has a big, almost unexplored potential.

Recently, NF-kB and NF-kB-like proteins have been discovered in “basal” marine animals and non-metazoans, allowing the study of the early evolution of this nuclear complex of extraordinary importance for human health. John R. Finnerty and Thomas D. Gilmore from Boston University published an interesting paper on this topic just a few months ago, and I dare to introduce it here for two main reasons.

The first is the clear scientific interest of their work, which is one of the few, and really valuable, evolutionary approaches to a thoroughly biomedical subject. It highlights deep conservation and repeated instances of parallel evolution in the sequence and structure of NF-κB in distant animal groups, which suggest that important functional constraints limit the evolution of this protein. The second is that it also explains how to apply phylogenetic and bioinformatic approaches easily, even without previous hard training.

The authors run on the double track of reporting a scientific result and introducing the reader to some simple (but still effective) computational tools that more or less anyone may use to bring phylogenetics into their work. This makes Methods for Analyzing the Evolutionary Relationship of NF-κB Proteins Using Free, Web-Driven Bioinformatics and Phylogenetic Tools very interesting reading both for bioinformaticians who need to communicate with experimentalists and for people working with NF-kB.

The article is part of the methodological book NF-kappaB. Methods and Protocols edited by Michael J. May and published by Springer Protocols.

Happy birthday Mr. GNU

It was the early eighties, a day like any other at MIT. And a printer was not working. Richard Stallman, a programmer at the Artificial Intelligence Laboratory, did his best to obtain the source code of the driver from the manufacturer in order to fix it, but there was no chance. The code was closed, and this was definitely a huge problem. Because if we give up sharing our work, we cease to work for the common good. And this should never happen in science.

All of a sudden, something as simple as the possibility to modify a driver became the symbol of an epic struggle. The struggle between greed and generosity, individualism and solidarity, profit and redistribution, patents and free knowledge and, to some extent and in a more philosophical fashion, between capitalism and anti-capitalism.

It was September 27th, 1983, and Richard Stallman was announcing his challenge to the world: ensure that source code flows freely. The GNU project was born.

Over the years, a huge crowd of programmers of every kind joined the movement, raising the flag of free knowledge as a means for the redistribution of wealth and for the spread of democracy. A lot of admirable and romantic ideals that shocked the world as they proved to be effective enough to beat up the bad guys of informatics. Despite the efforts of the software majors to promote their closed and patent-based approach to software, the free software movement has been the one to set the pace and trace the groove of many aspects of the evolution of the IT market. The encounter with Torvalds’s Linux kernel, the birth of the main distribution projects, and the extension of free software principles to all aspects of cognitive production, which led Lawrence Lessig to found Creative Commons in 2001, all followed. Year by year, open source software has spread, becoming the standard for almost everything that leads the internet nowadays, including Google and Facebook.

A lesson that we still need. Openness is fair, and it is productive. As the debate on Open Science spreads, the example of Free Software still traces a path we must follow.

The Catalan scientists' support for Independence and Science production in post-modern Europe

The day of truth has come for Catalonia. The region surrounding Barcelona faces a crucial election tomorrow, in which the people will decide whether to remain within the Spanish State or to go ahead with negotiations to separate from Madrid and build an independent state. Unlike the UK in the Scottish case, however, the Spanish government doesn’t grant Catalans any right to decide their own destiny, and a victory of the pro-independence front (which is largely expected) will drag the region into a dramatic political stalemate. As you may expect, the discussion is very heated, as everyone is debating the many implications of independence.

Sometimes a link is better than a thousand words, especially if it redirects you to the Catalan News Agency website, one of the most complete resources for updates from Catalonia in English, where you can make up your mind about this complex matter. On this blog, I think we’d better focus on the implications of a possible declaration of independence for the Catalan research system, which is one of the fastest growing and most dynamic in Europe. A quick browse of the official Catalan bio-region website gives a fair picture of the vitality of this area in Science production, with hundreds of companies, a number that has doubled in the last decade, and hundreds of research groups in dozens of research centres, hospitals and universities. A quick calculation suggests that one worker out of four in Catalonia is employed in Research and Development at some level.

It is not surprising that the declaration in favour of independence from Spain by 11 internationally renowned Catalan scientists had an explosive effect on the electoral campaign. On September 22nd, scientists like Jaume Bertranpetit (a UPF evolutionary biologist known for his studies of human genome evolution), Xavier Estivill (CRG Group Leader working on non-coding RNA and diseases) and the Princeton professor Joan Ramon Resina signed a document in which they affirm that voting for independence “is the best option to maintain the good work and the consensus achieved through many years” and that the new Catalan state will have the opportunity to “increase the resources that science requires and provide the state structures to guarantee the consolidation and growth of the research system”.

Catalan scientists basically blame the Spanish government on two main points. First, according to the signatories of the document, Spain hasn’t supported science enough, having cut national funding far too much to keep Spanish research competitive. There is also a matter of redistribution here: the criteria adopted by the Spanish Ministry of Science are claimed to be non-meritocratic and to respond to mere political interests. Second, the very structure of Spanish academia is argued to be unsatisfactory in terms of dynamism and effectiveness, and greater decision-making autonomy for universities and research centres is strongly advocated.

I don’t have enough information to offer an opinion of my own, and after my short experience here in Barcelona I can only appreciate the huge potential this area has in Science. Still, we may try to find an interpretation, and even realise that what is going on in Catalonia is just the reflection of something far more widespread.

About ten years ago, the Marxist philosophers Antonio Negri and Michael Hardt proposed that, in the current postmodern age, the production strategy has changed from a Fordist, factory-based structure into a city-based network. The idea starts from the assumption that technological advancements have moved cognitive work to the centre of industrial production. The city thus becomes the “factory” of the new era, because of its role in interconnecting individuals, research centres, facilities and small enterprises in a peer production cluster. Every city-factory is included in the network, and the network is regulated over a large scale, ideally a global one, but more realistically a continental one. While during the modern age the system’s core was a conjunction of industrial areas interacting at national and international level, in post-modernity the system is based on productive clusters, the cities, interacting at continental and global level. The consequences in politics are pretty clear. Nation states end up ceding power downward to those institutions that are able to govern the single peers, and upward to those organisations acting at a continental or global level. This is strikingly evident in Europe, and we can say that it is the main force driving European integration. During the last decades many nation states have reformed their structure to grant greater autonomy to local and city governments and, on the other side, the birth of the Euro shifted economic governance towards the EU institutions.

Actually, Negri and Hardt’s view has been strongly contested by those who claim that it is not fully capable of drawing a realistic picture of the whole system, and that it only works, to some extent, in describing the quaternary sector of the economy. Even if most of these criticisms may make sense, we can still rely on this city-factory model, since our interest in this discussion is limited to scientific production. What Catalan scientists have understood is that their future challenges will be played out both locally, in the Barcelona area, and more widely across the European Research Area. The city needs full decision-making autonomy to interact freely with the other peers at European and global level, in a game that is getting far too hard for the dated and cumbersome Spanish state, which is no longer able to be a good teammate.

No one can really tell how it will end. Tomorrow, a large pro-independence majority is expected, but even a defeat will most likely not stop the growth of the separatist sentiment, so deeply rooted in the new generations. The controversy between Barcelona and Madrid will drag on for years, carrying with it the symbolic meaning of how Europe is changing.

There is a funny pun that Catalans use to express their sense of belonging to their own land. In Catalan, you just need to move a single letter to transform the sentence I live in Catalonia, Visc a Catalunya, into the slogan Long live Catalonia, Visca Catalunya. The only personal comment I can add is that I am overjoyed to have the chance to make my tiny contribution to such a thrilling scientific environment. And in any case, and under any political scenario, I will keep trying to do my best as a researcher and citizen to keep Science growing on this side of the Mediterranean. Because whether Spain or not, jo visc a Catalunya, visca Catalunya.

Organising the European #PlantScience Retreat 2016 in Barcelona.

There is no better way to come back to my blog writing than by announcing a couple of novelties. As some of you may remember, at the beginning of January I joined the Centre for Research in Agricultural Genomics (CRAG) in Barcelona as a PhD student. Along with other PhD students at the CRAG, I am volunteering for the organisation of the 2016 edition of the European Plant Science Retreat, which will be held here at the CRAG at the beginning of next January.

The European Plant Science Retreat (EPSR) is an international scientific congress organized each year by a team of local PhD candidates. The idea came in 2007, when PhD candidates from three European research schools in Plant Science (EPS in The Netherlands, IMPRS in Germany, and SDV in France) initiated an international collaboration to improve the research, training and education of plant science PhD candidates in Europe. Every year since 2008, this network has organized a congress “by and for” PhD candidates at one of the associated institutes, which work on highly related and complementary themes.

This year, the EPSR reaches its 8th edition, and after having been successfully hosted in the Netherlands, Germany, Belgium, France and the UK, this is going to be the first meeting in the Mediterranean, and there is no better context than the ultra-vivid and prolific scientific environment of Barcelona.

We are now working on any aspect of the organisation, from the fundraising to the selection and invitation of the speakers, in a very challenging, but still instructive activity that we are putting side by side with our ordinary PhD bustle.

For now, I can share with you the link to the provisional EPSR website, which will redirect you to the active profiles on the main social networks. Updates will come very soon, and I will return to this from time to time on atcgeek too.

I told you that I had a couple of novelties. Well, the second one is more about me. My first paper has been accepted and is about to be published online. With a bit of patience I will be able to tell you how, some time ago, in Rome, we used to evaluate how good exercise is in a clinical picture of cancer cachexia.