I have been interested in this problem since I started my master’s degree. I will take some time to explain my current work on pyridoxal 5'-phosphate (PLP) related proteins and genomic regions in bacteria on this blog as soon as possible. Let’s just say we’re figuring out a way to determine, with reasonable confidence, the location of possible promoters next to genes involved in PLP synthesis. Finding a promoter with bioinformatics methods usually means showing that a given region is a plausible promoter by scoring it with a substitution-matrix-based algorithm. PhyloGibbs and Pro-Coffee are good examples of this.
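As a rough illustration of the matrix-based scoring mentioned above, here is a minimal Python sketch. It is not how PhyloGibbs or Pro-Coffee work internally: it swaps in a plain position weight matrix (PWM) built from a handful of made-up binding sites and slides it along a made-up upstream region to find the best-scoring window. The site list, the sequence, the pseudocount and the uniform background are all assumptions chosen for the example.

```python
import math

# Hypothetical aligned binding sites (toy data, not real promoters).
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATATT", "GATAAT"]
BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one {base: log-odds score} dict per motif position.

    Assumes equal-length sites and a uniform background (both simplifications).
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; return (best_start, best_score)."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


pwm = build_pwm(SITES)
upstream = "GCGCGTTGACATATAATGCGCC"  # hypothetical upstream region
best_start, best_score = scan(upstream, pwm)
print(best_start, upstream[best_start:best_start + len(pwm)], round(best_score, 2))
```

A high score only says that the window looks like the sites used to build the matrix. Whether it is a real promoter still has to be argued with other evidence, such as conservation across related genomes.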
The language that DNA and proteins speak to each other
If we try to generalize this problem, taking into account the many cases in which proteins and nucleic acids form complexes, we quickly run into a paradox. Although we cannot predict which sequence a protein will bind by analyzing the sequence alone, we can expect that this sequence will be under selective pressure, because the binding activity may be essential for the organism. Besides, in many cases the binding is very specific: a transcription factor can recognize and bind one sequence of a few base pairs (and only that one) among a huge number of similar sequences. Proteins and nucleic acids talk to each other in some way. I always wonder whether it will be possible to decrypt the language they use.
Protein-DNA Docking
A starting point could be protein-DNA docking. Docking is a set of algorithms developed to determine the possible ways a low-molecular-weight ligand can bind a protein. It is a pivotal technique in drug design. More recently, the same principles have started to be applied to protein-protein and protein-DNA complexes. With nucleic acids, the real problem is their intrinsic flexibility: a rigid-body simulation of a very flexible structure is bound to run into trouble.
One of the most widely used protein-protein docking programs is HADDOCK (High Ambiguity Driven biomolecular DOCKing). Developed by Alexandre Bonvin at Utrecht University, it is based on the concept of Ambiguous Interaction Restraints (AIRs), which are derived by combining several kinds of experimentally obtained structural and sequence information. The key idea is to build an algorithm able to compute a plausible arrangement of the binding partners even for a very complex interaction.
The original paper describes the algorithm in detail.
An effort to adapt HADDOCK to protein-DNA complexes has been carried out by Marc van Dijk’s group at Utrecht University in the Netherlands. In their paper published in Nucleic Acids Research, the Dutch researchers set up a new docking approach starting from data collected on the monomeric repressor-DNA complexes formed by bacteriophage 434 Cro, the Escherichia coli Lac headpiece and bacteriophage P22 Arc.
The day when we’ll be able to download a PDB file and predict the exact site where a protein will bind DNA is still far off, but the very first steps into the world of protein-DNA interactions are being taken in bioinformatics too.