
MethylMix: an R package for identifying DNA methylation-driven genes

The paper I am going to explore today introduces MethylMix, an R package designed to identify DNA methylation-driven genes. DNA methylation is one of the most extensively studied processes in biomedicine, since it has been identified as a principal mechanism of gene regulation in many diseases. Although high-throughput methods can produce huge amounts of DNA methylation measurements, there are still few tools to formally identify hypo- and hypermethylated genes.

This is why Olivier Gevaert from Stanford proposed MethylMix, an algorithm to identify disease-specific hyper- and hypomethylated genes, published online yesterday in Bioinformatics (Oxford University Press).

The key idea of this work is that one cannot rely on an arbitrary threshold to decide whether a gene is differentially methylated: differential methylation has to be assessed in comparison with normal tissue. Moreover, the identification of differentially methylated genes must be accompanied by a transcriptionally predictive effect, which implies that the methylation change is functionally relevant.
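To make the "transcriptionally predictive" requirement a bit more concrete, here is a minimal Python sketch of such a filter. It is my own simplification, not MethylMix's actual procedure: it assumes each CpG site has already been mapped to a single gene, and it keeps a site only when the Pearson correlation between its methylation and that gene's expression across matched samples is at or below a hypothetical threshold of -0.3. The identifiers (cg_A, GENE1) and the threshold are purely illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def transcriptionally_predictive(methylation, cpg_to_gene, expression, r_threshold=-0.3):
    """Keep CpG sites whose methylation is negatively correlated with their gene's expression.

    methylation: {cpg_id: [methylation level per sample]}
    cpg_to_gene: {cpg_id: gene_id}  (hypothetical, precomputed mapping)
    expression:  {gene_id: [expression per sample]}, same sample order
    r_threshold: hypothetical cut-off on the correlation coefficient
    """
    kept = []
    for cpg, meth in methylation.items():
        expr = expression[cpg_to_gene[cpg]]
        if pearson(meth, expr) <= r_threshold:
            kept.append(cpg)
    return kept


# Toy data: cg_A silences GENE1 (higher methylation, lower expression), cg_B does not.
methylation = {"cg_A": [0.1, 0.3, 0.5, 0.7, 0.9], "cg_B": [0.1, 0.3, 0.5, 0.7, 0.9]}
cpg_to_gene = {"cg_A": "GENE1", "cg_B": "GENE2"}
expression = {"GENE1": [9.0, 7.5, 6.0, 4.0, 2.0], "GENE2": [2.0, 4.0, 6.0, 7.5, 9.0]}
print(transcriptionally_predictive(methylation, cpg_to_gene, expression))  # ['cg_A']
```

Requiring a negative correlation reflects the usual expectation that promoter hypermethylation silences a gene; a real analysis would also need to handle multiple CpG sites per gene and test for significance.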

MethylMix first selects the CpG sites whose methylation is associated with the expression of the corresponding genes. For each of these sites, it estimates a set of possible methylation states from the clinical (disease) samples by fitting a beta mixture model, using the Bayesian Information Criterion (BIC) to choose the number of states. Then, a normal methylation state is defined as the mean DNA methylation level in normal tissue samples. Each methylation state is compared with the normal state to calculate the Differential Methylation value, or DM-value, defined as the difference between the mean of that methylation state and the mean DNA methylation in the normal samples. The output is thus an indication of which genes are differentially methylated and differentially expressed.
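Here is a rough Python sketch of the state-estimation and DM-value steps. To keep it dependency-free it swaps MethylMix's beta mixture model for a simpler one-dimensional Gaussian mixture fitted by EM. That substitution is my assumption, as are the hypothetical maximum of three states and all function names (none of them belong to the MethylMix R package). For each candidate number of states it computes the BIC, keeps the model with the lowest value, and subtracts the mean methylation of the normal samples from each state mean.

```python
import math
from statistics import fmean


def _log_normal_pdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def _loglik_and_resp(data, w, mu, var):
    """Total log-likelihood and per-sample responsibilities, computed in log space."""
    total, resp = 0.0, []
    for x in data:
        logp = [math.log(wj) + _log_normal_pdf(x, mj, vj) for wj, mj, vj in zip(w, mu, var)]
        top = max(logp)
        lse = top + math.log(sum(math.exp(lp - top) for lp in logp))  # log-sum-exp
        total += lse
        resp.append([math.exp(lp - lse) for lp in logp])
    return total, resp


def fit_gmm_1d(xs, k, iters=500, tol=1e-9, var_floor=1e-4):
    """EM fit of a k-component 1-D Gaussian mixture (stand-in for a beta mixture)."""
    data = sorted(xs)
    n = len(data)
    # Deterministic start: split the sorted values into k equal-sized chunks.
    chunks = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    mu = [fmean(c) for c in chunks]
    var = [max(var_floor, fmean([(x - m) ** 2 for x in c])) for c, m in zip(chunks, mu)]
    w = [len(c) / n for c in chunks]
    prev = -math.inf
    for _ in range(iters):
        ll, resp = _loglik_and_resp(data, w, mu, var)  # E-step
        if ll - prev < tol:
            break
        prev = ll
        for j in range(k):  # M-step
            nj = sum(r[j] for r in resp)
            w[j] = max(nj, 1e-12) / n
            if nj > 1e-12:
                mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                var[j] = max(var_floor, sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj)
    ll, _ = _loglik_and_resp(data, w, mu, var)
    return mu, ll


def dm_values(cancer, normal, max_states=3):
    """DM-value of each methylation state: state mean minus the mean of the normal samples."""
    n = len(cancer)
    best_bic, best_mu = math.inf, None
    for k in range(1, max_states + 1):
        mu, ll = fit_gmm_1d(cancer, k)
        bic = (3 * k - 1) * math.log(n) - 2 * ll  # k means, k variances, k - 1 free weights
        if bic < best_bic:
            best_bic, best_mu = bic, mu
    normal_mean = fmean(normal)
    return [m - normal_mean for m in best_mu]


# Toy CpG site: half the tumours look normal, half are strongly hypermethylated.
cancer = [0.08, 0.09, 0.10, 0.11, 0.12] * 2 + [0.78, 0.79, 0.80, 0.81, 0.82] * 2
normal = [0.10, 0.12, 0.11, 0.09]
print(dm_values(cancer, normal))
```

A state with a DM-value near zero behaves like normal tissue, while a large positive (or negative) DM-value flags a hypermethylated (or hypomethylated) subgroup of tumours. This is what makes the comparison relative to normal tissue rather than to a fixed threshold.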

As mentioned, the algorithm is implemented as an R package, which is already available through Bioconductor.

The Oncodrive suite: bioinformatics methods to detect driver mutations in cancer

One of the most amazing groups whose work I have recently explored is based at UPF (Universitat Pompeu Fabra), the young and rapidly growing university in Barcelona. The Biomedical Genomics Group applies its strong computational expertise to cancer research, focusing on the identification of the mutations that actually determine the tumor phenotype, the so-called driver mutations. The tool I share with you today is aimed at identifying driver mutations with a clustering approach. The idea is quite simple: gain-of-function mutations in cancer tend to cluster in specific protein regions, because those regions confer an adaptive advantage on cancer cells, so this feature can be used to identify driver mutations. This is a crucial need for anyone working in cancer genomics: when you sequence the genome of a cancer cell, you basically find a total mess of mutations, and your job is to distinguish the ones that actually drive cancer.

One of the current challenges of oncogenomics is to distinguish the genomic alterations that are involved in tumourigenesis (i.e. drivers), from those that give no advantage to cancer cells, but occur stochastically as a by-product of cancer development. (Bioinformatics, 2013)

The lab has published a set of tools, actually a real software suite called Oncodrive, for the computational identification of cancer driver mutations. On August 27th, 2014, the group announced the publication of a new member of this suite, OncodriveROLE, and I am taking this opportunity to publish a short summary of the whole suite.

OncodriveFM

Method to identify cancer drivers from the somatic mutations found in a cohort of tumors. It computes the bias towards the accumulation of variants with high functional impact (FM bias).

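The notion of a functional impact bias can be illustrated with a simple sampling test. This is a minimal sketch under my own assumptions, not OncodriveFM's code: each mutation is assumed to carry a single functional impact score between 0 and 1, and a gene's mean score is compared with the means of equally sized random draws of mutations from the whole cohort. The function name and its parameters are hypothetical.

```python
import random
from statistics import fmean


def fm_bias_pvalue(gene_scores, cohort_scores, n_samples=10000, seed=0):
    """Empirical p-value for a gene's mean functional-impact score.

    Draws n_samples random sets of len(gene_scores) mutations (without
    replacement) from the whole cohort and counts how often their mean
    score is at least as high as the gene's observed mean.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = fmean(gene_scores)
    k = len(gene_scores)
    hits = sum(
        fmean(rng.sample(cohort_scores, k)) >= observed
        for _ in range(n_samples)
    )
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_samples + 1)


cohort = [0.1] * 90 + [0.9] * 10            # toy cohort: mostly low-impact variants
print(fm_bias_pvalue([0.9] * 5, cohort))    # gene enriched in high-impact variants: small p
print(fm_bias_pvalue([0.1] * 5, cohort))    # gene with typical low-impact variants: p = 1.0
```

The point of comparing against the cohort itself, rather than against a fixed cut-off, is that a gene only stands out if its mutations are more damaging than what the tumors accumulate anyway.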

OncodriveCIS

Method to identify genes that accumulate copy number alterations (CNAs) important for tumour development. It does so by computing the functional impact of CNAs, measured as their effect on the expression of the affected genes.

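Measuring a CNA by its effect on expression can be sketched as a standardized expression score. This is a simplified illustration of my own, not the published OncodriveCIS statistic: for one gene, it takes the samples carrying a given alteration, computes their expression z-scores relative to the copy-neutral samples, and averages them. The encoding of CNA status (+1, -1, 0) and the function name are hypothetical.

```python
from statistics import fmean, stdev


def cis_impact_score(expression, cna_status, alteration=1):
    """Mean expression z-score of samples carrying a given CNA, relative to copy-neutral samples.

    expression: one expression value of the gene per sample
    cna_status: per sample, +1 = amplification, -1 = deletion, 0 = copy-neutral
    alteration: which CNA type to score (+1 or -1)
    """
    neutral = [e for e, s in zip(expression, cna_status) if s == 0]
    altered = [e for e, s in zip(expression, cna_status) if s == alteration]
    if len(neutral) < 2 or not altered:
        raise ValueError("need at least two copy-neutral samples and one altered sample")
    mu, sd = fmean(neutral), stdev(neutral)
    if sd == 0:
        raise ValueError("copy-neutral expression has zero variance")
    return fmean([(e - mu) / sd for e in altered])


expression = [5.0, 5.2, 4.8, 5.1, 4.9, 8.0, 8.5, 2.0]
cna_status = [0, 0, 0, 0, 0, 1, 1, -1]
print(cis_impact_score(expression, cna_status, alteration=1))   # large positive: amplification raises expression
print(cis_impact_score(expression, cna_status, alteration=-1))  # large negative: deletion lowers expression
```

Scoring amplifications and deletions separately keeps their opposite effects from cancelling each other out in the average.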

OncodriveCLUST

Method to identify genes whose mutations accumulate within specific regions of the protein, which denotes events selected by the tumour. It computes a score measuring the clustering of a gene's mutations along the protein sequence and compares it with a background model.

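The clustering idea can be sketched with a deliberately simple score and background. In this sketch, which is not OncodriveCLUST's actual scoring scheme, the score is the largest fraction of a gene's mutations that fall inside any window of a few consecutive residues. The background model is my assumption: the same number of mutations placed uniformly at random along the protein. The window size, protein length and function names are hypothetical.

```python
import bisect
import random


def max_window_fraction(positions, window):
    """Largest fraction of mutations falling inside any stretch of `window` consecutive residues."""
    pos = sorted(positions)
    best = 0
    for i, p in enumerate(pos):
        # Index just past the last mutation at or before residue p + window - 1.
        j = bisect.bisect_right(pos, p + window - 1)
        best = max(best, j - i)
    return best / len(pos)


def clustering_pvalue(positions, protein_length, window=5, n_samples=2000, seed=0):
    """Compare the observed clustering with mutations placed uniformly at random along the protein."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = max_window_fraction(positions, window)
    hits = 0
    for _ in range(n_samples):
        background = [rng.randint(1, protein_length) for _ in positions]
        if max_window_fraction(background, window) >= observed:
            hits += 1
    return (hits + 1) / (n_samples + 1)


hotspot = [12, 12, 13, 13, 13, 14, 14, 12, 13, 14]  # mutations piled up around residues 12-14
scattered = list(range(10, 500, 50))                  # one mutation every 50 residues
print(clustering_pvalue(hotspot, protein_length=500))    # tiny p: strongly clustered
print(clustering_pvalue(scattered, protein_length=500))  # p = 1.0: no clustering
```

A hotspot like the first toy gene is exactly the pattern expected for gain-of-function drivers, while the evenly scattered mutations of the second gene look like passengers.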

OncodriveROLE

Method to classify cancer driver genes into Activating or Loss of Function roles.


I haven’t tried them, since I am working on dystrophy and still have no mutations to detect, but if I got this straight, all the tools are distributed as Python libraries. Moreover, I really suggest visiting the lab’s tools page, where you can find up to 13 different cancer-dedicated software tools available for use.