The paper I am going to explore today introduces MethylMix, an R package designed to identify DNA methylation-driven genes. DNA methylation is one of the most extensively studied processes in biomedicine, since it has been found to be a principal mechanism of gene regulation in many diseases. Although high-throughput methods can produce huge amounts of DNA methylation measurements, there are few tools to formally identify hypo- and hypermethylated genes.
This is why Olivier Gevaert of Stanford proposed MethylMix, an algorithm to identify disease-specific hyper- and hypomethylated genes, published online yesterday in Oxford's Bioinformatics journal.
The key idea of this work is that an arbitrary threshold cannot be relied on to determine whether a gene is differentially methylated; the assessment has to be made in comparison to normal tissue. Moreover, a gene identified as differentially methylated must also show a transcriptionally predictive effect, which implies that the methylation change is functionally relevant.
MethylMix first calculates a set of possible methylation states for each CpG site that is found to be associated with genes showing differential expression. This set is built from the clinical samples, with the Bayesian Information Criterion (BIC) used to decide how many states the data support. Then, a normal methylation state is defined as the mean DNA methylation level in normal tissue samples. Each state in the set is compared with the normal methylation state in order to calculate the Differential Methylation value, or DM-value, defined as the difference between the methylation state and the mean DNA methylation in control samples. The output is thus an indication of which genes are both differentially methylated and differentially expressed.
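To make the DM-value idea concrete, here is a minimal Python sketch. It is not MethylMix's actual R code: the function names `bic` and `dm_values`, the beta values, and the log-likelihoods are all hypothetical, and the model-fitting step that would produce the candidate states is assumed to have happened already. It only shows the standard BIC formula used to pick between candidate models, and the subtraction that turns each methylation state into a DM-value.

```python
import math
from statistics import mean


def bic(log_likelihood, n_params, n_samples):
    """Standard Bayesian Information Criterion; lower means a better trade-off."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def dm_values(state_means, normal_betas):
    """DM-value of each state: state methylation minus mean normal methylation."""
    normal_mean = mean(normal_betas)
    return [state - normal_mean for state in state_means]


# Hypothetical fits for one CpG site over 100 disease samples:
# number of states -> (log-likelihood, number of free parameters).
candidate_fits = {1: (10.0, 2), 2: (25.0, 5)}
n_disease_samples = 100

# Keep the number of states with the lowest BIC.
best_n_states = min(
    candidate_fits,
    key=lambda k: bic(candidate_fits[k][0], candidate_fits[k][1], n_disease_samples),
)

# Hypothetical methylation levels (0 = unmethylated, 1 = fully methylated).
normal_betas = [0.20, 0.25, 0.15, 0.20]  # normal tissue samples
state_means = [0.20, 0.70]               # assumed states from the selected model

dms = dm_values(state_means, normal_betas)
for state, dm in zip(state_means, dms):
    print(f"state {state:.2f} -> DM-value {dm:+.2f}")
```

In this toy example one state sits at the normal level and the other well above it; a positive DM-value points towards hypermethylation and a negative one towards hypomethylation relative to normal tissue.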
As mentioned, the algorithm is implemented as an R package, and it is already available through Bioconductor.