Following up on ideas mentioned in the previous post, we continued to experiment with Language Workbench technology for data analysis.
Our third prototype is MetaR, a simple language for biological data analysis. As a first goal for this language, I wanted to make it simpler to create heatmaps from tables of read counts. This is one of the top requests we get from people who use GobyWeb to call differentially expressed genes in RNA-Seq. Can GobyWeb generate heatmaps? My answer had been that there are many tools to do this, and since we don’t like to reinvent the wheel, I would just provide pointers to R/Bioconductor packages (such as pheatmap) or applications such as GENE-E, or the more recent HeatMapViewer web-based viewer.
However, I recently had a couple of heatmaps to produce for projects, and using R/Bioconductor quickly became tedious, not to mention error-prone when projects had tens of samples and I needed to track how each of these samples mapped to a number of analysis covariates. I am not a fan of graphical user interfaces for data analysis, because I tend to redo analyses often when new data is added to a project, and rebuilding visualizations by hand for each update takes as much time as building the first plot. This is not the case with scripts: once they are developed, you just need to execute them again after updating the data to produce the new figures. For these reasons, I wanted a way to build heatmaps that would be faster than writing R code, but would still retain the write-once, run-many-times advantage of analysis scripts.
Knowing a few things about MPS, and reusing a few languages we have been building, such as TextOutput to simplify generation of R code and XChart representations of TSV files, I was able to quickly put together a prototype that allowed me to generate R code using a few high-level abstractions (e.g., Tables, Column groups and Column group usage, Analysis script). The language quickly became very useful. Manuele joined the development, added support for table previews in the editor and typed columns, and contributed many fixes and much polish. After some brainstorming, we decided to call the language MetaR (it brings meta-programming to the R language). MetaR analyses can be put under version control, offer auto-completion, and are much shorter and simpler than equivalent R scripts, mostly because a lot of the configuration that R programmers need to write can be generated automatically, using a few conventions and taking advantage of the structured Table and Column Group concepts. These ideas are simple, but work remarkably well to simplify the production of heatmaps and other visualizations. In practice, we find that MetaR analysis scripts are 5-10 times shorter than the R scripts that they generate.
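To give a feel for the kind of boilerplate that structured Table and Column Group information makes it possible to generate, here is a minimal Python sketch. It is not MetaR's actual generator or its real output: the function names, the file name and the sample names are all hypothetical, and the choice of pheatmap with a single "group" annotation is an assumption made for illustration.

```python
def r_string_vector(values):
    """Format a list of Python strings as an R character vector, e.g. c("a", "b").

    Note: values are not escaped, so this sketch assumes they contain no quotes.
    """
    return "c(" + ", ".join('"%s"' % v for v in values) + ")"


def heatmap_script(table_path, column_groups):
    """Build an R script (as a string) that draws an annotated heatmap.

    column_groups maps a group name (hypothetical, e.g. "LPS") to the list of
    table columns (samples) that belong to that group.
    """
    # Flatten the groups into parallel lists: one sample name and one label per column.
    samples = [col for cols in column_groups.values() for col in cols]
    labels = [group for group, cols in column_groups.items() for _ in cols]
    lines = [
        "library(pheatmap)",
        'counts <- read.delim("%s", row.names = 1, check.names = FALSE)' % table_path,
        "annotation <- data.frame(group = %s, row.names = %s)"
        % (r_string_vector(labels), r_string_vector(samples)),
        "pheatmap(as.matrix(counts[, rownames(annotation)]), "
        'annotation_col = annotation, scale = "row")',
    ]
    return "\n".join(lines)


# Hypothetical table and sample names, for illustration only.
print(heatmap_script("counts.tsv", {"LPS": ["S1", "S2"], "control": ["S3"]}))
```

The sample-to-group bookkeeping is exactly the part that becomes error-prone by hand with tens of samples; deriving it from a declared grouping keeps the annotation and the plotted columns consistent.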
At some point, I started using MetaR to put together figures with multiple panels for a manuscript. To do this, I extended the language with a multi-plot statement. This statement helps arrange plots produced by other statements into a matrix of n rows and m columns. You can organize the order of the plots in the matrix by entering references to these plots, and after you run the script, you can get a preview of the resulting multi-plot. This is much more intuitive to use than the lower-level layout feature of R that this statement generates.
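As a rough illustration of the lower-level code a multi-plot statement hides, the Python sketch below builds a call to base R's `layout()` function from a row-major list of plot numbers. This assumes the "layout feature" mentioned above is base R's `layout()`; the helper name `r_layout_call` is hypothetical, and MetaR's real generated code may differ.

```python
def r_layout_call(plot_ids, n_rows, n_cols):
    """Return an R layout(...) call that arranges plots in an n_rows x n_cols grid.

    plot_ids is a row-major list of 1-based plot numbers; a 0 leaves that
    cell empty, which is how R's layout() interprets its matrix argument.
    """
    if len(plot_ids) != n_rows * n_cols:
        raise ValueError(
            "expected %d plot ids, got %d" % (n_rows * n_cols, len(plot_ids))
        )
    cells = ", ".join(str(i) for i in plot_ids)
    return "layout(matrix(c(%s), nrow = %d, ncol = %d, byrow = TRUE))" % (
        cells, n_rows, n_cols)


# Three plots in a 2 x 2 grid, with the bottom-right cell left empty.
print(r_layout_call([1, 2, 3, 0], 2, 2))
# prints: layout(matrix(c(1, 2, 3, 0), nrow = 2, ncol = 2, byrow = TRUE))
```

Writing such a matrix by hand means keeping track of plot numbers and fill order, which is what referring to the plots by name in the multi-plot statement avoids.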
You can see the language in action in the following video (the second part of this tutorial is available here):
Notice how the edgeR language is added to the environment at runtime. This is possible because MetaR supports seamless language composition (a key feature of the MPS Language Workbench). Many micro-languages can be developed to extend MetaR. For instance, you could create your own language to add one or more types of statements to MetaR. The statements you define would become available in any analysis where you import your language extension, and would generate R code that becomes part of the executable R script.