Biological data analysis has become an essential part of modern biomedical research that is enabling a large range of biological and medical studies. We have developed a novel approach to facilitate biological data analysis. This approach takes advantage of Language Workbench (LW) technology and in particular of the open-source Meta Programming System (MPS).
The details of the approach will be presented in a manuscript under preparation, but the simplest way to understand its potential is to watch a short video. The analysis workbench is extremely interactive.
The NYoSh workbench is very different from other tools that you might be familiar with. As with most new things, you will need some time to become familiar with the tool; after a short training, however, you will become much more productive than with other approaches. The video above is designed to give you an idea of the level of interactivity and user assistance that the workbench can provide. Please bear in mind that the tool is still a research prototype under active development. Training videos will also be posted here in the future and will focus on teaching new users how to use the workbench effectively for common biological analyses.
We focused our initial developments on the following analysis problems:
- Computationally intensive analysis of high-throughput sequencing data, including RNA-Seq, DNA methylation and genomic data (e.g., somatic variation or trio analysis).
- Interactive analysis of tabular data to produce heatmaps and other visualizations. This type of analysis is supported with metaR. See the video tutorials on the metaR pages.
- Development of custom parallel analysis workflows. Custom workflow development is supported with the NextflowWorkbench.
- Development of predictive biomarker models in high-throughput datasets. See BDVal for MPS.
The fact that these four vastly different problems can be addressed effectively with Language Workbench technology demonstrates the versatility and generality of the technology for data analysis.
We offer training sessions that cover:
From reads to tables. These sessions describe how to analyze high-throughput sequencing data (reads) to produce tables of statistics in TSV or VCF format (e.g., gene expression, methylation levels, genotypes, somatic variation calls, associated statistics of differences between groups, etc.). Important requirement: you must have a WCMC ITS tagged laptop to be able to connect to the cluster used during the training session. Use this registration form to register for the reads to tables session. Note that you can also perform such analysis tasks with GobyWeb, which only requires a web browser; see this comparison between NYoSh and GobyWeb.
From tables to heatmaps. These sessions describe how to transform data in table format into heatmaps and other visualizations useful for interactive data analysis. You will learn how to join tables to combine data from multiple tables, filter the tables on columns and create visualizations such as heatmaps, scatter plots and boxplots. No prior programming experience is necessary or assumed. Use this registration form to register for the tables to heatmaps session.
The NYoSh Analysis Workbench is available for download for Mac and Linux platforms.
On Linux, unzip the distribution and run the distribution/mps.sh file. This should start the workbench.
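The Linux steps above can be sketched as a pair of small shell helpers. This is only an illustration: the archive name `NYoSh-linux.zip` and the install directory `$HOME/nyosh` are hypothetical placeholders, and the sketch assumes that the archive unpacks to a top-level `distribution` folder containing `mps.sh`, as the instructions suggest.

```shell
#!/bin/sh
# Sketch only: unpack a NYoSh Linux download and locate its launcher.
# The archive name and target directory used in the usage note are
# hypothetical; substitute the file you actually downloaded.

# Unpack the downloaded archive into a target directory.
install_nyosh() {
    archive="$1"
    target="$2"
    mkdir -p "$target" && unzip -q -o "$archive" -d "$target"
}

# Print the path of the launcher script, assuming the archive
# unpacks to a top-level "distribution" folder.
launcher_path() {
    printf '%s/distribution/mps.sh\n' "$1"
}
```

With these helpers defined, running `install_nyosh NYoSh-linux.zip "$HOME/nyosh"` followed by `sh "$(launcher_path "$HOME/nyosh")"` should start the workbench, provided the archive layout matches the assumption above.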
On Mac, unzip the distribution and copy the folder to /Applications. Run the application by double-clicking on the application folder. If you get a message that the application “is damaged”, you are running OS X 10.7.5 or later and need to enable running of unsigned applications. See these detailed instructions (see also this Apple support article for background).
Starting from version 2.0, the NYoSh workbench works with an Apache Active MQ instance to collect information about jobs executed in a distributed OGE cluster.
See the Required Software section below.
The following PDF tutorials are available. We will post more tutorials here as we develop them.
Follow Development on GitHub
The workbench is licensed under the open-source Apache 2.0 license. We encourage you to obtain the NYoSh code from GitHub:
git clone git@github.com:CampagneLaboratory/NYoSh.git
The following link contains a customized version of Active MQ:
After downloading and decompressing the Apache Active MQ archive, go to the download folder and run:
This starts an instance on the default port (5672).