TRAPLINE: A standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation

Markus Wolfien, Christian Rimmbach, Ulf Schmitz, Julia Jeannine Jung, Stefan Krebs, Gustav Steinhoff, Robert David, and Olaf Wolkenhauer
Department of Systems Biology and Bioinformatics, University of Rostock, 18057 Rostock, Germany
Reference and Translation Center for Cardiac Stem Cell Therapy (RTC), University of Rostock, 18057 Rostock, Germany
Gene Center Munich, LMU Munich, 81377 Munich, Germany
Stellenbosch Institute of Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, 7602 Stellenbosch, South Africa
Correspondence to:                                                                                reference at:

We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best performing data analysis and data evaluation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data analysis. A comparative transcriptomics analysis with TRAPLINE yields a set of differentially expressed genes, their corresponding protein-protein interactions, an analysis of differential splicing and promoter use, and an integrated miRNA target prediction. Ultimately, the user receives a ready-to-use file that can be imported into Cytoscape.

TRAPLINE supports NGS research by providing a workflow that requires no bioinformatics skills and decreases the processing time of the analysis. We also support the analysis of paired-end RNA sequencing data. The adapted TRAPLINE workflow can be obtained via:

Our pipeline is implemented in the biomedical research platform Galaxy and is freely accessible via:

Galaxy Workflow | RNAseqTRAPLINE

RNA-sequencing data analysis in a Transparent Reproducible and Automated PipeLINE - TRAPLINE.

Step-by-step usage instructions:

o   Run your sequencing experiments (Illumina, SOLiD or Solexa) and obtain the FASTQ files

  Note: the analysis is predefined for the comparison of two experimental conditions with three replicates per condition

o   Go to the Galaxy website

o   If you are new to Galaxy please create an account 

o   Import our TRAPLINE analysis workflow (linked below) or find it in the “Shared Data – Published Workflows” section of Galaxy

Galaxy Workflow | RNAseqTRAPLINE

RNA-sequencing data analysis in a Transparent Reproducible and Automated PipeLINE - TRAPLINE.

o   (Optional): Edit the settings or parameters. In particular, if you want to use fewer than three replicates, adjust the workflow accordingly

o   Upload your FASTQ datasets (6 slots are predefined, 2 conditions with 3 replicates per condition)

  •   Choose the format “fastqsanger” when uploading your data (use the “Get data” icon on the left side)

  You have two possibilities for uploading your data:

o   Direct upload from your hard drive

o   Upload data from an FTP server
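Galaxy's “fastqsanger” format expects Phred+33 quality scores. As a rough check before uploading, the following Python sketch tests whether a FASTQ file plausibly uses that encoding. It is a heuristic and not part of TRAPLINE: the function names are hypothetical, four lines per record are assumed, and a file whose qualities are all high may be reported as "not clearly Sanger" even though it is.

```python
import io
from itertools import islice


def quality_lines(handle):
    """Yield the quality string of each record in a 4-line-per-record FASTQ file."""
    # Lines 4, 8, 12, ... hold the quality strings.
    for line in islice(handle, 3, None, 4):
        yield line.rstrip("\r\n")


def looks_like_sanger(handle, max_records=10000):
    """Return True if the sampled qualities clearly fit Phred+33.

    Phred+64 variants never use characters below ';' (ASCII 59), so at least
    one character in the range 33..58 indicates Phred+33. False means
    "not clearly Sanger", not "definitely another encoding".
    """
    lowest, highest = None, None
    for qual in islice(quality_lines(handle), max_records):
        for char in qual:
            code = ord(char)
            lowest = code if lowest is None else min(lowest, code)
            highest = code if highest is None else max(highest, code)
    if lowest is None:
        return False  # no quality data found
    return 33 <= lowest < 59 and highest <= 126


# Made-up two-record example held in memory instead of a real file.
example = io.StringIO("@read1\nACGT\n+\n!''*\n@read2\nACGT\n+\nIIII\n")
print(looks_like_sanger(example))
```

For a real file, pass an open text handle instead of the in-memory example; compressed files would need to be opened in text mode with the `gzip` module first.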

o   Upload a reference annotation set for your species as a .gtf file (here: mm9) and assign it to the “Reference annotation” input file of the workflow.

  The latest annotation version for your species can be obtained as a .gtf annotation file
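A malformed annotation file is a common cause of failed runs, so it can help to sanity-check the .gtf file before uploading it. The sketch below relies only on the general GTF layout (nine tab-separated columns, with `gene_id` and `transcript_id` in the attribute column); the function names and the example row are hypothetical, and the attribute parsing is simplified (it does not handle semicolons inside quoted values). Some providers omit `transcript_id` on gene-level rows, in which case the check would need to be relaxed.

```python
import io

GTF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes")


def parse_gtf_line(line):
    """Split one GTF feature line into a dict; attributes become a nested dict."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, found {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for item in record["attributes"].split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip('"')
    record["attributes"] = attributes
    return record


def check_gtf(handle):
    """Return the number of feature lines; raise ValueError on the first problem."""
    count = 0
    for number, line in enumerate(handle, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        record = parse_gtf_line(line)
        if record["start"] > record["end"]:
            raise ValueError(f"line {number}: start is greater than end")
        for key in ("gene_id", "transcript_id"):
            if key not in record["attributes"]:
                raise ValueError(f"line {number}: missing attribute {key}")
        count += 1
    return count


# Made-up example row, not taken from a real annotation.
row = 'chr1\texample\texon\t100\t250\t.\t+\t.\tgene_id "GeneA"; transcript_id "TxA1";\n'
print(check_gtf(io.StringIO("# header comment\n" + row)))
```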

o   (Optional): Upload a miRNA target file for your species of interest and assign it to the “miRNA target prediction” input file of the workflow.

  We provide formatted, ready-to-use miRNA target prediction files for human, mouse, rat, fruit fly and nematode based on Betel et al. (2010).

Galaxy History | TRAPLINE: miRNA Targets Input

This history includes the optional miRNA target prediction files of TRAPLINE.
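How a target prediction file contributes to the results can be pictured as a simple join between predicted miRNA-target pairs and the set of significantly regulated transcripts. The following Python sketch illustrates that idea only; it is not the workflow's actual implementation, and the pair format, the names and the fold-change values are made up for the example.

```python
def direction(log2_fold_change):
    """Label a log2 fold change as 'up' or 'down'."""
    return "up" if log2_fold_change > 0 else "down"


def regulated_pairs(target_pairs, significant):
    """Keep predicted (miRNA, target) pairs where both partners are significant.

    target_pairs: iterable of (mirna, gene) tuples (hypothetical input format).
    significant:  dict mapping a name to its log2 fold change, containing only
                  significantly regulated miRNAs and genes.
    """
    pairs = []
    for mirna, gene in target_pairs:
        if mirna in significant and gene in significant:
            pairs.append((mirna, direction(significant[mirna]),
                          gene, direction(significant[gene])))
    return pairs


# Made-up example data.
targets = [("miR-A", "GeneX"), ("miR-A", "GeneY"), ("miR-B", "GeneX")]
significant = {"miR-A": 1.8, "GeneX": -2.1, "GeneY": 0.9}
for pair in regulated_pairs(targets, significant):
    print(*pair, sep="\t")
```

Keeping the direction of both partners in each row makes it easy to pick out inversely regulated pairs afterwards, for example an upregulated miRNA with a downregulated target.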

o   (Optional): Upload a protein interaction file for your species of interest and assign it to the “Protein interaction” input file of the workflow.

  We provide several formatted, ready-to-use protein-protein interaction files based on BioGRID (Chatr-Aryamontri et al., 2015).

Galaxy History | TRAPLINE: Protein-Protein Interactions Input

This history includes the optional protein-protein interaction files of TRAPLINE.
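The role of the interaction file can be sketched in the same spirit: interactions are kept when their partners appear among the upregulated genes, and the result is written as a tab-separated edge table of the kind Cytoscape can import as a network. This is an illustration under assumptions, not TRAPLINE's code: the two-column input format, the `require_both` switch, the column headers and the "pp" interaction label are all hypothetical.

```python
import csv
import io


def filter_interactions(rows, upregulated, require_both=False):
    """Keep (a, b) interactions touching the upregulated gene set.

    With require_both=False one upregulated partner is enough;
    with require_both=True both partners must be upregulated.
    """
    up = set(upregulated)
    edges = []
    for a, b in rows:
        hits = (a in up) + (b in up)
        if hits == 2 or (hits == 1 and not require_both):
            edges.append((a, b))
    return edges


def write_edge_table(edges, handle):
    """Write edges as a tab-separated table with source/interaction/target columns."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "interaction", "target"])
    for a, b in edges:
        writer.writerow([a, "pp", b])


# Made-up example data.
interactions = [("GeneA", "GeneB"), ("GeneB", "GeneC"), ("GeneC", "GeneD")]
upregulated = {"GeneA", "GeneB"}
out = io.StringIO()
write_edge_table(filter_interactions(interactions, upregulated), out)
print(out.getvalue(), end="")
```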

o   Go to the “Workflow” section, select “RNAseqTRAPLINE” and click on “Run”

o   Assign your six datasets in the given order (have a look at the annotation text) and choose your reference annotation file

o   Select the reference genome of your species for each TopHat2 alignment from the Galaxy built-in genomes (mouse mm9 is predefined)

  We used the default TopHat2 parameter settings as recommended by Kim et al. (2013).

  The single-end read mode is also predefined, but can be changed in the TopHat2 settings.

  Moreover, Trapnell et al. (2012) recommend not using a reference annotation in the genome alignment step, because doing so would prevent the identification of novel, as yet uncharacterized transcripts.

o   Start the workflow

o   Obtain your results

  A list of all genes and, additionally, a list containing only the significantly differentially expressed genes

  A list of differential splice variants of each primary transcript

  A list of differential promoter use between the samples

  A list of significantly upregulated / downregulated genes

  Link to DAVID to further analyze the significantly differentially expressed genes with regard to their annotation and impact on the phenotype (rerun the module with the identifiers in column 3)

  A list of significantly upregulated / downregulated miRNAs including their predicted targets that are also significantly upregulated / downregulated

  A list of protein-protein interactions based on upregulated mRNAs

  A table containing all obtained results, ready for import into Cytoscape; an example file can be seen here:
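For downstream work outside Galaxy, the gene list can be split into upregulated and downregulated genes with a few lines of Python. The sketch below assumes a tab-separated table with a header in the style of Cuffdiff's gene-level output (columns `gene`, `log2(fold_change)` and `significant`); this page does not specify the exact columns of the TRAPLINE output, so the column names are assumptions exposed as parameters, and the example rows are made up.

```python
import csv
import io


def split_significant(handle, gene_col="gene",
                      fc_col="log2(fold_change)", sig_col="significant"):
    """Return (up, down) gene lists from a tab-separated differential expression table.

    Only rows flagged 'yes' in sig_col are used; the sign of the log2 fold
    change decides the list. Values such as 'inf' and '-inf' are accepted.
    """
    up, down = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row[sig_col].strip().lower() != "yes":
            continue
        log2_fc = float(row[fc_col])
        if log2_fc > 0:
            up.append(row[gene_col])
        elif log2_fc < 0:
            down.append(row[gene_col])
    return up, down


# Made-up example table held in memory instead of a downloaded result file.
table = io.StringIO(
    "gene\tlog2(fold_change)\tq_value\tsignificant\n"
    "GeneA\t2.5\t0.001\tyes\n"
    "GeneB\t-1.2\t0.01\tyes\n"
    "GeneC\t0.1\t0.8\tno\n"
    "GeneD\t-inf\t0.002\tyes\n"
)
up, down = split_significant(table)
print("up:", up)
print("down:", down)
```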

TRAPLINE can be cited via:

Markus Wolfien, Christian Rimmbach, Ulf Schmitz, Julia Jeannine Jung, Stefan Krebs, Gustav Steinhoff, Robert David, Olaf Wolkenhauer (2016)
TRAPLINE: A standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation. BMC Bioinformatics. doi: 10.1186/s12859-015-0873-9

Betel, D., Koppal, A., Agius, P., Sander, C., and Leslie, C. (2010). Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol 11, R90.

Chatr-Aryamontri, A., Breitkreutz, B.J., Oughtred, R., Boucher, L., Heinicke, S., Chen, D., Stark, C., Breitkreutz, A., Kolas, N., O'Donnell, L., et al. (2015). The BioGRID interaction database: 2015 update. Nucleic Acids Res 43, D470-478.

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7, 562-578.