Web-based Pipelines for Integrated Tumor Genome Profiles Reveal Differences between Pancreatic Cancer Tumors and Cell Lines

Jeremy Goecks¹, H. Jean Khoury², Bassel F. El-Rayes², Shishir K. Maithel³, The Galaxy Team⁴, James Taylor⁵, and Michael R. Rossi⁶
Correspondence should be addressed to JG (Jeremy Goecks).

This page provides access to the analysis pipelines (workflows) discussed in this Cancer Medicine publication, including the analysis histories for the cancer cell line data described in the paper. Please send any questions to Jeremy Goecks.

Videos for Getting Started

These videos will help you get started with Galaxy and with these pipelines. You can use the workflows with your own data, or rerun the paper's analyses on its cancer cell line data.

How to use this page

Galaxy

Analyzing tumor data using the pipelines/workflow

Workflows

The workflows described in the manuscript are listed below along with some helper workflows.

(1) This is the basic tumor exome analysis workflow that calls variants from targeted exome resequencing data:

(2) This is the RNA-seq analysis workflow. It analyzes tumor RNA-seq data to find small variants and gene fusions and to quantify gene expression:

(3) This is the integrated variant analysis workflow. To use it, two datasets must be in the same history: (a) a variants dataset from either the exome or the transcriptome analysis workflow, and (b) a Cufflinks gene expression dataset. The workflow then identifies:

  • deleterious variants
  • deleterious and druggable variants
  • deleterious variants in highly-expressed genes
  • deleterious and druggable variants in highly-expressed genes
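The four categories above can be read as successive filters over the variant list. The sketch below is a minimal Python illustration of that idea, not the workflow's actual implementation: the Variant fields, the gene-to-FPKM mapping built from Cufflinks output, and the min_fpkm cutoff of 10 are all hypothetical assumptions. The real workflow decides deleteriousness, druggability, and expression level with its own Galaxy tools.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    deleterious: bool  # hypothetical flag, e.g. derived from variant annotations
    druggable: bool    # hypothetical flag, e.g. gene found in a drug-gene resource


def integrate(variants, expression, min_fpkm=10.0):
    """Split variants into the four categories produced by workflow (3).

    expression maps gene name -> FPKM (as reported by Cufflinks).
    min_fpkm is a hypothetical cutoff for "highly expressed"; genes
    missing from the expression table are treated as not expressed.
    """
    deleterious = [v for v in variants if v.deleterious]
    druggable = [v for v in deleterious if v.druggable]

    def highly_expressed(v):
        return expression.get(v.gene, 0.0) >= min_fpkm

    return {
        "deleterious": deleterious,
        "deleterious_druggable": druggable,
        "deleterious_high_expr": [v for v in deleterious if highly_expressed(v)],
        "deleterious_druggable_high_expr": [v for v in druggable if highly_expressed(v)],
    }
```

With toy variants in hypothetical genes GENE_A (deleterious, druggable, FPKM 25), GENE_B (deleterious only, FPKM 3), and GENE_C (neither, FPKM 50), GENE_A passes every filter, GENE_B appears only in the deleterious list, and GENE_C is dropped despite its high expression.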

(4) This is an extended workflow for use when only a tumor exome is available. Starting from tumor exome sequencing data, it identifies deleterious and druggable variants:

(5) VCF variant recovery. Use this workflow to obtain a list of variants in VCF format from an ANNOVAR table of variants. Variants in VCF format are useful for visualization.
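To show what this recovery involves, here is a minimal Python sketch, not the workflow's own tool. It assumes a tab-separated ANNOVAR table whose header includes the columns Chr, Start, End, Ref, and Alt, and it handles single-nucleotide variants only. Indels are skipped because ANNOVAR writes "-" for an empty allele, whereas VCF requires a padding base taken from the reference genome, which this sketch does not load.

```python
import csv
import io

VCF_HEADER = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
)


def annovar_to_vcf(table_text):
    """Convert SNVs from a tab-separated ANNOVAR table into VCF text.

    Assumes the header contains Chr, Start, Ref, and Alt columns.
    ID, QUAL, FILTER, and INFO are written as "." (unknown).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    lines = [VCF_HEADER]
    for row in reader:
        ref, alt = row["Ref"], row["Alt"]
        # Keep only simple single-base substitutions; "-" marks an indel.
        if len(ref) != 1 or len(alt) != 1 or "-" in (ref, alt):
            continue
        # ANNOVAR Start and VCF POS are both 1-based for SNVs.
        lines.append("\t".join([row["Chr"], row["Start"], ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"
```

For a table with one SNV (chr12, position 100, C to A) and one deletion (Ref "AG", Alt "-"), the output contains the two header lines and a single record, `chr12	100	.	C	A	.	.	.`; the deletion is dropped.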

(6) Workflow to convert Tophat-fusion-post results to chrint format, which can be used to visualize fusions in Circster:
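The core of this conversion is turning each fusion's two breakpoints into one interaction record. The Python sketch below illustrates that step under several assumptions and is not the workflow's actual implementation: fusion calls are assumed to be already parsed into records with the hypothetical fields shown (the exact column layout of Tophat-fusion-post output is not reproduced), breakpoint positions are assumed to be 1-based, and chrint is assumed to be a tab-separated format with the seven columns Chrom1, Start1, End1, Chrom2, Start2, End2, Value. Check these against your Galaxy instance's datatype definition before relying on them.

```python
from typing import NamedTuple


class Fusion(NamedTuple):
    # Hypothetical parsed fields for one fusion call.
    chrom1: str
    pos1: int   # assumed 1-based breakpoint position
    chrom2: str
    pos2: int   # assumed 1-based breakpoint position
    reads: int  # supporting reads, used here as the interaction value


def to_chrint(fusions):
    """Return one tab-separated chrint line per fusion.

    Each breakpoint becomes a 1-bp interval in 0-based, half-open
    coordinates (start = pos - 1, end = pos).
    """
    lines = []
    for f in fusions:
        fields = [
            f.chrom1, f.pos1 - 1, f.pos1,
            f.chrom2, f.pos2 - 1, f.pos2,
            f.reads,
        ]
        lines.append("\t".join(map(str, fields)))
    return lines
```

Using the read count as the value lets Circster draw better-supported fusions more prominently. Any other per-fusion score could be substituted in the last column.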

To use these workflows on a Galaxy instance other than this one, take these steps:

  1. As an admin user, download the workflows that you want to use and follow the prompts to install the tools they need. Here is more information on installing the tools that workflows require.
  2. Download and install ANNOVAR (automatic installation is not possible because of ANNOVAR's licensing): 
  3. Install the data indices that Tophat, Tophat fusion, and ANNOVAR need for your genome. Here are instructions for installing these indices from the command line; Galaxy data managers will make this process easier in the near future.

Analysis Histories for Cell Line Data

Using the first three workflows above, we analyzed three pancreatic cancer cell lines: Mia PaCa2, HPAC, and PANC-1. Their analysis histories are listed below.

Mia PaCa2 Exome:

Mia PaCa2 Transcriptome:

Mia PaCa2 Integrated Variant Analysis:

HPAC Exome:

HPAC Transcriptome:

HPAC Integrated Variant Analysis:

PANC-1 Exome:

PANC-1 Transcriptome:

PANC-1 Integrated Variant Analysis:

Author details

¹Computational Biology Institute, George Washington University

²Department of Hematology and Medical Oncology, School of Medicine, Emory University

³Department of Surgery, Division of Surgical Oncology, Emory University

⁴http://galaxyproject.org

⁵Department of Biology, Johns Hopkins University

⁶Department of Radiation Oncology, School of Medicine, Emory University