Published Pages | jeremy | Transcriptome Analysis FAQ

Frequently Asked Questions about Using Galaxy for Transcriptome Analysis

Table of Contents

1. What short-read mapper should I use to map my RNA-seq reads?

2. Why won't my SAM dataset work with Cufflinks?

3. My Tophat/Cufflinks jobs are failing with what(): std::bad_alloc. What is going on and how can I fix it?

4. My Cufflinks/compare/diff jobs are failing with 'Error: sequence lines in a FASTA record must have the same length!' What is going on and how can I fix it?

5. Why doesn't my Ensembl GTF work with Cufflinks and how can I use Ensembl GTFs with Cufflinks?

Questions and Answers

1. What short-read mapper should I use to map my RNA-seq reads?

Gapped/splice junction mappers are typically used to map RNA-seq reads across splice junctions. Galaxy provides the gapped mapper Tophat (NGS: RNA Analysis > Tophat) for mapping RNA-seq reads. Use other mappers for RNA-seq data with caution.

2. Why won't my SAM dataset work with Cufflinks?

A SAM dataset needs to be sorted correctly before it can be used with Cufflinks. Here's a workflow that you can use to sort SAM datasets so that they can be used with Cufflinks:

Galaxy Workflow | Sort SAM file for Cufflinks

Cufflinks requires that SAM files be sorted by chromosome and position. This workflow performs the sorting necessary for Cufflinks.
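
To illustrate what sorting by chromosome and position means, here is a minimal Python sketch. It is not the workflow's implementation: the function name sort_sam_lines is hypothetical, and the sketch assumes a tab-delimited SAM file small enough to hold in memory, with chromosomes ordered lexicographically by name (so unmapped reads, whose reference name is "*", sort first).

```python
def sort_sam_lines(lines):
    """Return SAM lines with header lines first, followed by alignment
    lines sorted by reference name (column 3) and position (column 4)."""
    headers = []
    alignments = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            headers.append(line)  # header lines keep their original order
        else:
            alignments.append(line)

    def sort_key(line):
        fields = line.split("\t")
        # fields[2] is RNAME (chromosome), fields[3] is POS (1-based)
        return (fields[2], int(fields[3]))

    return headers + sorted(alignments, key=sort_key)


# Hypothetical three-read example.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr2\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t500\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t20\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
for line in sort_sam_lines(example):
    print(line.split("\t")[0])
```

In this example the header line stays first and the reads come out in the order r3, r2, r1.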

3. My Tophat/Cufflinks jobs are failing with what(): std::bad_alloc. What is going on and how can I fix it?

This error means the job ran out of memory (std::bad_alloc is the C++ exception thrown when a memory allocation fails), which indicates that your dataset is too large to run on our computing cluster. Cufflinks requires a large amount of memory, and very large datasets of mapped reads require more memory than our cluster nodes have. You have two options: (a) run Galaxy locally on your own computer or computing cluster, or (b) run Galaxy on the cloud.

4. My Cufflinks/compare/diff jobs are failing with 'Error: sequence lines in a FASTA record must have the same length!' What is going on and how can I fix it?

This error concerns the bias correction parameters: if you want to use bias correction, Cufflinks/compare/diff must have access to sequence data for the organism (i.e. its reference genome). For many organisms/builds, Galaxy already has this data; in this case, you'll need to set the dbkey/build for the datasets that you're using (by clicking on the pencil icon next to each dataset and setting the dbkey) and Galaxy will automatically provide the sequence data to Cufflinks/compare/diff. If Galaxy does not have sequence data for your organism, you can provide it by setting the source for the reference list to 'history' and choosing the appropriate dataset.

Finally, you can always turn off bias correction and no sequence data will need to be provided to Cufflinks/compare/diff.
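
If you supply your own reference sequence from your history, it may also be worth checking that dataset, since the error message itself refers to sequence line lengths within a FASTA record. The Python sketch below is one way to do that check. The function name find_uneven_fasta_records is hypothetical, and the rule it applies (every sequence line in a record has the same length, except that the last line may be shorter) is an assumption about what is being validated, not something taken from the Cufflinks source.

```python
def find_uneven_fasta_records(lines):
    """Return the names of FASTA records whose sequence lines do not all
    have the same length. The final line of a record may be shorter
    (assumed rule; blank lines are ignored in this simplified check)."""
    bad = []
    name = None
    lengths = []

    def check():
        # Evaluate the record collected so far, if any.
        if name is not None and len(lengths) > 1:
            body = lengths[:-1]
            width = body[0]
            if any(n != width for n in body) or lengths[-1] > width:
                bad.append(name)

    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            check()  # finish the previous record
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths = []
        elif line:
            lengths.append(len(line))
    check()  # finish the last record
    return bad


# Hypothetical example: chr1 has a short line in the middle of the record.
records = [">chr1", "ACGT", "AC", "ACGT", ">chr2", "AAAA", "CCCC"]
print(find_uneven_fasta_records(records))
```

Records reported by a check like this would need to be re-wrapped to a uniform line width before being used as reference sequence.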

5. Why doesn't my Ensembl GTF work with Cufflinks and how can I use Ensembl GTFs with Cufflinks?

Galaxy's biological data is obtained from UCSC and is not natively compatible with Ensembl gene annotations because the two use different chromosome naming conventions. UCSC and Galaxy name chromosomes chr1, chr2, ...; Ensembl names them 1, 2, ...

You can make an Ensembl gene annotation compatible with Galaxy's Cufflinks by using this workflow:

Galaxy Workflow | Make Ensembl GTF compatible with Cufflinks

Converts an Ensembl gene annotation file so that it can be used with Cufflinks/compare/diff.
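
The core of such a conversion is renaming the chromosome column from Ensembl style to UCSC style. The Python sketch below shows the idea; it is not the workflow's implementation. The function names are hypothetical, and the mapping of the Ensembl mitochondrial name MT to chrM is an assumption that holds for builds such as human but should be checked for your organism. Unplaced scaffolds and other non-chromosome sequences are simply given a chr prefix here, which may not match the UCSC names.

```python
def ensembl_to_ucsc_chrom(name):
    """Map an Ensembl-style chromosome name (1, 2, X, MT) to UCSC style."""
    if name.startswith("chr"):
        return name  # already UCSC style
    if name == "MT":
        return "chrM"  # assumption: mitochondrial naming differs
    return "chr" + name


def convert_gtf_lines(lines):
    """Return GTF lines with the first column (seqname) renamed.
    Comment lines (starting with '#') and blank lines pass through unchanged."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            out.append(line)
            continue
        fields = line.split("\t")
        fields[0] = ensembl_to_ucsc_chrom(fields[0])
        out.append("\t".join(fields))
    return out


# Hypothetical two-feature example.
gtf = [
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'MT\tensembl\texon\t5\t50\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]
for line in convert_gtf_lines(gtf):
    print(line.split("\t")[0])
```

For this example the renamed chromosome column reads chr1 and chrM.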
