If you have a dataset in your history that is not appearing in the drop-down selector for a tool, the most common reason is that it has the wrong format. Each Galaxy dataset has an associated file format recorded in its metadata, and tools will only list datasets from your history that have a format compatible with that particular tool. Of course some of these datasets might not actually contain relevant data, or even the correct columns needed by the tool, but filtering by format at least makes the list to select from a bit shorter.
Some of the formats are defined hierarchically, going from very general ones like Tabular (which includes any text file with tab-separated columns), to more restrictive sub-formats like Interval (where three of the columns must be the chromosome, start position, and end position), and on to even more specific ones such as BED that have additional requirements. So for example if a tool's required input format is Tabular, then all of your history items whose format is recorded as Tabular will be listed, along with those in all sub-formats that also qualify as Tabular (Interval, BED, GFF, etc.).
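The hierarchy described above can be pictured as class inheritance. The following is only a minimal sketch: the class names and the `compatible_datasets` helper are hypothetical and are not Galaxy's actual datatype classes or API.

```python
# Hypothetical stand-ins for a format hierarchy; Galaxy's real classes may differ.
class Data: pass
class Text(Data): pass
class Tabular(Text): pass        # any text with tab-separated columns
class Interval(Tabular): pass    # Tabular plus chromosome, start, end columns
class Bed(Interval): pass        # Interval with additional requirements
class Gff(Tabular): pass
class Fasta(Text): pass          # text, but not Tabular


def compatible_datasets(history, required_format):
    """Return names of datasets whose format is required_format or a sub-format."""
    return [name for name, fmt in history if issubclass(fmt, required_format)]


# An invented history, for illustration only.
history = [
    ("genes.bed", Bed),
    ("peaks.interval", Interval),
    ("reads.fasta", Fasta),
    ("counts.tsv", Tabular),
]

print(compatible_datasets(history, Tabular))
print(compatible_datasets(history, Interval))
```

A tool asking for Tabular would list the BED, Interval, and plain Tabular datasets, while a tool asking for Interval would drop the plain Tabular one.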
There are two usual methods for changing a dataset's format in Galaxy. If the file contents are already in the required format but the metadata is wrong (perhaps because the Auto-detect feature of the Upload File tool guessed it incorrectly), you can fix the metadata manually by clicking on the pencil icon beside that dataset in your history. If the file contents really are in a different format, Galaxy provides a number of format conversion tools (e.g. in the Text Manipulation and Convert Formats categories). For instance, if the tool you want to run requires Tabular but your columns are delimited by spaces or commas, you can use the "Convert delimiters to TAB" tool under Text Manipulation to reformat your data. However, if your files are in a completely unsupported format, you need to convert them yourself before uploading.
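If you need to convert delimiters yourself before uploading, the idea can be sketched in a few lines. This is a naive illustration, not the implementation of the "Convert delimiters to TAB" tool; the function name and the default pattern are assumptions, and quoted fields containing commas or spaces are not handled.

```python
import re


def delimiters_to_tab(line, delimiter_pattern=r"[ ,]+"):
    """Replace each run of spaces or commas in a line with a single tab."""
    return re.sub(delimiter_pattern, "\t", line.strip())


print(delimiters_to_tab("chr1,10,100"))
print(delimiters_to_tab("chr1  10 100"))
```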
This is one of the ABIF family of binary sequence formats from Applied Biosystems Inc. Files should have a '.ab1' file extension. You must manually select this file format when uploading the file.
Used for pairwise alignment output from BLASTZ, after post-processing. Each alignment block contains three lines: a summary line and two sequence lines. Blocks are separated from one another by blank lines. The summary line contains chromosomal position and size information about the alignment, and consists of nine required fields. More information
A binary alignment file compressed in the BGZF format with a '.bam' file extension. SAM is the human-readable text version of this format.
Example:
chr22 1000 5000 cloneA 960 + 1000 5000 0 2 567,488, 0,3512
chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601
A zipped archive consisting of binary sequence files in either AB1 or SCF format. All files in this archive must have the same file extension, which is one of '.ab1' or '.scf'. You must manually select this file format when uploading the file.
A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than ('>') symbol. All lines should be shorter than 80 characters.
>sequence1
atgcgtttgcgtgc
gtcggtttcgttgc
>sequence2
tttcgtgcgtatag
tggcgcggtga
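The FASTA layout described above (a '>' description line followed by one or more sequence lines) is simple enough to read with a short loop. The sketch below is illustrative only: `parse_fasta` is a hypothetical helper, not a Galaxy function, and it performs no validation of the sequence characters.

```python
def parse_fasta(text):
    """Map each description (without the leading '>') to its joined sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A description line starts a new record.
            name = line[1:]
            records[name] = []
        elif name is not None:
            # Sequence data may span several lines; collect them all.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = """>sequence1
atgcgtttgcgtgc
gtcggtttcgttgc
>sequence2
tttcgtgcgtatag
tggcgcggtga
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence))
```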
FastqSolexa is the Illumina (Solexa) variant of the FASTQ format, which stores sequences and quality scores in a single file.
@seq1
GACAGCTTGGTTTTTAGTGAGTTGTTCCTTTCTTT
+seq1
hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh
@seq2
GCAATGACGGCAGCAATAAACTCAACAGGTGCTGG
+seq2
hhhhhhhhhhhhhhYhhahhhhWhAhFhSIJGChO
Or
@seq1
GAATTGATCAGGACATAGGACAACTGTAGGCACCAT
+seq1
40 40 40 40 35 40 40 40 25 40 40 26 40 9 33 11 40 35 17 40 40 33 40 7 9 15 3 22 15 30 11 17 9 4 9 4
@seq2
GAGTTCTCGTCGCCTGTAGGCACCATCAATCGTATG
+seq2
40 15 40 17 6 36 40 40 40 25 40 9 35 33 40 14 14 18 15 17 19 28 31 4 24 18 27 14 15 18 2 8 12 8 11 9
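The two examples show quality scores written either as one ASCII character per base or as space-separated integers. The sketch below converts either form to a list of integers. It assumes the Solexa/Illumina convention of an ASCII offset of 64 (so 'h' is 40 and '@' is 0); the function and constant names are hypothetical.

```python
SOLEXA_OFFSET = 64  # assumption: Solexa/early Illumina ASCII offset


def decode_quality(quality_line):
    """Return integer quality scores from an ASCII or numeric quality line."""
    parts = quality_line.split()
    # Numeric form: several whitespace-separated (possibly negative) integers.
    if len(parts) > 1 and all(p.lstrip("-").isdigit() for p in parts):
        return [int(p) for p in parts]
    # ASCII form: one character per base, shifted by the offset.
    return [ord(ch) - SOLEXA_OFFSET for ch in quality_line.strip()]


print(decode_quality("hhPW@"))
print(decode_quality("40 40 35 9"))
```

With this offset, the ASCII string `hhPW@` decodes to 40, 40, 16, 23, 0.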
Also known as the FBAT format, for use with the FBAT program. It consists of a pedigree file and a phenotype file.
This format is a tabular file. The first column is the 1-based column number in the gd_snp file where the individual/group starts. The second column is the label from the metadata for the individual/group. The third column is an alias, or blank.
This is a tabular file describing single amino-acid polymorphisms (SAPs). You must manually select this file format when uploading the file.
This is a tabular file describing SNPs in individuals or populations. It contains the zero-based position of the SNP but not the range required by BED or interval, so it cannot be used in Genomic Operations without adding a column for the end position. You must manually select this file format when uploading the file. Field specifications
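Adding the missing end column is mechanical: a single base at zero-based position p covers the half-open range from p to p + 1. The sketch below inserts that end value right after the position column. The helper name is hypothetical, and the default position column (the second column, index 1) is an assumption that should be checked against the field specifications for your file.

```python
def add_end_column(line, pos_col=1):
    """Insert an end position (start + 1) after the zero-based position column."""
    fields = line.rstrip("\n").split("\t")
    start = int(fields[pos_col])
    # A single-base feature at zero-based start p spans [p, p + 1).
    fields.insert(pos_col + 1, str(start + 1))
    return "\t".join(fields)


# Invented row for illustration: name, position, two alleles.
print(add_end_column("chr1\t99\tA\tG"))
```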
This format is an HTML web page. Click the eye icon next to the dataset to view it in your browser.
Required fields: the chromosome, the start position, and the end position. A strand column, where present, is either '+' or '-'.
#CHROM START END STRAND NAME COMMENT
chr1 10 100 + exon myExon
chrX 1000 10050 - gene myGene
LAV is the raw pairwise alignment format that is output by BLASTZ. The first line begins with #:lav.
This is the linkage pedigree format, which consists of separate MAP and PED files. Together these files describe SNPs; the map file contains the position and an identifier for the SNP, while the pedigree file has the alleles. To upload this format into Galaxy, do not use Auto-detect for the file format; instead select lped. You will then be given two sections for uploading files, one for the pedigree file and one for the map file. For more information, see linkage pedigree, MAP, and/or PED.
MAF is the multi-sequence alignment format that is output by TBA and Multiz. The first line begins with '##maf'. This word is followed by whitespace-separated "variable=value" pairs. There should be no whitespace surrounding the '='.
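The MAF header rule above (the word '##maf' followed by whitespace-separated variable=value pairs) can be checked with a short parser. This is only a sketch: `parse_maf_header` is a hypothetical helper, and the header string used in the example is invented for illustration.

```python
def parse_maf_header(line):
    """Return the variable=value pairs from a '##maf' header line as a dict."""
    tokens = line.split()
    if not tokens or tokens[0] != "##maf":
        raise ValueError("not a MAF header line")
    # Each remaining token must be 'variable=value' with no spaces around '='.
    return dict(token.split("=", 1) for token in tokens[1:])


# Illustrative header line, not taken from a real file.
print(parse_maf_header("##maf version=1 scoring=tba.v8"))
```

A token with whitespace around the '=' would be split into separate tokens and rejected, which mirrors the rule that no whitespace may surround it.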
MasterVar is a tab-delimited text format with specified fields, developed by the life sciences company Complete Genomics. Field specifications.
This is the binary version of the LPED format.
This is the personal genome SNP format used by UCSC. It is a BED-like format with columns chosen for the specialized display in the browser for personal genomes. Field specifications. Galaxy treats it the same as an interval file.
PSL format is used for alignments returned by BLAT. It does not include any sequence.
This is a binary sequence format originally designed for the Staden sequence handling software package. Files should have a '.scf' file extension. You must manually select this file format when uploading the file.
More information
This is a binary sequence format used by the Roche 454 GS FLX sequencing machine, and is documented on p. 528 of their software manual. Files should have a '.sff' file extension.
Text data separated into columns by something other than tabs.
One or more columns of text data separated by tabs.
A zipped archive consisting of flat text sequence files. All files in this archive must have the same file extension of '.txt'. You must manually select this file format when uploading the file.
Variant Call Format (VCF) is a tab-delimited text format with specified fields. It was developed by the 1000 Genomes Project. Field specifications.
Wiggle tracks are typically used to display per-nucleotide scores in a genome browser. The Wiggle format for custom tracks is line-oriented, and the wiggle data is preceded by a track definition line that specifies which of three different types is being used. More information
Similar to the linkage pedigree format (lped).
Any text file.