Galaxy Variant 101: Introduction to Polymorphism Detection via Variant Analysis 

Heteroplasmy: Mother-Child mtDNA Variant Polymorphism

• heteroplasmy  • ismb2010-demo

This tutorial will teach you how to:

  • Import sequence data from a Page.
  • Inspect FASTQ data to confirm the datatype and estimate the sequencing error rate.
  • Run a series of tools (mapping, filtering for properly mapped pairs, adding read groups) to prepare the sequences for variant analysis.
  • Run one commonly used variant analysis tool and two Galaxy-developed variant analysis tools.
  • Filter key results.

What is Heteroplasmy?

In humans, heteroplasmy is the condition in which an individual carries multiple replicated versions of extra-nuclear genomic DNA, such as mitochondrial DNA (mtDNA). Everyone is heteroplasmic for mtDNA, but the rate of heteroplasmy and the specific locations of SNPs vary by individual. Several disease conditions are associated with heteroplasmy, and active research is exploring the relationship between heteroplasmy and aging.

Being heteroplasmic is not the same as being chimeric. Do you know why? What are the mechanisms of inheritance for each condition?

• Go ahead, start with Wikipedia to learn more: http://en.wikipedia.org/wiki/Heteroplasmy

Experiment breakdown

Import the child and mother .fastq datasets and review their metadata for accuracy. Run FastQC on the datasets, confirm that the Illumina data is in .fastqsanger format, and note the mean and range of the Sanger PHRED+33 quality scores.* Q20 indicates a sequencing error rate of ~1.0% (1/100), Q10 ~10.0% (1/10), and Q30 ~0.1% (1/1000). Map the reads with BWA, then filter with SAMTools to keep properly mapped pairs only. Convert the resulting SAM file to BAM, add read groups with Picard, then merge the two input datasets (child and mother) with Concatenate. Run the variant analysis tools Naive Variant Caller and FreeBayes. Note that FreeBayes at its default settings will miss low-frequency variants, whereas Naive Variant Caller reports all polymorphic sites (SNPs). Filter the Naive Variant Caller results with Variant Annotator using the estimated sequencing error rate of 1.0%, then focus on polymorphisms present in the population at a frequency >= 0.02 (2%).
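The relationship between a quality score and an error rate quoted above follows the standard Phred definition, P = 10^(-Q/10). As a minimal sketch (the function names and the sample quality string are illustrative and not part of the Galaxy workflow), the conversion and the PHRED+33 decoding can be written as:

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_sanger_qualities(quality_string, offset=33):
    """Decode one FASTQ quality line (Sanger, PHRED+33) into integer Q scores."""
    return [ord(ch) - offset for ch in quality_string]


# Reproduce the three reference points mentioned in the text.
for q in (10, 20, 30):
    print(f"Q{q}: error rate {phred_to_error_prob(q):.1%}")

# Hypothetical quality line, used only to show the decoding step.
scores = decode_sanger_qualities("II5+!")
mean_q = sum(scores) / len(scores)
print(scores, "mean Q =", mean_q)
```

In the tutorial itself, FastQC reports the mean and range of the quality scores; the sketch only shows where a figure such as "Q20 is about 1%" comes from.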

• Explore the results. These are the native, potentially significant, locations of SNPs found both in and between the populations of mitochondria from the mother and child.

• Questions:

  • Can you identify highly polymorphic SNPs? How could you compare the mean rate of variation with these?
  • Can you identify different rates of polymorphism for any SNPs? What would cause mother and child to vary? How could this rate of change be applied?
  • Are any SNPs or groups of SNPs known in the public databanks? Are any associated with a disease? Do any polymorphic rates significantly indicate a risk (can this be answered)?
  • How would you locate publications on this topic? How does this small sample compare with other published rates of polymorphic mtDNA (per individual, between mother and child)?
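To make the 2% frequency cutoff from the experiment breakdown concrete, here is a minimal sketch of a minor-allele-frequency filter. It is not the algorithm used by Variant Annotator; the per-site base counts, the positions, and the function names are all hypothetical and serve only to show why a site at the ~1% error rate is dropped while a site at 5% is kept.

```python
def minor_allele_frequency(counts):
    """Return the frequency of the second most common base at one site."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    ordered = sorted(counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return second / total


def filter_sites(site_counts, min_freq=0.02):
    """Keep only sites whose minor allele frequency is at least min_freq."""
    return {
        pos: counts
        for pos, counts in site_counts.items()
        if minor_allele_frequency(counts) >= min_freq
    }


# Hypothetical base counts per mtDNA position (not real results).
example_sites = {
    100: {"A": 990, "G": 10},  # 1.0% minor allele, at the assumed error rate
    200: {"C": 950, "T": 50},  # 5.0% minor allele
    300: {"G": 1000},          # no second allele observed
}

kept = filter_sites(example_sites)
print(sorted(kept))
```

Running the same filter separately on mother and child counts would give per-sample frequencies that can be compared site by site, which is one way to approach the questions above.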

* Source at Illumina: http://www.illumina.com/truseq/quality_101/quality_scores.ilmn

Input NGS Datasets

Child

Galaxy Dataset | raw_child-ds-1.fq

uploaded fastqsanger file

Galaxy Dataset | raw_child-ds-2.fq

uploaded fastqsanger file

Mother

Galaxy Dataset | raw_mother-ds-1.fq

uploaded fastqsanger file

Galaxy Dataset | raw_mother-ds-2.fq

uploaded fastqsanger file
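The fastqsanger label on these datasets can be sanity-checked from the quality lines themselves. The sketch below uses a common heuristic, which is an assumption here and not a Galaxy feature: characters below ASCII 59 cannot occur in the older +64 encodings, so seeing one implies PHRED+33. It assumes simple four-line FASTQ records, and the sample records are made up.

```python
def iter_quality_lines(fastq_lines):
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line.rstrip("\n")


def looks_like_sanger(quality_lines):
    """Return True if any quality character is below ASCII 59.

    Such characters are only valid with the +33 offset. A False result is
    inconclusive: high-quality PHRED+33 data may contain none of them.
    """
    return any(ord(ch) < 59 for line in quality_lines for ch in line)


# Hypothetical two-record FASTQ content, used only for illustration.
sample = [
    "@read1\n", "ACGTA\n", "+\n", "II5+!\n",
    "@read2\n", "TTGCA\n", "+\n", "IIIII\n",
]

print(looks_like_sanger(iter_quality_lines(sample)))
```

For a real file, the same functions can be applied to an open file handle in place of the `sample` list.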

Workflow

Completed History 

Galaxy History | Galaxy Variant 101

Mother-Child mitochondrial variation analysis. See Page https://usegalaxy.org/u/galaxyproject/p/galaxy-101-ngs-variant
