Genome-wide analysis of signatures of selection in populations of African honey bees (Apis mellifera) using new web-based tools
Zachary L. Fuller1, Elina L. Niño2, Harland M. Patch2, Oscar C. Bedoya-Reina3, Tracey Baumgarten2, Elliud Muli4, Fiona Mumoki5, John McGraw6, Maryann Frazier2, Daniel Masiga5, Stephen Schuster3, Webb Miller3*, Christina M. Grozinger2*
1 Department of Biology, Pennsylvania State University, University Park, PA, USA
2 Department of Entomology, Center for Pollinator Research, Pennsylvania State University, University Park, PA, USA
3 Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA, USA
4 Department of Biological Sciences, South Eastern Kenya University (SEKU), P.O. Box 170-90200, Kitui, Kenya
5 The International Center of Insect Physiology and Ecology (icipe), PO Box 30772-00100, Nairobi, Kenya
6 Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
With the development of inexpensive, high-throughput sequencing technologies, it has become feasible to examine questions related to population genetics and molecular evolution of non-model species in their ecological contexts on a genome-wide scale. Here, we employed a newly developed suite of integrated, web-based programs to examine population dynamics and signatures of selection across the genome using several well-established tests, including FST, pN/pS, and McDonald-Kreitman. We applied these techniques to study populations of honey bees (Apis mellifera) in East Africa. In Kenya, there are several described A. mellifera subspecies, which are thought to be localized to distinct ecological regions. We performed whole genome sequencing of 11 worker honey bees from apiaries distributed throughout Kenya and identified . million putative single-nucleotide polymorphisms. The dense coverage allowed us to apply several computational procedures to study evolutionary relationships among the populations, and detect signs of adaptive evolution across the genome. While there is considerable gene flow among the sampled populations, there are clear distinctions between populations from the northern desert region and those from the temperate, savannah region. We identified several genes under positive selection within African bee populations, and between these populations and European A. mellifera or Asian Apis florea. These genes were associated with several fundamental processes, including behavior, response to stress, reproduction and metabolism. These results lay the groundwork for future studies of adaptive ecological evolution in honey bees, and demonstrate the use of new, freely available web-based tools (http://galaxyproject.org/) that can be applied to any model system with genomic information.
All SNPs: Dataset 'All putative honey-bee SNPs'. This is a tab-separated file with the following columns:
1. scaf - scaffold in the bee assembly (vers. 4.5)
2. pos - position on that scaffold
3. A - reference allele
4. B - variant allele
5. qual - overall SNP quality
For each of the 11 samples there are four columns, giving the number of reads with the first allele, the number of reads with the second allele, the genotype (i.e., the count of the first allele: 0, 1 or 2), and the quality of the called genotype. These values occupy columns 6-49.
columns  name  ID   sample   sub-species
-------  ----  ---  -------  -----------
6-9      1S    989  1.4.15   scutellata
10-13    2S    990  2.2.15   scutellata
14-17    3S    991  4.2.15   scutellata
18-21    1C    992  12.2.15  litorea
22-25    2C    993  13.4.15  scutellata
26-29    3C    994  15.4.15  scutellata
30-33    1D    995  16.1.5   yemenitica
34-37    2D    996  17.1.5   yemenitica/litorea
38-41    3D    997  18.1.15  yemenitica
42-45    4S    998  21.3.15  scutellata
46-49    1M    999  22.2.5   monticola
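The column layout described above can be turned into a small parser. The following is a minimal sketch, not the code used by the authors: the function name `parse_snp_line` and the record structure are hypothetical, and it assumes that read counts and genotypes are integers and that quality values can be read as floats.

```python
from typing import NamedTuple

# Sample names in the column order given in the table above (columns 6-49).
SAMPLES = ["1S", "2S", "3S", "1C", "2C", "3C", "1D", "2D", "3D", "4S", "1M"]
N_FIXED = 5        # scaf, pos, A, B, qual
N_PER_SAMPLE = 4   # reads A, reads B, genotype, genotype quality


class SampleCall(NamedTuple):
    reads_a: int     # number of reads with the first allele
    reads_b: int     # number of reads with the second allele
    genotype: int    # count of the first allele (0, 1 or 2)
    quality: float   # quality of the called genotype


def parse_snp_line(line):
    """Parse one tab-separated SNP row into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    expected = N_FIXED + N_PER_SAMPLE * len(SAMPLES)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, got {len(fields)}")
    scaf, pos, allele_a, allele_b, qual = fields[:N_FIXED]
    calls = {}
    for i, name in enumerate(SAMPLES):
        start = N_FIXED + N_PER_SAMPLE * i
        a, b, g, q = fields[start:start + N_PER_SAMPLE]
        calls[name] = SampleCall(int(a), int(b), int(g), float(q))
    return {
        "scaf": scaf,
        "pos": int(pos),
        "A": allele_a,
        "B": allele_b,
        "qual": float(qual),
        "calls": calls,
    }
```

With this layout, `parse_snp_line(row)["calls"]["1D"].genotype` would give the count of the first allele for sample 1D.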
Filtered set of SNPs (quality >= 100): Dataset 'Filtered honey-bee SNPs'. Same format as the previous file.
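A filter of this kind can be sketched in a few lines. The example below is illustrative only: it assumes the threshold applies to the overall SNP quality in column 5, and the function name `filter_snps` and the handling of blank or `#`-prefixed lines are assumptions, not part of the published pipeline.

```python
def filter_snps(lines, min_qual=100.0):
    """Yield SNP rows whose overall quality (column 5) is >= min_qual."""
    for line in lines:
        # Skip blank lines and comment/header lines (an assumption about the file).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[4]) >= min_qual:
            yield line
```

For a file on disk, this could be used as `kept = list(filter_snps(open(path)))`, where `path` is a placeholder for the location of the unfiltered dataset.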
Protein-coding SNPs: Dataset 'honey-bee SAPs'. This is a tab-separated file with the following columns:
1. ref - scaffold in the bee assembly (vers. 4.5)
2. rPos - position on bee scaffold
3. gene - gene name
4. AA1 - one amino acid
5. loc - location in the gene sequence
6. AA2 - variant amino acid
Genes: Dataset 'honey-bee genes'. This is a tab-separated file with the following columns:
1. gene name
2. group in the Amel4.5 assembly
3. first location in the group
4. last location in the group
5. direction of transcription
6. number of coding bases
7. within-species dN/dS
8. McDonald-Kreitman ratio
9. number of within-species nonsynonymous polymorphisms
10. number of within-species synonymous polymorphisms
11. number of fixed between-species nonsynonymous differences
12. number of fixed between-species synonymous differences
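The four counts in columns 9-12 are the cells of the standard McDonald-Kreitman 2x2 table. As a hedged illustration, the sketch below computes one common summary of that table, the neutrality index NI = (Pn/Ps)/(Dn/Ds), together with alpha = 1 - NI. The exact definition behind the ratio stored in column 8 is not stated here, so it may differ from this one; the function names are hypothetical.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index NI = (Pn/Ps) / (Dn/Ds).

    pn, ps: within-species nonsynonymous / synonymous polymorphisms (cols 9, 10)
    dn, ds: fixed between-species nonsynonymous / synonymous differences (cols 11, 12)
    Returns None when the ratio is undefined (a zero denominator).
    """
    if ps == 0 or dn == 0:
        return None
    # Written as a single fraction to avoid dividing by ds when ds == 0.
    return (pn * ds) / (ps * dn)


def alpha(pn, ps, dn, ds):
    """Estimated proportion of adaptive substitutions, alpha = 1 - NI."""
    ni = neutrality_index(pn, ps, dn, ds)
    return None if ni is None else 1.0 - ni
```

Under this definition, NI below 1 (alpha above 0) indicates an excess of fixed nonsynonymous differences relative to polymorphism, which is the pattern usually read as evidence of positive selection.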
Test 1: pN/pS: History 'Kenyan bee pN/pS'
Test 2: Fixed differences from the (European) reference genome: History 'Kenyan bee fixed differences from the reference'
Test 3: McDonald-Kreitman test: History 'Kenyan bee McDonald-Kreitman test (vs. Apis florea)'
Test 4: Runs of homozygosity: History 'Kenyan bee Runs of Homozygosity (ROH) for Desert'
Test 5: FST: History 'Kenyan bee Desert and Savannah per-SNP FST'
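For orientation, a per-SNP FST between two groups of samples can be computed directly from genotypes coded as the count of the first allele (0, 1 or 2). The sketch below uses Wright's FST = (HT - HS)/HT with sample-size weighting; this is an assumption chosen for simplicity, and the estimator used in the Galaxy history may differ. Which samples belong to the Desert and Savannah groups is also left to the caller, since the assignment is not spelled out in this listing.

```python
def allele_freq(genotypes):
    """Return (frequency of the first allele, number of called diploid samples).

    Genotypes are counts of the first allele; values other than 0, 1 or 2
    are treated as missing calls (an assumption).
    """
    called = [g for g in genotypes if g in (0, 1, 2)]
    if not called:
        return None, 0
    return sum(called) / (2 * len(called)), len(called)


def wright_fst(pop1, pop2):
    """Wright's FST = (HT - HS) / HT for one biallelic SNP and two populations."""
    p1, n1 = allele_freq(pop1)
    p2, n2 = allele_freq(pop2)
    if p1 is None or p2 is None:
        return None
    n = n1 + n2
    p_bar = (n1 * p1 + n2 * p2) / n
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return None                        # SNP is monomorphic in the pooled sample
    # Sample-size-weighted mean of within-population expected heterozygosity.
    h_s = (n1 * 2 * p1 * (1 - p1) + n2 * 2 * p2 * (1 - p2)) / n
    return (h_t - h_s) / h_t
```

A SNP fixed for different alleles in the two groups gives FST = 1, and identical allele frequencies give FST = 0. With only a few diploid samples per group, per-SNP values from this simple estimator are noisy and are best interpreted in aggregate.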