Genome-wide analysis of signatures of selection in populations of African honey bees (Apis mellifera) using new web-based tools

Zachary L. Fuller1, Elina L. Niño2, Harland M. Patch2, Oscar C. Bedoya-Reina3, Tracey Baumgarten2, Elliud Muli4, Fiona Mumoki5, Aakrosh Ratan3, John McGraw6, Maryann Frazier2, Daniel Masiga5, Stephen Schuster3, Webb Miller3*, Christina M. Grozinger2*

1 Department of Biology, Pennsylvania State University, University Park, PA, USA

2 Department of Entomology, Center for Pollinator Research, Pennsylvania State University, University Park, PA, USA

3 Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA, USA

4 Department of Biological Sciences, South Eastern Kenya University (SEKU), P.O. Box 170-90200, Kitui, Kenya

5 The International Center of Insect Physiology and Ecology (icipe), P.O. Box 30772-00100, Nairobi, Kenya

6 Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA

ABSTRACT

With the development of inexpensive, high-throughput sequencing technologies, it has become feasible to examine questions related to population genetics and molecular evolution of non-model species in their ecological contexts on a genome-wide scale. Here, we employed a newly developed suite of integrated, web-based programs to examine population dynamics and signatures of selection across the genome using several well-established tests, including FST, pN/pS, and McDonald-Kreitman. We applied these techniques to study populations of honey bees (Apis mellifera) in East Africa. In Kenya, there are several described A. mellifera subspecies, which are thought to be localized to distinct ecological regions. We performed whole genome sequencing of 11 worker honey bees from apiaries distributed throughout Kenya and identified 3.6 million putative single-nucleotide polymorphisms. The dense coverage allowed us to apply several computational procedures to study population structure and the evolutionary relationships among the populations, and to detect signs of adaptive evolution across the genome. While there is considerable gene flow among the sampled populations, there are clear distinctions between populations from the northern desert region and those from the temperate, savannah region. We identified several genes under positive selection within African bee populations, and between these populations and European A. mellifera or Asian Apis florea. These genes were associated with several fundamental processes, including behavior, response to stress, reproduction and metabolism. These results lay the groundwork for future studies of adaptive ecological evolution in honey bees, and demonstrate the use of new, freely available web-based tools (http://galaxyproject.org/) that can be applied to any model system with genomic information.

Data Sets

All SNPs: Dataset 'All putative honey-bee SNPs'. This is a tab-separated file with the following columns:

    1. scaf  scaffold in the bee assembly (vers. 4.5)
    2. pos   position on that scaffold
    3. A     reference allele
    4. B     variant allele
    5. qual  overall SNP quality

For each of the 11 samples there are four columns, giving the number of reads with the
first allele (A), the number with the second allele (B), the genotype (i.e., count of the
first allele; 0, 1 or 2), and the quality of the called genotype. These values occupy
columns 6-49.

cols   name ID   sample   sub-species
------ ---- ---  -------  -----------
  6-9.  1S  989   1.4.15  scutellata
10-13.  2S  990   2.2.15  scutellata
14-17.  3S  991   4.2.15  scutellata
18-21.  1C  992  12.2.15  litorea
22-25.  2C  993  13.4.15  scutellata
26-29.  3C  994  15.4.15  scutellata
30-33.  1D  995  16.1.5   yemenitica
34-37.  2D  996  17.1.5   yemenitica/litorea
38-41.  3D  997  18.1.15  yemenitica
42-45.  4S  998  21.3.15  scutellata
46-49.  1M  999  22.2.5   monticola
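
The column layout above can be read with a short script. The following is a minimal Python sketch, not part of the Galaxy tools: the function and class names are hypothetical, and it assumes that every row has at least 49 tab-separated fields, that read counts and genotypes are integers, and that quality values are numeric.

```python
from typing import Dict, NamedTuple, Tuple

# Sample names in column order, taken from the table above.
SAMPLES = ["1S", "2S", "3S", "1C", "2C", "3C", "1D", "2D", "3D", "4S", "1M"]


class SampleCall(NamedTuple):
    reads_a: int     # reads supporting the first allele (A)
    reads_b: int     # reads supporting the second allele (B)
    genotype: int    # count of the first allele: 0, 1 or 2
    quality: float   # quality of the called genotype


def parse_snp_line(line: str) -> Tuple[str, int, str, str, float, Dict[str, SampleCall]]:
    """Split one row of the SNP table into its fixed and per-sample columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 49:
        raise ValueError(f"expected at least 49 columns, got {len(fields)}")
    scaf, pos, allele_a, allele_b = fields[0], int(fields[1]), fields[2], fields[3]
    qual = float(fields[4])
    calls = {}
    for i, name in enumerate(SAMPLES):
        start = 5 + 4 * i  # 0-based index of this sample's first column
        reads_a, reads_b, genotype, gq = fields[start:start + 4]
        calls[name] = SampleCall(int(reads_a), int(reads_b), int(genotype), float(gq))
    return scaf, pos, allele_a, allele_b, qual, calls
```

For example, `parse_snp_line(row)[5]["1D"].genotype` would give the genotype stored in column 32 for sample 1D.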

Filtered set of SNPs (quality >= 100): Dataset 'Filtered honey-bee SNPs'. Same format as the previous file.
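
The quality threshold can be illustrated with a few lines of Python. This is only a sketch of the stated cutoff, not the procedure used to produce the dataset: the function name is hypothetical, and skipping blank lines and lines starting with '#' is an assumption about the file, not something described here.

```python
from typing import Iterable, Iterator


def filter_snps(lines: Iterable[str], min_qual: float = 100.0) -> Iterator[str]:
    """Yield SNP rows whose overall quality (column 5) is at least min_qual."""
    for line in lines:
        # Assumption: blank lines and '#' comment lines carry no SNP data.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[4]) >= min_qual:
            yield line
```

Applied to an open file handle, `filter_snps(handle)` streams the rows one at a time, so the full SNP table never needs to be held in memory.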

Protein-coding SNPs: Dataset 'honey-bee SAPs'. This is a tab-separated file with the following columns:

 1. ref  - scaffold in the bee assembly (vers. 4.5)
 2. rPos - position on bee scaffold
 3. gene - gene name
 4. AA1  - one amino acid
 5. loc  - location in the gene sequence
 6. AA2  - variant amino acid

Genes: Dataset 'honey-bee genes'. This is a tab-separated file with the following columns:

 1. gene name
 2. group in the Amel4.5 assembly
 3. first location in the group
 4. last location in the group
 5. direction of transcription
 6. number of coding bases
 7. within-species dN/dS
 8. McDonald-Kreitman ratio
 9. number of within-species nonsynonymous polymorphisms
10. number of within-species synonymous polymorphisms
11. number of fixed between-species nonsynonymous differences
12. number of fixed between-species synonymous differences
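
Columns 9-12 hold the four counts of a McDonald-Kreitman table, so a summary statistic can be recomputed from them. How column 8 is defined is not stated here, so the Python sketch below computes the commonly used neutrality index, NI = (Pn/Ps) / (Dn/Ds), as an assumption and not necessarily the value stored in the file. The function names are hypothetical.

```python
from typing import Optional, Tuple


def gene_counts(line: str) -> Tuple[str, int, int, int, int]:
    """Return (gene name, Pn, Ps, Dn, Ds) from one row of the genes table."""
    fields = line.rstrip("\n").split("\t")
    pn, ps, dn, ds = (int(x) for x in fields[8:12])  # columns 9-12
    return fields[0], pn, ps, dn, ds


def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> Optional[float]:
    """NI = (Pn/Ps) / (Dn/Ds); None when a zero count makes it undefined."""
    if ps == 0 or dn == 0:
        return None
    return (pn * ds) / (ps * dn)
```

Under this definition, values below 1 indicate an excess of fixed nonsynonymous differences relative to polymorphism, which is the pattern usually read as a sign of positive selection.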


Command Histories

Test 1: pN/pS:  History 'Kenyan bee pN/pS'

Test 2: Fixed differences from the (European) reference genome:  History 'Kenyan bee fixed differences from the reference'

Test 3: McDonald-Kreitman test:  History 'Kenyan bee McDonald-Kreitman test (vs. Apis florea)'

Test 4: Runs of homozygosity:  History 'Kenyan bee Runs of Homozygosity (ROH) for Desert'

Test 5: FST:  History 'Kenyan bee Desert and Savannah per-SNP FST'
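
As an illustration of what a per-SNP FST measures, the following Python sketch computes Wright's FST for two populations from the genotype columns of the SNP table. The estimator used in the history is not stated here, so this is an assumption and not a reimplementation: it weights the two populations equally, treats any genotype other than 0, 1 or 2 as missing, and leaves the assignment of samples to populations to the caller. The function names are hypothetical.

```python
from typing import Optional, Sequence


def allele_freq(genotypes: Sequence[int]) -> Optional[float]:
    """Frequency of the first allele from diploid genotypes (0, 1 or 2)."""
    called = [g for g in genotypes if g in (0, 1, 2)]  # drop missing calls
    if not called:
        return None
    return sum(called) / (2 * len(called))


def wright_fst(p1: float, p2: float) -> Optional[float]:
    """FST = (HT - HS) / HT for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                        # total expected heterozygosity
    if h_t == 0:
        return None                                      # SNP is monomorphic overall
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population value
    return (h_t - h_s) / h_t
```

A SNP fixed for different alleles in the two populations gives `wright_fst(1.0, 0.0) == 1.0`, while equal frequencies give 0. With only a handful of individuals per population, single-SNP values are noisy and are best interpreted across many SNPs.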