Aye-aye population genomic analyses highlight an important center of endemism in northern Madagascar

George H. Perry, Edward E. Louis Jr, Aakrosh Ratan, Oscar C. Bedoya-Reina, Richard Burhans, Runhua Lei, Steig E. Johnson, Stephan C. Schuster, Webb Miller

Summary

Analyses of population-level genome sequence data offer potentially powerful demographic and evolutionary insights that could benefit conservation and ecological research on endangered species. We performed a population genomics study of the aye-aye, a highly specialized nocturnal lemur from Madagascar. Aye-ayes have low population densities and extensive range requirements that could make this flagship species particularly susceptible to extinction. Therefore, knowledge of genetic diversity and differentiation among aye-aye populations is critical for conservation planning. Such information may also advance our general understanding of Malagasy biogeography, as aye-ayes have the largest species distribution of any lemur. We generated and analyzed whole genome sequence data for 12 aye-ayes from three regions of Madagascar (North, West, and East). We found that the North population is genetically distinct, with strong differentiation from other aye-ayes over relatively short geographic distances. In comparison, the average FST value between the North and East aye-aye populations – separated by only 248 km – is over 2.1 times greater than that observed between human Africans and Europeans. This finding is consistent with prior watershed- and climate-based hypotheses of a center of endemism in northern Madagascar. Together, these results suggest a strong and long-term biogeographical barrier to gene flow. Thus, the specific attention that should be directed towards preserving large, contiguous aye-aye habitats in northern Madagascar may also benefit the conservation of other distinct taxonomic units. To help facilitate future ecological- and conservation-motivated population genomics analyses by non-computational biologists, the analytical toolkit used in this study is available on the Galaxy website.

Data Sets

Many of the analyses reported in the paper are based on the six data sets described here. (You can also find them under Shared Data -> Data Libraries -> Genome Diversity, then under aye-aye and human.)

The first data set contains 4,555,737 putative aye-aye SNPs, each recorded in a row with 59 columns.

The second data set contains 19,670 aye-aye "SAPs", i.e., Single Amino-acid Polymorphisms, including synonymous coding-region substitutions, each recorded in a row with 9 columns.

To estimate diversity within populations, we need to know which protein-coding positions are covered by enough reads that any SNP present should be detectable. This information is recorded in the third data set, a table with 8 columns.

Each of the three aye-aye tables above has a closely analogous table built from a matched data set of 12 human individuals. The human SNP table has 8,598,051 entries, each with 53 columns.

There are 50,475 human SAPs, each with 7 columns.

The table of adequately covered human coding intervals has 5 columns.

Workflows

The workflows contain commands for the main analyses reported in the body of the paper. The user is invited to modify the commands to compute more of the results described in the main paper and supplement. Many of the Galaxy tools used in these workflows can be found under "Genome Diversity" in the left panel on the Analyze Data page. A tutorial can be found under Example 4 on this page.

The first workflow creates the data for Figure 2, as well as a plot of distributions of coverage depth at each SNP for the 13 individuals (including North5, which we left out of other analyses because of low coverage). The workflow needs to be applied to the "aye-aye SNPs" data set as follows:
(1) Under "Analyze Data" (in the black bar) create an empty history.
(2) Under "Shared Data" -> "Published Pages", view this page.
(3) Import the "aye-aye SNPs" data set ("+" in the green circle near the right of the green bar), then click on "return to the previous page".
(4) Import the "aye-aye Figure 2" workflow, and click on "start using this workflow".
(5) You will be taken to your Workflow page, which will have a workflow called "imported aye-aye Figure 2"; click on it and select "run".
(6) You will be taken to a history that includes the aye-aye SNPs and the aye-aye Figure 2 workflow; scroll to the bottom of the workflow (middle panel) and press "Run workflow".
The commands will take a couple of minutes to run.

The second workflow performs the basic analysis behind Figure 3, yielding FST estimates for both North-vs-East aye-ayes and European-vs-African humans. It uses both the aye-aye and human SNP data sets.
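For readers who want to see what an average FST over SNPs involves, the following is a minimal sketch in Python. It assumes Wright's original heterozygosity-based definition and an unweighted mean over SNPs; the Galaxy tool may use a different estimator and different filtering, so this is not a description of the workflow itself. The function names and the (alternate-allele count, total allele count) input layout are hypothetical and do not reflect the column layout of the SNP tables.

```python
from typing import Iterable, Tuple

# Hypothetical input layout: (alt_count_pop1, total_pop1, alt_count_pop2, total_pop2)
SnpCounts = Tuple[int, int, int, int]


def wright_fst(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic SNP, given the allele frequency in each population."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity of the pooled population
    if h_total == 0.0:
        raise ValueError("SNP is monomorphic across both populations")
    # Mean expected heterozygosity within the two populations (equal weights assumed).
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def average_fst(snps: Iterable[SnpCounts]) -> float:
    """Unweighted mean of per-SNP F_ST, skipping SNPs with no data or no variation."""
    values = []
    for alt1, n1, alt2, n2 in snps:
        if n1 == 0 or n2 == 0:
            continue  # no genotype calls in one population
        p1, p2 = alt1 / n1, alt2 / n2
        if (p1 + p2) / 2.0 in (0.0, 1.0):
            continue  # fixed for the same allele in both populations
        values.append(wright_fst(p1, p2))
    if not values:
        raise ValueError("no informative SNPs")
    return sum(values) / len(values)
```

For example, `average_fst([(8, 8, 0, 8), (4, 8, 4, 8)])` averages one SNP that is fixed for different alleles in the two populations (FST = 1) with one SNP at equal frequency in both (FST = 0). Averaging per-SNP values is only one convention; a ratio of summed numerators to summed denominators is a common alternative.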

The final workflow estimates the diversity parameter pi for North aye-ayes and European humans. It uses all six of the data sets listed above.
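The role of the coverage tables in this calculation can be illustrated with a short Python sketch. It assumes the standard definition of pi as the average number of pairwise differences per site, with the per-site denominator taken as the total length of adequately covered intervals. The function names, the half-open (start, end) interval convention, and the (alternate-allele count, number of chromosomes) input layout are hypothetical; the Galaxy tool's exact procedure may differ.

```python
from typing import Iterable, Tuple


def pairwise_differences(alt_count: int, n_chrom: int) -> float:
    """Fraction of chromosome pairs that differ at one biallelic site."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    return 2.0 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def covered_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Total length of adequately covered intervals.

    Assumes half-open (start, end) intervals that do not overlap.
    """
    return sum(end - start for start, end in intervals)


def nucleotide_diversity(
    snp_counts: Iterable[Tuple[int, int]],
    intervals: Iterable[Tuple[int, int]],
) -> float:
    """Estimate pi: summed pairwise differences at SNPs divided by covered sites.

    snp_counts holds (alt_count, n_chrom) for each SNP inside the covered intervals.
    Covered positions without a SNP contribute zero differences but count as sites.
    """
    length = covered_length(intervals)
    if length <= 0:
        raise ValueError("no adequately covered sites")
    total = sum(pairwise_differences(alt, n) for alt, n in snp_counts)
    return total / length
```

Restricting the denominator to adequately covered positions matters because a position with too few reads cannot be called as either variable or invariant; counting it as a site would bias the estimate downward.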