Genotyping for population and association genetics
How do genetic diversity, clonal vigor, and sex ratio compare among populations inhabiting environments with varying degrees of water stress? Could tradeoffs between desiccation tolerance (DT) and sexual reproduction have shaped these diversity patterns?
Genome-wide genetic variation will be characterized for each sample using a restriction-site associated DNA (RAD)/genotyping-by-sequencing (GBS) genome complexity reduction protocol. This approach combines sequencing and genome-wide SNP genotyping in a single step, which minimizes ascertainment bias (Hohenlohe et al. 2010; Helyar et al. 2011). Although RAD/GBS genotyping can be achieved without a reference genome (Elshire et al. 2011), we will use the two reference genomes to determine which restriction enzyme (or enzymes; Poland et al. 2012) will optimize target coverage in both species with a single library of primers. Such a strategy will reduce the missing data inherent in RAD/GBS genotyping and will also add a functional component to the SNP data generated (alignment within the gene models used in the annotation of the genomes).
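As an illustration of how the reference genomes could guide enzyme choice, the following minimal sketch performs an in silico digest of a sequence and counts fragments falling within a sequenceable size window. It is a sketch under stated assumptions, not the protocol itself: the function names, the size window, and the use of PstI (recognition site CTGCAG, cut after position 5) are illustrative choices, only one strand is scanned, and only fragments flanked by a cut site on both ends are counted.

```python
import re


def digest_fragments(sequence, site, cut_offset):
    """Return lengths of fragments flanked by a cut site on both ends.

    sequence   -- DNA string for one scaffold (hypothetical input)
    site       -- recognition sequence without ambiguity codes, e.g. "CTGCAG"
    cut_offset -- cut position counted from the start of the site
    """
    seq = sequence.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = "(?=" + re.escape(site.upper()) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    return [b - a for a, b in zip(cuts, cuts[1:])]


def count_in_window(fragment_lengths, min_len, max_len):
    """Count fragments whose length lies in [min_len, max_len]."""
    return sum(min_len <= length <= max_len for length in fragment_lengths)


if __name__ == "__main__":
    toy = "AAACTGCAGTTTTCTGCAGCC"  # toy sequence, not real data
    fragments = digest_fragments(toy, "CTGCAG", 5)
    print(fragments, count_in_window(fragments, 5, 50))
```

Running the same count for each candidate enzyme on both genomes would give a rough basis for comparing expected locus numbers between species.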
(1) We will select representative individuals from each population under investigation (as described elsewhere in the proposal) for isolation of high quality RNA/protein-free DNA for construction of the individually barcoded RAD/GBS libraries.
(2) After sequencing, sequence reads will be sorted by individual barcode and aligned to the Syntrichia genomes after removal of restriction site and barcode sequences.
(3) Putative SNP loci will be identified based on consensus sequences between individuals within and between populations, and further validated based on frequency, read depth, and genomic placement (i.e., not within an identified microsatellite region, and with no more than one other SNP within the consensus sequence).
(4) Read alignment depths and allele frequencies at each SNP locus will be calculated within each population/site and across all populations.
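Step (4) can be sketched as follows for a single SNP locus. This is a minimal illustration under assumptions that the steps above do not specify: each sample is treated as a haploid gametophyte carrying one allele per locus, the input is a list of (population, allele, read depth) tuples, and the depth threshold of 5 reads is a placeholder rather than a chosen cutoff.

```python
from collections import Counter, defaultdict


def allele_frequencies(calls, min_depth=5):
    """Compute allele frequencies per population and across all populations.

    calls     -- iterable of (population, allele, read_depth) for one locus
    min_depth -- calls with fewer reads are discarded (placeholder value)
    Returns (per_population, overall), both mapping allele -> frequency.
    """
    per_pop_counts = defaultdict(Counter)
    overall_counts = Counter()
    for population, allele, depth in calls:
        if depth < min_depth:
            continue  # treat low-depth calls as missing data
        per_pop_counts[population][allele] += 1
        overall_counts[allele] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {allele: n / total for allele, n in counts.items()}

    per_population = {pop: normalize(c) for pop, c in per_pop_counts.items()}
    overall = normalize(overall_counts) if overall_counts else {}
    return per_population, overall
```

A population in which every call fails the depth filter is simply absent from the result, which makes missing data visible downstream.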
In this manner, we will investigate population structure and associate genotypes with specific phenotypes. These phenotypes, based on physiology, sex, or response to the environment (desiccation tolerance and temperature), will link the transcriptomes to the SNP genotypes and aid in identifying candidate genes through mapping of differentially expressed genes (DEGs).
Association genetic analyses will be conducted using the strategies described by McKown et al. (2014), which were used to implicate genes underlying trait variation in populations of poplar (Populus). These data associating phenotypes with genetic variants will support the RAD-based SNP genotyping strategy and ultimately the ecophysiology and communities/ecosystem goals of the proposed study.
Population Genetic Analyses
Characterizing the genetic structure of S. caninervis and S. ruralis populations throughout their western North American range will provide the necessary context in which to evaluate the extent of genetic differentiation of this complex along ecophysiological and/or geographic dimensions. Understanding the distribution of genetic variation within these species will also help to inform the level at which phylogenetically independent lineages may exist, or the extent and direction of gene flow between populations. By characterizing genome-wide genetic variation for individuals collected from sites that capture the breadth of macro- and microenvironments inhabited by these species, we will address Question 2 in Table 1 (written above).
Sex ratios, spatial distribution of the sexes, genetic diversity, and differentiation in populations of S. caninervis and S. ruralis.
Alignment of individual sample sequences to reference male and female S. ruralis RAD/GBSseq data will allow us to determine the sex of each genotyped sample. This approach has already been used successfully in S. caninervis (Baughman et al., unpublished data), for which over 1000 male- and female-associated RAD/GBS sequences were identified in sexed reference genotypes (10 female and 9 male). One hundred and twenty-five of these sex-associated consensus sequences that displayed complete exclusivity (i.e., never co-occurred in an individual sample) were then used to sex over 100 sterile samples.
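The logic of marker-based sexing can be sketched as below. This is one simple reading of "complete exclusivity", not the procedure used by Baughman et al.: a marker is kept if it was observed in reference samples of only one sex, and a sterile sample is assigned a sex only when it carries markers of one class and none of the other. The data layout, the function names, and the `min_hits` threshold are hypothetical.

```python
def exclusive_markers(references):
    """Split markers into male-only and female-only sets.

    references -- dict: sample_id -> (sex, set_of_marker_ids), sex is "M" or "F"
    """
    male_seen, female_seen = set(), set()
    for sex, markers in references.values():
        if sex == "M":
            male_seen.update(markers)
        else:
            female_seen.update(markers)
    # Markers seen in both sexes are discarded from both sets.
    return male_seen - female_seen, female_seen - male_seen


def call_sex(observed, male_markers, female_markers, min_hits=3):
    """Assign a sex to one sample from the set of markers it carries."""
    male_hits = len(observed & male_markers)
    female_hits = len(observed & female_markers)
    if male_hits >= min_hits and female_hits == 0:
        return "male"
    if female_hits >= min_hits and male_hits == 0:
        return "female"
    return "ambiguous"  # too few hits, or markers of both classes present
```

Returning "ambiguous" rather than forcing a call keeps contaminated or low-coverage samples out of the sex ratio estimates.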
The availability of the genomes will allow us to generate more extensive genetic markers that span the genome and allow us to link the genotypic variation to functional aspects of the genome. Genetic sex determination of the samples will allow us to explore the placement of males and females across microenvironments and estimate the sex ratios at each site. For each population, an unbiased estimate of gene diversity (H_s; Nei & Roychoudhury 1974) and the proportion of polymorphic loci will be calculated for all SNPs. Male and female samples will be partitioned and the same calculations will be made for comparison. Because increased variance in reproductive success is predicted to reduce effective population size (Bachtrog et al. 2011), Syntrichia males might contain less overall genetic diversity than females. Individual SNP genotypes (15-20) for each population will be compiled and used to estimate the global multilocus F_ST among populations and between pairs of populations. Significance of differentiation between populations will be assessed via AMOVA.
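For concreteness, the diversity statistics can be sketched as follows. The sketch assumes haploid samples (one allele per individual per locus), so that the unbiased estimator for a locus with n sampled alleles is n/(n - 1) multiplied by (1 - the sum of squared allele frequencies); missing calls are represented by `None`, which is an illustrative convention. Function names are hypothetical.

```python
from collections import Counter


def unbiased_gene_diversity(alleles):
    """Unbiased gene diversity at one locus from a list of allele calls."""
    called = [a for a in alleles if a is not None]  # drop missing data
    n = len(called)
    if n < 2:
        raise ValueError("at least two called alleles are required")
    sum_sq = sum((count / n) ** 2 for count in Counter(called).values())
    return n / (n - 1) * (1.0 - sum_sq)


def summarize_population(loci):
    """Return (mean gene diversity, proportion of polymorphic loci).

    loci -- list of per-locus allele lists for one population
    """
    diversities = [unbiased_gene_diversity(alleles) for alleles in loci]
    polymorphic = sum(
        len({a for a in alleles if a is not None}) > 1 for alleles in loci
    )
    return sum(diversities) / len(loci), polymorphic / len(loci)
```

Applying `summarize_population` separately to the male and female partitions of a population gives the comparison described above.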
Evaluating spatial genetic structure in Syntrichia
Spatial distances between individuals within a site and between sites will be converted into a distance matrix to provide a context for tests of spatial genetic structure. Environmental variables recorded for each site (e.g., mean annual precipitation, average temperature in wettest / driest months, etc.) will be summarized through a principal component analysis (PCA) of the covariance matrix. The axes explaining the majority of the variance in habitats will be used in further analyses as compound indicators of site environment. Within populations, pairwise kinship coefficients (F_ij) will be calculated for individual collections and the significance of the slope of the regression of F_ij on the spatial distance between individuals will be tested by means of permutation tests (Mantel). The linearity of the relationship between geographic distance and a measure of genetic distance (F_ST) between populations will also be assessed via Mantel tests. Similarly, to test for a correlation between genetic and environmental differentiation, we will perform partial Mantel tests (controlling for the geographic distance) between pairwise F_ST and pairwise differences in population scores along the PCA axes (e.g., Hutsemekers et al. 2010).
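The permutation logic behind a simple (not partial) Mantel test can be sketched as below. This is a minimal illustration, not the software that will be used: it takes two symmetric distance matrices as nested lists, uses the Pearson correlation of their upper triangles as the statistic, and computes a one-sided p-value by jointly permuting the rows and columns of the genetic matrix. The function names, the number of permutations, and the seed are illustrative.

```python
import random
from math import sqrt


def _upper_triangle(matrix, order):
    """Upper-triangle entries after relabeling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    identity = list(range(len(genetic)))
    geo = _upper_triangle(geographic, identity)
    r_obs = _pearson(_upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if _pearson(_upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

Permuting whole rows and columns, rather than individual matrix cells, preserves the non-independence of pairwise distances that share a population.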
Overall, these tests will allow us to evaluate the degree of spatial genetic structuring within and between populations, and whether geographic distance or environment has more of an influence on genetic variation. The spatial genetic structure characterized through the Mantel tests and overall predictability of population assignments based on geographic or environmental partitions will be further explored and corroborated in a variational Bayesian probabilistic framework using fastSTRUCTURE (Raj et al. 2014).
Finally, to augment our understanding of spatial genetic structure, the extent and direction of migration between Syntrichia populations will be estimated from the SNP genotype data in a maximum likelihood framework using Migrate-n (Beerli & Felsenstein 2001; Beerli 2006).