Genomic Data Overview
What are the genetic mechanisms underlying traits (including phenotypic plasticity) that drive diversification, reproduction, habitat selection, and physiological processes in Syntrichia?
To address our phylogenetic, population genetic, physiological, community, and ecosystem questions (Table 1), we will use next-generation sequencing (NGS) methods to generate the necessary genomic resources for our target populations. These resources will enable us to generate mechanistic hypotheses to explain the drivers of speciation, population genetic structure, DT-reproductive tradeoffs, habitat selection, and physiological plasticity in North American Syntrichia.
The foundation of these genomic resources will be high-quality draft genome assemblies for both Syntrichia caninervis and S. ruralis. As determined by propidium iodide (PI) flow cytometry, these genomes are approximately 390 Mb in size, about 19% smaller than the genome of Physcomitrella patens, which is approximately 480 Mb (Voglmayr 2000). The reference genomes will serve several critical purposes. First, they will allow transcripts to be linked functionally to genetic elements through the development of genetic markers that we will use in all aspects of the proposed research. Second, they will provide the backbone for transcriptomic analyses that will deepen our ability to assess environmental and developmental effects on gene expression in our targeted experimental populations. Third, they will provide high-quality genotyping tools for ecotypic characterization, population dynamics, and phylogenetic analyses, allowing us to probe questions of speciation, phylogenetics, and population genetics, as well as the physiological response strategies to changing environments that are studied in our community/ecosystem-level component.
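Flow cytometric genome size estimates rest on a simple ratio: the DNA content of the sample nuclei is inferred from the position of their PI fluorescence peak relative to that of a co-analyzed reference standard with known DNA content, then converted to base pairs using the widely used factor of 1 pg of DNA to approximately 978 Mb. The sketch below shows that arithmetic. The function name and the example peak positions and standard DNA content are hypothetical illustrations, not the measurements behind the 390 Mb estimate, and it assumes the sample peak corresponds to unreplicated (1C) haploid gametophyte nuclei.

```python
# Widely used conversion factor: 1 pg of DNA corresponds to ~978 Mb.
MB_PER_PG = 978.0


def genome_size_mb(sample_peak: float, standard_peak: float, standard_pg: float) -> float:
    """Estimate genome size (Mb) from PI flow cytometry peak positions.

    sample_peak:   mean fluorescence of the sample's unreplicated (1C) nuclei.
    standard_peak: mean fluorescence of the reference standard's peak,
                   measured in the same run.
    standard_pg:   known DNA content (pg) of the standard's nuclei in that peak.
    """
    if sample_peak <= 0 or standard_peak <= 0 or standard_pg <= 0:
        raise ValueError("peak positions and standard DNA content must be positive")
    # DNA content scales linearly with PI fluorescence.
    sample_pg = standard_pg * sample_peak / standard_peak
    return sample_pg * MB_PER_PG


# Hypothetical run: sample peak at half the position of a 0.8 pg standard.
estimate = genome_size_mb(100.0, 200.0, 0.8)
```

With these invented inputs the sample holds 0.4 pg of DNA, or roughly 391 Mb.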
We will use pure, axenic cultures of S. caninervis and S. ruralis shoots (gametophores), developed in the Fisher and Stark laboratories, as source material for genomic DNA isolation. We will preserve vouchered stock cultures in cold storage (8°C, 2 h light) to maintain genetic integrity for future growth, sequence confirmation studies, and distribution as a community resource. All cultures in this portion of the project will be duplicated, stored, and maintained in the Oliver laboratory. High-quality genomic DNA with fragment sizes above 100 kb can be isolated from the haploid gametophytes of each species using standard techniques (O'Mahony & Oliver 1999) or more recent technologies such as the E.Z.N.A. HP Plant DNA Kit (Omega Bio-tek cat. no. D2087), modified by the addition of an equilibrated phenol extraction. This modified protocol has recently proven effective in genome sequencing efforts currently under way in the Oliver lab (unpublished).
We will use the services of Dovetail Genomics (http://dovetailgenomics.com) to produce high-quality genome assemblies, combining Illumina HiSeq sequencing with Dovetail's proprietary Chicago™ library construction strategy. Dovetail Genomics will use its HiRise pipeline to deliver de novo assembled genomes for both S. caninervis and S. ruralis. Once we receive the assemblies, they will be annotated as described below.
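A common first check on any delivered scaffold assembly is its contiguity, usually summarized as N50: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The sketch below reads scaffold lengths from a FASTA file and computes N50. It is a minimal illustration assuming a plain, uncompressed FASTA; the function names are hypothetical and are not part of the HiRise pipeline or any Dovetail deliverable.

```python
from typing import Iterable, List


def fasta_lengths(path: str) -> List[int]:
    """Return the length of every sequence in an uncompressed FASTA file."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds >= L hold at least half of all bases."""
    ordered = sorted((x for x in lengths if x > 0), reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths supplied")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached: the loop always passes the halfway point
```

For example, `n50(fasta_lengths("assembly.fasta"))` (a hypothetical file name) would summarize one delivered assembly.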
The assembled genomes will be structurally annotated using the MAKER pipeline (http://www.gmod.org/wiki/MAKER). Assembled transcripts (described above) will provide experimental evidence for structural annotations, while Augustus (http://augustus.gobics.de), SNAP (http://www.broadinstitute.org/mpg/snap/), and GeneMark (http://opal.biology.gatech.edu) will provide initial ab initio gene predictions. Functional annotations will be assigned to the structural annotations using Blast2GO (http://www.blast2go.com/) for gene ontology (GO) term assignment and InterProScan for gene family assignment and identification of functional domains.
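Before functional annotation, MAKER gene models are often filtered by annotation edit distance (AED), a score from 0 (model fully consistent with the transcript and protein evidence) to 1 (no evidence support). The sketch below keeps mRNA models at or below an AED cutoff. It assumes MAKER's GFF3 output stores AED in an `_AED=` attribute on mRNA features; the 0.5 cutoff and the function names are hypothetical choices for illustration.

```python
from typing import Dict, Iterable, List


def parse_gff3_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string of ';'-separated key=value pairs."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def well_supported_mrnas(gff_lines: Iterable[str], max_aed: float = 0.5) -> List[str]:
    """Return IDs of mRNA features whose _AED value is <= max_aed (assumed attribute)."""
    kept: List[str] = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences may follow this GFF3 directive
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and other directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_gff3_attributes(cols[8])
        aed = attrs.get("_AED")
        if aed is not None and float(aed) <= max_aed:
            kept.append(attrs.get("ID", ""))
    return kept
```

Because the function accepts any iterable of lines, an open file handle can be passed directly.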
Comparative genomics. Syntenic gene sets (homologous genes retained in conserved order and position across genomes) will be identified using the SynMap and SynFind tools of CoGe (http://genomevolution.org/CoGe/). These master gene sets will be used to identify and classify positionally stable genes, gene copy-number variation arising from tandem gene duplication, and unique genes. For genes that are both syntenically conserved across species and tandemly duplicated, we will calculate Tajima's D to identify genomic regions evolving neutrally or under balancing or directional selection.
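Tajima's D is computed from polymorphism within a population sample at a locus: it compares the mean number of pairwise differences between sequences (π) with Watterson's estimator (S/a1, where S is the number of segregating sites). Near-zero values are consistent with neutrality; positive values indicate an excess of intermediate-frequency variants (e.g., balancing selection or population contraction), and negative values an excess of rare variants (e.g., a selective sweep, purifying selection, or population expansion). The minimal sketch below assumes aligned, equal-length haploid sequences with no gaps or missing data, which suits haploid gametophyte samples; it is an illustration, not the project's analysis pipeline.

```python
import math
from itertools import combinations
from typing import Optional, Sequence


def tajimas_d(seqs: Sequence[str]) -> Optional[float]:
    """Tajima's D for aligned haplotypes; None if there are no segregating sites."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero for n < 4)")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        return None

    # pi: mean number of pairwise differences over all sequence pairs.
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]
    pi = sum(diffs) / len(diffs)

    # Standard normalizing constants for Tajima's D.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Standardized difference between pi and Watterson's estimator S / a1.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A sample dominated by singletons, such as three identical haplotypes and one divergent haplotype, yields a negative D.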
Comparative functional genomics
Genes differentially regulated under dehydration and temperature treatments will be identified for each of the two focal species. We will perform gene ontology (GO) enrichment analysis to determine which general functions are conserved among Syntrichia species. In addition, differentially expressed genes from each species will be mapped to their syntenic gene sets to determine which genes perform similar functions. Of particular interest will be syntenically conserved genes whose expression responses differ between species. Combining functional annotation, syntenic conservation, and signatures of selection should provide insight into the evolution of these individual species.
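Two steps of this analysis can be sketched briefly, assuming differential expression has already been called upstream and a log2 fold change is available for each gene. The first function computes a one-sided hypergeometric p-value, a common test for over-representation of a GO term among differentially expressed genes; the second flags syntenic gene pairs whose responses diverge between species. Function names, the 1.0 log2 fold-change threshold, and the category labels are hypothetical choices, and a full analysis would also correct the enrichment p-values for multiple testing across GO terms.

```python
from math import comb
from typing import Dict, Iterable, List, Tuple


def hypergeom_upper_tail(k: int, big_n: int, big_k: int, n: int) -> float:
    """P(X >= k), X ~ Hypergeometric: big_n genes, big_k with the GO term,
    n differentially expressed genes, k of which carry the term."""
    upper = min(n, big_k)
    numerator = sum(comb(big_k, i) * comb(big_n - big_k, n - i) for i in range(k, upper + 1))
    return numerator / comb(big_n, n)


def classify_syntenic_responses(
    pairs: Iterable[Tuple[str, str]],
    lfc_a: Dict[str, float],
    lfc_b: Dict[str, float],
    min_abs_lfc: float = 1.0,
) -> List[Tuple[str, str, str]]:
    """Label syntenic pairs whose treatment responses differ between species."""
    results: List[Tuple[str, str, str]] = []
    for gene_a, gene_b in pairs:
        if gene_a not in lfc_a or gene_b not in lfc_b:
            continue  # no expression data for one member of the pair
        x, y = lfc_a[gene_a], lfc_b[gene_b]
        resp_a, resp_b = abs(x) >= min_abs_lfc, abs(y) >= min_abs_lfc
        if resp_a and resp_b:
            if (x > 0) != (y > 0):
                results.append((gene_a, gene_b, "opposite"))  # both respond, opposite directions
        elif resp_a:
            results.append((gene_a, gene_b, "A_only"))
        elif resp_b:
            results.append((gene_a, gene_b, "B_only"))
    return results
```

Pairs that respond in the same direction in both species, or in neither, are left out, so the output contains only candidate cases of divergent regulation.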
Novel methods for advanced data visualization
To gain insight into the unique functional evolution of these species, we will need new visualization techniques that combine syntenic, diversity, and functional data (e.g., DT across multiple species of Syntrichia). Specifically, a multidimensional visualization will integrate syntenic regions between genomes while highlighting unique patterns of gene expression. With the assistance of Dr. Eric Lyons of CyVerse (letter attached), we will integrate these diagrams into CoGe's genome visualization system for use by the scientific community.
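Any such visualization first needs the syntenic, diversity, and expression data joined into one table keyed by gene pair. The sketch below shows one hypothetical shape for that table, exported as tab-separated text that a plotting layer could consume. The record fields, function names, and TSV layout are assumptions for illustration and do not represent CoGe's data model or input format; gene IDs are assumed unique across the two genomes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class SyntenicTrackRow:
    """One syntenic gene pair annotated with diversity and expression values."""
    gene_a: str
    gene_b: str
    pi_a: Optional[float]   # nucleotide diversity at gene_a (None if unsampled)
    pi_b: Optional[float]
    lfc_a: Optional[float]  # log2 fold change of gene_a under a treatment
    lfc_b: Optional[float]


def build_track(
    pairs: Iterable[Tuple[str, str]],
    diversity: Dict[str, float],
    expression: Dict[str, float],
) -> List[SyntenicTrackRow]:
    """Join per-gene values onto syntenic pairs; missing values stay None."""
    return [
        SyntenicTrackRow(a, b, diversity.get(a), diversity.get(b),
                         expression.get(a), expression.get(b))
        for a, b in pairs
    ]


def _fmt(value: Optional[float]) -> str:
    return "NA" if value is None else f"{value:g}"


def to_tsv(rows: Iterable[SyntenicTrackRow]) -> str:
    """Serialize rows as tab-separated text with a header line."""
    lines = ["gene_a\tgene_b\tpi_a\tpi_b\tlfc_a\tlfc_b"]
    for r in rows:
        lines.append("\t".join([r.gene_a, r.gene_b, _fmt(r.pi_a), _fmt(r.pi_b),
                                _fmt(r.lfc_a), _fmt(r.lfc_b)]))
    return "\n".join(lines)
```

Keeping missing values explicit ("NA") lets the visualization distinguish unsampled genes from genes with no response.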