The history of maize has been characterized by major demographic events, including population size changes associated with domestication and range expansion, and gene flow with wild relatives. The ...interplay between demographic history and selection has shaped diversity across maize populations and genomes.
We investigate these processes using high-depth resequencing data from 31 maize landraces spanning the pre-Columbian distribution of maize, and four wild teosinte individuals (Zea mays ssp. parviglumis). Genome-wide demographic analyses reveal that maize experienced pronounced declines in effective population size due to both a protracted domestication bottleneck and serial founder effects during post-domestication spread, while parviglumis in the Balsas River Valley experienced population growth. The domestication bottleneck and subsequent spread led to an increase in deleterious alleles in the domesticate compared to the wild progenitor. This cost is particularly pronounced in Andean maize, which has experienced a more dramatic founder event compared to other maize populations. Additionally, we detect introgression from the wild teosinte Zea mays ssp. mexicana into maize in the highlands of Mexico, Guatemala, and the southwestern USA, which reduces the prevalence of deleterious alleles likely due to the higher long-term effective population size of teosinte.
These findings underscore the strong interaction between historical demography and the efficiency of selection and illustrate how domesticated species are particularly useful for understanding these processes. The landscape of deleterious alleles and therefore evolutionary potential is clearly influenced by recent demography, a factor that could bear importantly on many species that have experienced recent demographic shifts.
Association mapping (AM) is a powerful tool for fine mapping complex trait variation down to nucleotide sequences by exploiting historical recombination events. A major problem in AM is controlling ...false positives that can arise from population structure and family relatedness. False positives are often controlled by incorporating covariates for structure and kinship in mixed linear models (MLM). These MLM-based methods are single locus models and can introduce false negatives due to over fitting of the model. In this study, eight different statistical models, ranging from single-locus to multilocus, were compared for AM for three traits differing in heritability in two crop species: soybean (
L.) and maize (
L.). Soybean and maize were chosen, in part, due to their highly differentiated rate of linkage disequilibrium (LD) decay, which can influence false positive and false negative rates. The fixed and random model circulating probability unification (FarmCPU) performed better than other models based on an analysis of Q-Q plots and on the identification of the known number of quantitative trait loci (QTLs) in a simulated data set. These results indicate that the FarmCPU controls both false positives and false negatives. Six qualitative traits in soybean with known published genomic positions were also used to compare these models, and results indicated that the FarmCPU consistently identified a single highly significant SNP closest to these known published genes. Multiple comparison adjustments (Bonferroni, false discovery rate, and positive false discovery rate) were compared for these models using a simulated trait having 60% heritability and 20 QTLs. Multiple comparison adjustments were overly conservative for MLM, CMLM, ECMLM, and MLMM and did not find any significant markers; in contrast, ANOVA, GLM, and SUPER models found an excessive number of markers, far more than 20 QTLs. The FarmCPU model, using less conservative methods (false discovery rate, and positive false discovery rate) identified 10 QTLs, which was closer to the simulated number of QTLs than the number found by other models.
Genotyping-by-sequencing (GBS) approaches provide low-cost, high-density genotype information. However, GBS has unique technical considerations, including a substantial amount of missing data and a ...nonuniform distribution of sequence reads. The goal of this study was to characterize technical variation using this method and to develop methods to optimize read depth to obtain desired marker coverage. To empirically assess the distribution of fragments produced using GBS, ∼8.69 Gb of GBS data were generated on the Zea mays reference inbred B73, utilizing ApeKI for genome reduction and single-end reads between 75 and 81 bp in length. We observed wide variation in sequence coverage across sites. Approximately 76% of potentially observable cut site-adjacent sequence fragments had no sequencing reads whereas a portion had substantially greater read depth than expected, up to 2369 times the expected mean. The methods described in this article facilitate determination of sequencing depth in the context of empirically defined read depth to achieve desired marker density for genetic mapping studies.
Premature senescence in annual crops reduces yield, while delayed senescence, termed stay-green, imposes positive and negative impacts on yield and nutrition quality. Despite its importance, scant ...information is available on the genetic architecture of senescence in maize (
) and other cereals. We combined a systematic characterization of natural diversity for senescence in maize and coexpression networks derived from transcriptome analysis of normally senescing and stay-green lines. Sixty-four candidate genes were identified by genome-wide association study (GWAS), and 14 of these genes are supported by additional evidence for involvement in senescence-related processes including proteolysis, sugar transport and signaling, and sink activity. Eight of the GWAS candidates, independently supported by a coexpression network underlying stay-green, include a trehalose-6-phosphate synthase, a NAC transcription factor, and two xylan biosynthetic enzymes. Source-sink communication and the activity of cell walls as a secondary sink emerge as key determinants of stay-green. Mutant analysis supports the role of a candidate encoding Cys protease in stay-green in Arabidopsis (
), and analysis of natural alleles suggests a similar role in maize. This study provides a foundation for enhanced understanding and manipulation of senescence for increasing carbon yield, nutritional quality, and stress tolerance of maize and other cereals.
Genome wide association studies (GWAS) are a powerful tool for identifying quantitative trait loci (QTL) and causal single nucleotide polymorphisms (SNPs)/genes associated with various important ...traits in crop species. Typically, GWAS in crops are performed using a panel of inbred lines, where multiple replicates of the same inbred are measured and the average phenotype is taken as the response variable. Here we describe and evaluate single plant GWAS (sp-GWAS) for performing a GWAS on individual plants, which does not require an association panel of inbreds. Instead sp-GWAS relies on the phenotypes and genotypes from individual plants sampled from a randomly mating population. Importantly, we demonstrate how sp-GWAS can be efficiently combined with a bulk segregant analysis (BSA) experiment to rapidly corroborate evidence for significant SNPs.
In this study we used the Shoepeg maize landrace, collected as an open pollinating variety from a farm in Southern Missouri in the 1960's, to evaluate whether sp-GWAS coupled with BSA can efficiently and powerfully used to detect significant association of SNPs for plant height (PH). Plant were grown in 8 locations across two years and in total 768 individuals were genotyped and phenotyped for sp-GWAS. A total of 306 k polymorphic markers in 768 individuals evaluated via association analysis detected 25 significant SNPs (P ≤ 0.00001) for PH. The results from our single-plant GWAS were further validated by bulk segregant analysis (BSA) for PH. BSA sequencing was performed on the same population by selecting tall and short plants as separate bulks. This approach identified 37 genomic regions for plant height. Of the 25 significant SNPs from GWAS, the three most significant SNPs co-localize with regions identified by BSA.
Overall, this study demonstrates that sp-GWAS coupled with BSA can be a useful tool for detecting significant SNPs and identifying candidate genes. This result is particularly useful for species/populations where association panels are not readily available.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
High-density marker panels and/or whole-genome sequencing, coupled with advanced phenotyping pipelines and sophisticated statistical methods, have dramatically increased our ability to generate lists ...of candidate genes or regions that are putatively associated with phenotypes or processes of interest. However, the speed with which we can validate genes, or even make reasonable biological interpretations about the principles underlying them, has not kept pace. A promising approach that runs parallel to explicitly validating individual genes is analyzing a set of genes together and assessing the biological similarities among them. This is often achieved via gene ontology analysis, a powerful tool that involves evaluating publicly available gene annotations. However, additional resources such as Medical Subject Headings (MeSH) can also be used to evaluate sets of genes to make biological interpretations.
In this manuscript, we describe utilizing MeSH terms to make biological interpretations in maize. MeSH terms are assigned to PubMed-indexed manuscripts by the National Library of Medicine, and can be directly mapped to genes to develop gene annotations. Once mapped, these terms can be evaluated for enrichment in sets of genes or similarity between gene sets to provide biological insights. Here, we implement MeSH analyses in five maize datasets to demonstrate how MeSH can be leveraged by the maize and broader crop-genomics community.
We demonstrate that MeSH terms can be effectively leveraged to generate hypotheses and make biological interpretations in maize, and we provide a pipeline that enables the use of MeSH terms in other plant species.
Abstract
We introduce the R-package learnMET, developed as a flexible framework to enable a collection of analyses on multi-environment trial breeding data with machine learning-based models. ...learnMET allows the combination of genomic information with environmental data such as climate and/or soil characteristics. Notably, the package offers the possibility of incorporating weather data from field weather stations, or to retrieve global meteorological datasets from a NASA database. Daily weather data can be aggregated over specific periods of time based on naive (for instance, nonoverlapping 10-day windows) or phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient-boosted decision trees, random forests, stacked ensemble models, and multilayer perceptrons. These prediction models can be evaluated via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with multi-environment trial experimental data in a user-friendly way. The package is published under an MIT license and accessible on GitHub.
The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the ...field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (
elastic net
) and to two nonlinear gradient boosting methods based on decision tree algorithms (
XGBoost, LightGBM
). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.
Genomic selection is a powerful tool in plant breeding. By building a prediction model using a training set with markers and phenotypes, genomic estimated breeding values (GEBVs) can be used as ...predictions of breeding values in a target set with only genotype data. There is, however, limited information on how prediction accuracy of genomic prediction can be optimized. The objective of this study was to evaluate the performance of 11 genomic prediction models across species in terms of prediction accuracy for two traits with different heritabilities using several subsets of markers and training population proportions. Species studied were maize (Zea mays, L.), soybean (Glycine max, L.), and rice (Oryza sativa, L.), which vary in linkage disequilibrium (LD) decay rates and have contrasting genetic architectures.
Correlations between observed and predicted GEBVs were determined via cross validation for three training-to-testing proportions (90:10, 70:30, and 50:50). Maize, which has the shortest extent of LD, showed the highest prediction accuracy. Amongst all the models tested, Bayes B performed better than or equal to all other models for each trait in all the three crops. Traits with higher broad-sense and narrow-sense heritabilities were associated with higher prediction accuracy. When subsets of markers were selected based on LD, the accuracy was similar to that observed from the complete set of markers. However, prediction accuracies were significantly improved when using a subset of total markers that were significant at P ≤ 0.05 or P ≤ 0.10. As expected, exclusion of QTL-associated markers in the model reduced prediction accuracy. Prediction accuracy varied among different training population proportions.
We conclude that prediction accuracy for genomic selection can be improved by using the Bayes B model with a subset of significant markers and by selecting the training population based on narrow sense heritability.
Celotno besedilo
Dostopno za:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Evolutionary insights into plant breeding Turner-Hissong, Sarah D; Mabry, Makenzie E; Beissinger, Timothy M ...
Current opinion in plant biology,
04/2020, Letnik:
54
Journal Article
Recenzirano
Odprti dostop
Display omitted
•Population genetics can inform breeding strategies to increase genetic gain.•Genomic regions with selective sweeps are promising targets to introgress diversity.•Demography ...determines genetic architecture and deleterious load.•Deleterious alleles can either be purged via selection or masked via hybridization.•Gene editing enables rapid, precise introgression and de novo domestication.
Crop domestication is a fascinating area of study, as shown by a multitude of recent reviews. Coupled with the increasing availability of genomic and phenomic resources in numerous crop species, insights from evolutionary biology will enable a deeper understanding of the genetic architecture and short-term evolution of complex traits, which can be used to inform selection strategies. Future advances in crop improvement will rely on the integration of population genetics with plant breeding methodology, and the development of community resources to support research in a variety of crop life histories and reproductive strategies. We highlight recent advances related to the role of selective sweeps and demographic history in shaping genetic architecture, how these breakthroughs can inform selection strategies, and the application of precision gene editing to leverage these connections.