Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results.
Availability: http://www.maizegenetics.net/GAPIT.
Contact: zhiwu.zhang@cornell.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Northern leaf blight (NLB) can cause severe yield loss in maize; however, scouting large areas to accurately diagnose the disease is time consuming and difficult. We demonstrate a system capable of automatically identifying NLB lesions in field-acquired images of maize plants with high reliability. This approach uses a computational pipeline of convolutional neural networks (CNNs) that addresses the challenges of limited data and the myriad irregularities that appear in images of field-grown plants. Several CNNs were trained to classify small regions of images as containing NLB lesions or not; their predictions were combined into separate heat maps, then fed into a final CNN trained to classify the entire image as containing diseased plants or not. The system achieved 96.7% accuracy on test set images not used in training. We suggest that such systems mounted on aerial- or ground-based vehicles can help in automated high-throughput plant phenotyping, precision breeding for disease resistance, and reduced pesticide use through targeted application across a variety of plant and disease categories.
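The patch-to-heat-map stage described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical `mean_intensity` function standing in for a trained CNN, and a window size and stride chosen only for illustration; the abstract specifies none of these.

```python
from typing import Callable, List

Image = List[List[float]]


def heat_map(image: Image, patch_score: Callable[[Image], float],
             size: int, stride: int) -> Image:
    """Slide a size x size window over the image and score each patch."""
    rows, cols = len(image), len(image[0])
    result = []
    for top in range(0, rows - size + 1, stride):
        row_scores = []
        for left in range(0, cols - size + 1, stride):
            patch = [r[left:left + size] for r in image[top:top + size]]
            row_scores.append(patch_score(patch))
        result.append(row_scores)
    return result


def mean_intensity(patch: Image) -> float:
    """Hypothetical stand-in for a CNN: mean pixel value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# Toy 4x4 "image" with a bright (lesion-like) block in the lower right.
toy = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
scores = heat_map(toy, mean_intensity, size=2, stride=2)
print(scores)
```

In the system described, several such maps (one per patch-level CNN) would be passed to a final whole-image classifier; this sketch covers only the construction of a single map.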
genetic architecture of maize height
Peiffer, Jason A; Romay, Maria C; Gore, Michael A ...
Genetics (Austin), 04/2014, Volume 196, Issue 4
Journal Article, Peer reviewed, Open access
Height is one of the most heritable and easily measured traits in maize (Zea mays L.). Given a pedigree or estimates of the genomic identity-by-state (IBS) among related plants, height is also accurately predictable. However, mapping alleles that explain natural variation in maize height remains a formidable challenge. To address this challenge, we measured the plant height, ear height, flowering time, and node counts of plants grown in >64,500 plots across 13 environments. These plots contained >7,300 inbreds representing most publicly available maize inbreds in the U.S.A. as well as families of the maize Nested Association Mapping (NAM) panel. Joint-linkage mapping of quantitative trait loci (QTL), fine mapping in near isogenic lines (NILs), genome-wide association studies (GWAS), and genomic best linear unbiased prediction (GBLUP) were performed. The heritability of plant height was estimated to be over 90%. Mapping of NAM family-nested QTL revealed that the largest explained about 2.1 ± 0.9% of height variation. The effects of two tropical alleles at this QTL were independently validated by fine mapping. Several significant associations found by GWAS co-localized with established height loci including brassinosteroid-deficient dwarf1, dwarf plant1, and semi-dwarf2. GBLUP explained >80% of plant height variation in the observed panels and outperformed bootstrap aggregation of family-nested QTL models in evaluations of prediction accuracy. These results revealed that maize height was under strong genetic control and had a highly polygenic genetic architecture. They also showed that multiple models of genetic architecture differing in polygenicity and effect sizes can plausibly explain a population’s variation in maize height, but they may vary in predictive efficacy.
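Identity-by-state between two inbreds can be computed directly from marker calls. The sketch below assumes biallelic markers coded as 0, 1, or 2 copies of one allele and no missing data; the coding, the toy genotypes, and the function name `ibs` are illustrative and not taken from the study.

```python
from typing import Sequence


def ibs(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean proportion of alleles shared identical-by-state across markers.

    Genotypes are assumed to be coded 0/1/2 (copies of one allele).
    """
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and equal length")
    # At each marker, two lines share 2, 1, or 0 alleles by state.
    shared = sum(2 - abs(x - y) for x, y in zip(a, b))
    return shared / (2 * len(a))


# Invented genotypes for three inbred lines at four markers.
line_a = [0, 2, 2, 0]
line_b = [0, 2, 0, 0]
line_c = [2, 0, 0, 2]

pairs = {
    "a-b": ibs(line_a, line_b),
    "a-c": ibs(line_a, line_c),
}
print(pairs)
```

A matrix of such pairwise values is one kind of relationship matrix that a GBLUP-style model could take as input; the abstract does not say how the study's IBS estimates were computed.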
First-Generation Haplotype Map of Maize
Gore, Michael A; Chia, Jer-Ming; Elshire, Robert J ...
Science (American Association for the Advancement of Science), 11/2009, Volume 326, Issue 5956
Journal Article, Peer reviewed
Maize is an important crop species of high genetic diversity. We identified and genotyped several million sequence polymorphisms among 27 diverse maize inbred lines and discovered that the genome was characterized by highly divergent haplotypes and showed 10- to 30-fold variation in recombination rates. Most chromosomes have pericentromeric regions with highly suppressed recombination that appear to have influenced the effectiveness of selection during maize inbred development and may be a major component of heterosis. We found hundreds of selective sweeps and highly differentiated regions that probably contain loci that are key to geographic adaptation. This survey of genetic diversity provides a foundation for uniting breeding efforts across the world and for dissecting complex traits through genome-wide association studies.
Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components. We applied these two methods both independently and combined in selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate the usefulness in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
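The compression step can be sketched as collapsing an individual-level kinship matrix to a group-level one. The example assumes group membership is already known and summarizes kinship between two groups as the mean over member pairs; the averaging rule and all names here are illustrative assumptions, since the abstract does not describe how groups are formed or summarized.

```python
from typing import Dict, List


def compress_kinship(kinship: List[List[float]],
                     groups: List[int]) -> List[List[float]]:
    """Collapse an n x n kinship matrix to a g x g group kinship matrix.

    groups[i] is the group label of individual i. Each group-level entry
    is the mean kinship over all member pairs (an assumed summary rule).
    """
    members: Dict[int, List[int]] = {}
    for individual, label in enumerate(groups):
        members.setdefault(label, []).append(individual)
    labels = sorted(members)
    compressed = []
    for g in labels:
        row = []
        for h in labels:
            values = [kinship[i][j] for i in members[g] for j in members[h]]
            row.append(sum(values) / len(values))
        compressed.append(row)
    return compressed


# Four invented individuals: 0 and 1 are close relatives, as are 2 and 3.
K = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
group_K = compress_kinship(K, groups=[0, 0, 1, 1])
print(group_K)
```

In this toy case a mixed model would fit random effects for two groups rather than four individuals, which is the sense in which compression reduces the effective sample size.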
► Cotton plants were subjected to drought and heat stress under field conditions.
► Both drought and heat stresses were associated with water availability.
► Diffusive (drought-induced) and biochemical (heat-induced) limitations compromised photosynthetic performance.
► Rubisco inactivation was associated with the inhibition of photosynthesis caused by heat stress.
Heat and drought stresses are often coincident and constitute major factors limiting global crop yields. A better understanding of plant responses to the combination of these stresses under production environments will facilitate efforts to improve yield and water use efficiencies in a climatically changing world. To evaluate photosynthetic performance under dry-hot conditions, four cotton (Gossypium barbadense L.) cultivars, Monseratt Sea Island (MS), Pima 32 (P32), Pima S-6 (S6) and Pima S-7 (S7), were studied under well-watered (WW) and water-limited (WL) conditions at a field site in central Arizona. Differences in canopy temperature and leaf relative water content under WL conditions indicated that, of the four cultivars, MS was the most drought-sensitive and S6 the most drought-tolerant. Net CO2 assimilation rates (A) and stomatal conductances (gs) decreased and leaf temperatures increased in WL compared to WW plants of all cultivars, but MS exhibited the greatest changes. The response of A to the intercellular CO2 concentration (A–Ci) showed that, along with stomatal closure, non-stomatal factors associated with heat stress also limited A under WL conditions, especially in MS. The activation state of ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) decreased in WL compared to WW plants, consistent with thermal inhibition of Rubisco activase activity. The extent of Rubisco deactivation could account for the metabolic limitation to photosynthesis in MS. Taken together, these data reveal the complex relationship between water availability and heat stress for field-grown cotton plants in a semi-arid environment. Both diffusive (drought-stress-induced) and biochemical (heat-stress-induced) limitations contributed to decreased photosynthetic performance under dry-hot conditions.
Hyperspectral reflectance phenotyping and genomic selection are two emerging technologies that have the potential to increase plant breeding efficiency by improving prediction accuracy for grain yield. Hyperspectral cameras quantify canopy reflectance across a wide range of wavelengths that are associated with numerous biophysical and biochemical processes in plants. Genomic selection models utilize genome-wide marker or pedigree information to predict the genetic values of breeding lines. In this study, we propose a multi-kernel GBLUP approach to genomic selection that uses genomic marker-, pedigree-, and hyperspectral reflectance-derived relationship matrices to model the genetic main effects and genotype × environment (G × E) interactions across environments within a bread wheat (Triticum aestivum L.) breeding program. We utilized an airplane equipped with a hyperspectral camera to phenotype five differentially managed treatments of the yield trials conducted by the Bread Wheat Improvement Program of the International Maize and Wheat Improvement Center (CIMMYT) at Ciudad Obregón, México over four breeding cycles. We observed that single-kernel models using hyperspectral reflectance-derived relationship matrices performed similarly to or better than marker- and pedigree-based genomic selection models when predicting within and across environments. Multi-kernel models combining marker/pedigree information with hyperspectral reflectance phenotypes had the highest prediction accuracies; however, improvements in accuracy over marker- and pedigree-based models were marginal when correcting for days to heading. Our results demonstrate the potential of using hyperspectral imaging to predict grain yield within a multi-environment context and also support further studies on the integration of hyperspectral reflectance phenotyping into breeding programs.
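A reflectance-derived relationship matrix can be sketched as follows. The construction used here, standardizing each wavelength band across lines and taking the cross-product divided by the number of bands, is a common way to build such a matrix and is an assumption on our part; the abstract does not state the formula used, and the reflectance values are invented.

```python
from statistics import mean, pstdev
from typing import List

Matrix = List[List[float]]


def standardize_columns(matrix: Matrix) -> Matrix:
    """Center each band (column) to mean 0 and scale to unit variance."""
    columns = list(zip(*matrix))
    centers = [mean(c) for c in columns]
    scales = [pstdev(c) for c in columns]
    if any(s == 0 for s in scales):
        raise ValueError("a band with zero variance cannot be standardized")
    return [
        [(v - m) / s for v, m, s in zip(row, centers, scales)]
        for row in matrix
    ]


def relationship_matrix(reflectance: Matrix) -> Matrix:
    """Lines x lines similarity: W W^T / (number of bands), W standardized."""
    w = standardize_columns(reflectance)
    bands = len(w[0])
    return [
        [sum(x * y for x, y in zip(wi, wj)) / bands for wj in w]
        for wi in w
    ]


# Three hypothetical breeding lines measured at two wavelength bands.
reflectance = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
]
H = relationship_matrix(reflectance)
for row in H:
    print([round(v, 3) for v in row])
```

In a multi-kernel model, a matrix like `H` would enter alongside marker- and pedigree-derived matrices, each with its own variance component; fitting that model is beyond this sketch.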
• Controlling for spurious associations in statistical models is essential.
• Computationally efficient approaches are critical for large data sets.
• Statistical genetic models that predict phenotypes help accelerate breeding cycles.
• Co-evolution between statistical models and sequencing and phenotyping advances is ongoing.
Quantification of genotype-to-phenotype associations is central to many scientific investigations, yet the ability to obtain consistent results may be thwarted without appropriate statistical analyses. Models for association can consider confounding effects in the materials and complex genetic interactions. Selecting optimal models enables accurate evaluation of associations between marker loci and numerous phenotypes including gene expression. Significant improvements in QTL discovery via association mapping and acceleration of breeding cycles through genomic selection are two successful applications of models using genome-wide markers. Given recent advances in genotyping and phenotyping technologies, further refinement of these approaches is needed to model genetic architecture more accurately and run analyses in a computationally efficient manner, all while accounting for false positives and maximizing statistical power.
Computer vision models that can recognize plant diseases in the field would be valuable tools for disease management and resistance breeding. Generating enough data to train these models is difficult, however, since only trained experts can accurately identify symptoms. In this study, we describe and implement a two-step method for generating a large amount of high-quality training data with minimal expert input. First, experts located symptoms of northern leaf blight (NLB) in field images taken by unmanned aerial vehicles (UAVs), annotating them quickly at low resolution. Second, non-experts were asked to draw polygons around the identified diseased areas, producing high-resolution ground truths that were automatically screened based on agreement between multiple workers. We then used these crowdsourced data to train a convolutional neural network (CNN), feeding the output into a conditional random field (CRF) to segment images into lesion and non-lesion regions with accuracy of 0.9979 and F1 score of 0.7153. The CNN trained on crowdsourced data showed greatly improved spatial resolution compared to one trained on expert-generated data, despite using only one fifth as many expert annotations. The final model was able to accurately delineate lesions down to the millimeter level from UAV-collected images, the finest scale of aerial plant disease detection achieved to date. The two-step approach to generating training data is a promising method to streamline deep learning approaches for plant disease detection, and for complex plant phenotyping tasks in general.
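A gap between pixel accuracy and F1 score, as reported in this abstract, can arise when lesion pixels are rare, because accuracy is then dominated by correctly labeled background. The following sketch computes both metrics from flattened binary masks; the toy masks are invented for illustration and are not data from the study.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(truth: Sequence[int],
                    pred: Sequence[int]) -> Tuple[float, float]:
    """Pixel accuracy and F1 for binary masks (1 = lesion, 0 = background)."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("masks must be non-empty and equal length")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(truth)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# 100 pixels, 4 of them lesion. The prediction finds 2 lesion pixels,
# misses 2, and marks 1 background pixel as lesion.
truth = [1, 1, 1, 1] + [0] * 96
pred = [1, 1, 0, 0, 1] + [0] * 95
accuracy, f1 = accuracy_and_f1(truth, pred)
print(f"accuracy={accuracy:.4f} f1={f1:.4f}")
```

Here 95 correctly labeled background pixels keep accuracy high even though half of the lesion pixels are missed, which is why F1 is the more informative of the two for sparse targets.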