Key message
We propose the application of enviromics to breeding practice, by which the similarity among sites assessed on an “omics” scale of environmental attributes drives the prediction of unobserved genotype performances.
Genotype by environment interaction (GEI) studies in plant breeding have focused mainly on estimating genetic parameters over a limited number of experimental trials. However, recent geographic information system (GIS) techniques have opened new frontiers for better understanding and dealing with GEI. These advances allow increasing selection accuracy across all sites of interest, including those where experimental trials have not yet been deployed. Here, we introduce the term enviromics within an envirotypic-assisted breeding framework. In summary, just as a genotype is characterized at DNA markers, any particular site is characterized by a set of “envirotypes” at multiple “enviromic” markers corresponding to environmental variables that may interact with the genetic background, thus providing informative breeding re-rankings for optimized decisions over different environments. Based on simulated data, we illustrate an index-based enviromics method (the “GIS–GEI”) which, due to its higher granular resolution than standard methods, allows for: (1) accurate matching of sites to their most appropriate genotypes; (2) better definition of breeding areas that have high genetic correlation to ensure selection gains across environments; and (3) efficient determination of the best sites to carry out experiments for further analyses. Environmental scenarios can also be optimized for productivity improvement and genetic resources management, especially in the current outlook of dynamic climate change. Envirotyping provides a new class of markers for genetic studies, which are fairly inexpensive, increasingly available and transferable across species. We envision a promising future for the integration of enviromics approaches into plant breeding when coupled with next-generation genotyping/phenotyping and powerful statistical modeling of genetic diversity.
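As a rough illustration of how similarity among sites on environmental attributes could drive predictions for untested sites, the following Python sketch builds a similarity matrix from standardized environmental covariates and borrows the genotype ranking of the most similar tested site. It is not the GIS–GEI index described above: the function names, the covariates, the genotype means, and the nearest-site rule are all hypothetical assumptions chosen for brevity.

```python
from statistics import mean, pstdev


def enviromic_similarity(env):
    """Similarity among sites from environmental covariates.

    env: dict mapping site name -> list of covariate values (hypothetical data).
    Returns a dict mapping (site_a, site_b) -> average cross-product of the
    standardized covariates, a simple linear-kernel similarity.
    """
    sites = list(env)
    n_vars = len(env[sites[0]])
    # Standardize each covariate across sites (zero mean, unit variance).
    cols = [[env[s][j] for s in sites] for j in range(n_vars)]
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant covariates
    z = {s: [(env[s][j] - mus[j]) / sds[j] for j in range(n_vars)] for s in sites}
    return {
        (a, b): sum(x * y for x, y in zip(z[a], z[b])) / n_vars
        for a in sites
        for b in sites
    }


def predict_untested(site, tested_means, sim):
    """Borrow the genotype ranking of the most similar tested site.

    tested_means: dict mapping tested site -> {genotype: mean performance}.
    Assumption: a nearest-site rule stands in for a full statistical model.
    """
    nearest = max(tested_means, key=lambda t: sim[(site, t)])
    means = tested_means[nearest]
    ranking = sorted(means, key=means.get, reverse=True)
    return nearest, ranking
```

In this sketch, a site with no trial is described only by its covariates (for example, mean temperature and annual rainfall), and its recommended genotypes come from the tested site that resembles it most. A realistic analysis would instead use the similarity matrix inside a mixed model.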
Precision animal agriculture is poised to rise to prominence in the livestock enterprise in the domains of management, production, welfare, sustainability, health surveillance, and environmental footprint. Considerable progress has been made in the use of tools to routinely monitor and collect information from animals and farms in a less laborious manner than before. These efforts have enabled the animal sciences to embark on information technology-driven discoveries to improve animal agriculture. However, the growing amount and complexity of data generated by fully automated, high-throughput data recording or phenotyping platforms, including digital images, sensor and sound data, unmanned systems, and information obtained from real-time noninvasive computer vision, pose challenges to the successful implementation of precision animal agriculture. The emerging fields of machine learning and data mining are expected to be instrumental in helping meet the daunting challenges facing global agriculture. Yet, their impact and potential in "big data" analysis have not been adequately appreciated in the animal science community, where this recognition has remained only fragmentary. To address such knowledge gaps, this article outlines a framework for machine learning and data mining and offers a glimpse into how they can be applied to solve pressing problems in animal sciences.
Receptor-like kinases (RLKs) play key roles during development and in responses to the environment. Despite the relevance of the RLK family and the completion of the tomato genome sequencing, the tomato RLK family has not yet been characterized, and a framework for functional predictions of the members of the family is lacking.
To generate a complete list of all the members of the tomato RLK family, we performed a phylogenetic analysis using the Arabidopsis family as a template. A total of 647 RLKs were identified in the tomato genome, which were organized into the same subfamily clades as Arabidopsis RLKs. Only eight of 58 RLK subfamilies exhibited specific expansion/reduction compared to their Arabidopsis counterparts. We also characterized the LRRII-RLK family by phylogeny, genomic analysis, expression profile and interaction with the virulence factor from begomoviruses, the nuclear shuttle protein (NSP). The LRRII subfamily members from tomato and Arabidopsis were highly conserved in both sequence and structure. Nevertheless, the majority of the orthologous pairs did not display similar conservation in the gene expression profile, indicating that these orthologs may have diverged in function after speciation. Based on the fact that members of the Arabidopsis LRRII subfamily (AtNIK1, AtNIK2 and AtNIK3) interact with the begomovirus nuclear shuttle protein (NSP), we examined whether the tomato orthologs of NIK, BAK1 and NsAK genes interact with NSP of Tomato Yellow Spot Virus (ToYSV). The tomato orthologs of NSP interactors, SlNIKs and SlNsAK, interacted specifically with NSP in yeast and displayed an expression pattern consistent with the pattern of geminivirus infection. In addition to suggesting a functional analogy between these phylogenetically classified orthologs, these results expand our previous observation that NSP-NIK interactions are neither virus-specific nor host-specific.
The tomato RLK superfamily is made up of 647 proteins that form a monophyletic tree with the Arabidopsis RLKs and is divided into 58 subfamilies. Few subfamilies have undergone expansion/reduction, and only six proteins were lineage-specific. Therefore, the tomato RLK family shares functional and structural conservation with Arabidopsis. For the LRRII-RLK members SlNIK1 and SlNIK3, we observed functions analogous to those of their Arabidopsis counterparts with respect to protein-protein interactions and similar expression profiles, which predominated in tissues that support high efficiency of begomovirus infection. Therefore, NIK-mediated antiviral signaling is also likely to operate in tomato, suggesting that tomato NIKs may be good targets for engineering resistance against tomato-infecting begomoviruses.
• Source databases for plant molecular data.
• Main machine learning concepts and tools.
• Machine learning in plant molecular biology.
Machine learning (ML) is a field of artificial intelligence that has rapidly emerged in molecular biology, thus allowing the exploitation of Big Data concepts in plant genomics. In this context, the main challenges lie in how to analyze massive datasets and extract new knowledge at all levels of cellular systems research. In summary, ML techniques allow complex interactions to be inferred in several biological systems. Despite its potential, ML has been underused because of its complex computational algorithms and terminology. Therefore, a systematic review to disentangle ML approaches is relevant for plant scientists and has been considered in this study. We present the main steps of ML development (from data selection to evaluation of classification/prediction models), with a discussion focused on functional genomics, mainly pathogen effector genes in plant immunity. Additionally, we consider how to access public source databases under an ML framework towards advancing plant molecular biology, and we introduce novel powerful tools, such as deep learning.
Studies have shown that intramuscular adipogenesis and fibrogenesis may concomitantly occur in skeletal muscle of beef cattle. Thus, we hypothesized that the discrepancy of intramuscular fat content in beef from Nellore and Angus was associated with differences in intramuscular adipogenesis and fibrogenesis during the finishing phase. To test our hypothesis, longissimus muscle samples of Nellore (n = 6; BW = 372.5 ± 37.3 kg) and Angus (n = 6; BW = 382.8 ± 23.9 kg) cattle were collected for analysis of gene and protein expression, and quantification of intramuscular fat and collagen. Least-squares means were estimated for the effect of Breed and differences were considered at P ≤ 0.05. A greater intramuscular fat content was observed in skeletal muscle of Angus compared to Nellore cattle (P ≤ 0.05). No differences were observed for mRNA expression of lipogenic and lipolytic markers ACC, FAS, FABP4, SERBP-1, CPT-2, LPL, and ACOX (P > 0.05) in skeletal muscle of Nellore and Angus cattle. Similarly, no differences were observed in mRNA expression of adipogenic markers Zfp423, PPARγ, and C/EBPα (P > 0.05). However, a greater PPARγ protein content was observed in skeletal muscle of Angus compared to Nellore cattle (P ≤ 0.05). A greater abundance of adipo/fibrogenic cells, evaluated by the PDGFRα content, was observed in skeletal muscle of Angus than Nellore cattle (P ≤ 0.05). No differences in fibrogenesis were observed in skeletal muscle of Angus and Nellore cattle, which is in accordance with the lack of differences in intramuscular collagen content in beef from both breeds (P > 0.05). These findings demonstrate that the difference in intramuscular fat content is associated with a slightly enhanced adipogenesis in skeletal muscle of Angus compared to Nellore cattle, with no difference in fibrogenesis.
In plant breeding, univariate diallel models have aided the selection of parents for hybridization. Multivariate analyses allow combining and associating the multiple pieces of information of the genetic relationships between traits. Therefore, multivariate analyses might refine the discrimination and selection of the parents with greater potential to meet the goals of a plant breeding program. Here, we propose a method of multivariate analysis used for establishing mega-traits (MTs) in diallel trials. The proposed model is applied in the evaluation of a multi-environment complete diallel trial with 90 F1’s of simple maize hybrids. From a set of 14 traits, we demonstrate how to establish and interpret MTs with agronomic implications. Diallel analyses based on mega-traits represent an important advance in statistical procedures since the selection is based on several traits. We believe that the proposed method fills an important gap in plant breeding. In our example, three MTs were established: the first formed by plant stature-related traits, the second by tassel size-related traits, and the third by grain yield-related traits. Individual and joint diallel analysis using the established MTs allowed identifying the best hybrid combinations for achieving F1’s with lower plant stature, smaller tassel size, and higher grain yield.
Cell surface receptors play essential roles in perceiving and processing external and internal signals at the cell surface of plants and animals. The receptor-like protein kinases (RLKs) and receptor-like proteins (RLPs), two major classes of proteins with membrane receptor configuration, play a crucial role in plant development and disease defense. Although RLPs and RLKs share a similar single-pass transmembrane configuration, RLPs harbor short divergent C-terminal regions instead of the conserved kinase domain of RLKs. This RLP receptor structural design precludes sequence comparison algorithms from being used for high-throughput predictions of the RLP family in plant genomes, as has been extensively performed for RLK superfamily predictions. Here, we developed the RLPredictiOme, implemented with machine learning models in combination with Bayesian inference, capable of predicting RLP subfamilies in plant genomes. The ML models were simultaneously trained using six types of features, along with three stages to distinguish RLPs from non-RLPs (NRLPs), RLPs from RLKs, and classify new subfamilies of RLPs in plants. The ML models achieved high accuracy, precision, sensitivity, and specificity for predicting RLPs with relatively high probability ranging from 0.79 to 0.99. The prediction of the method was assessed with three datasets, two of which contained leucine-rich repeats (LRR)-RLPs from Arabidopsis and rice, and the last one consisted of the complete set of previously described Arabidopsis RLPs. In these validation tests, more than 90% of known RLPs were correctly predicted via RLPredictiOme. In addition to predicting previously characterized RLPs, RLPredictiOme uncovered new RLP subfamilies in the Arabidopsis genome. These include probable lipid transfer (PLT)-RLP, plastocyanin-like-RLP, ring finger-RLP, glycosyl-hydrolase-RLP, and glycerophosphoryldiester phosphodiesterase (GDPD, GDPDL)-RLP subfamilies, yet to be characterized.
Compared to the only Arabidopsis GDPDL-RLK, molecular evolution studies confirmed that the ectodomain of GDPDL-RLPs might have undergone a purifying selection with a predominance of synonymous substitutions. Expression analyses revealed that predicted GDPDL-RLPs display a basal expression level and respond to developmental and biotic signals. The results of these biological assays indicate that these subfamily members have maintained functional domains during evolution and may play relevant roles in development and plant defense. Therefore, RLPredictiOme provides a framework for genome-wide surveys of the RLP superfamily as a foundation to rationalize functional studies of surface receptors and their relationships with different biological processes.
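For readers less familiar with the four metrics reported for the classifiers above (accuracy, precision, sensitivity, and specificity), the following Python sketch shows how they are conventionally computed for a two-class problem such as RLP versus non-RLP. It is a generic illustration, not part of RLPredictiOme; the function name, the label strings, and the example labels are assumptions.

```python
def binary_metrics(y_true, y_pred, positive="RLP"):
    """Accuracy, precision, sensitivity, and specificity for two classes.

    y_true, y_pred: equal-length sequences of labels (hypothetical data).
    positive: the label treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),    # predicted positives that are correct
        "sensitivity": ratio(tp, tp + fn),  # true positives recovered (recall)
        "specificity": ratio(tn, tn + fp),  # true negatives recovered
    }
```

Sensitivity corresponds to the share of known RLPs that are recovered, which is the quantity behind a statement such as "more than 90% of known RLPs were correctly predicted".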
Plants and plant pathogens are subject to continuous co-evolutionary pressure for dominance, and the outcomes of these interactions can substantially impact agriculture and food security. In virus-plant interactions, one of the major mechanisms for plant antiviral immunity relies on RNA silencing, which is often suppressed by co-evolving virus suppressors, thus enhancing viral pathogenicity in susceptible hosts. In addition, plants use the nucleotide-binding and leucine-rich repeat (NB-LRR) domain-containing resistance proteins, which recognize viral effectors to activate effector-triggered immunity in a defence mechanism similar to that employed in non-viral infections. Unlike most eukaryotic organisms, plants are not known to activate mechanisms of host global translation suppression to fight viruses. Here we demonstrate in Arabidopsis that the constitutive activation of NIK1, a leucine-rich repeat receptor-like kinase (LRR-RLK) identified as a virulence target of the begomovirus nuclear shuttle protein (NSP), leads to global translation suppression and translocation of the downstream component RPL10 to the nucleus, where it interacts with a newly identified MYB-like protein, L10-INTERACTING MYB DOMAIN-CONTAINING PROTEIN (LIMYB), to fully downregulate translational machinery genes. LIMYB overexpression represses ribosomal protein genes at the transcriptional level, resulting in protein synthesis inhibition, decreased viral messenger RNA association with polysome fractions and enhanced tolerance to begomovirus. By contrast, the loss of LIMYB function releases the repression of translation-related genes and increases susceptibility to virus infection. Therefore, LIMYB links immune receptor LRR-RLK activation to global translation suppression as an antiviral immunity strategy in plants.
Feed efficiency is one of the most important parameters that affect beef production costs. The energy metabolism of skeletal muscle greatly contributes to variations in feed efficiency. However, information regarding differences in proteins involved in the energy metabolism of the skeletal muscle in beef cattle divergently identified for feed efficiency is scarce. In this study, we aimed to investigate the energy metabolism of skeletal muscle of Nellore beef cattle identified for low and high residual feed intake, using a proteomics approach. We further assessed the expression of candidate microRNAs as one of the possible mechanisms controlling the biosynthesis of the proteins involved in energy metabolism that were differentially abundant between high and low residual feed intake animals.
A greater abundance of 14-3-3 protein epsilon (P = 0.01) was observed in skeletal muscle of high residual feed intake (RFI-High) animals. Conversely, a greater abundance of Heat Shock Protein Beta 1 (P < 0.01) was observed in the skeletal muscle of RFI-Low cattle. A greater mRNA expression of YWHAE, which encodes the 14-3-3 protein epsilon, was also observed in the skeletal muscle of RFI-High animals (P = 0.01). A lower mRNA expression of HSPB1, which encodes the Heat Shock Protein Beta 1, was observed in the skeletal muscle of RFI-High animals (P = 0.01). The miR-665 was identified as a potential regulator of the 14-3-3 protein epsilon, and its expression was greater in RFI-Low animals (P < 0.001). A greater expression of miR-34a (P = 0.01) and miR-2899 (P < 0.001) was observed in the skeletal muscle of RFI-High animals; both miRNAs were identified as potential regulators of HSPB1 expression.
Our results show that Nellore cattle divergently identified for feed efficiency by RFI present changes in the abundance of proteins involved in energy expenditure in skeletal muscle. Moreover, our data suggest that miR-665, miR-34a and miR-2899 are likely involved in controlling the 14-3-3 epsilon and HSPB1 proteins, which were identified as differentially abundant in the skeletal muscle of RFI-High and RFI-Low Nellore cattle.
In recent years, there has been increased interest in the study of the molecular processes that affect semen traits. In this study, our aim was to identify quantitative trait loci (QTL) regions associated with four semen traits (motility, progressive motility, number of sperm cells per ejaculate and total morphological defects) in two commercial pig lines (L1: Large White type and L2: Landrace type). Since the number of animals with both phenotypes and genotypes was relatively small in our dataset, we conducted a weighted single-step genome-wide association study, which also allows unequal variances for single nucleotide polymorphisms. We also aimed to identify candidate genes within the QTL regions that explained the highest proportions of genetic variance. Subsequently, we performed gene network analyses to investigate the biological processes shared by genes that were identified for the same semen traits across lines.
We identified QTL regions that explained up to 10.8% of the genetic variance of the semen traits on 12 chromosomes in L1 and 11 chromosomes in L2. Sixteen QTL regions in L1 and six QTL regions in L2 were associated with two or more traits within the population. Candidate genes SCN8A, PTGS2, PLA2G4A, DNAI2, IQCG and LOC102167830 were identified in L1 and NME5, AZIN2, SPATA7, METTL3 and HPGDS in L2. No regions overlapped between these two lines. However, the gene network analysis for progressive motility revealed two genes in L1 (PLA2G4A and PTGS2) and one gene in L2 (HPGDS) that were involved in two biological processes, i.e., eicosanoid biosynthesis and arachidonic acid metabolism. PTGS2 and HPGDS were also involved in the cyclooxygenase pathway.
We identified several QTL regions associated with semen traits in two pig lines, which confirms the assumption of a complex genetic determinism for these traits. A large part of the genetic variance of the semen traits under study was explained by different genes in the two evaluated lines. Nevertheless, the gene network analysis revealed candidate genes that are involved in shared biological pathways that occur in mammalian testes, in both lines.