The development of next-generation sequencing technologies has provided new opportunities for genotyping various organisms, including plants. Genotyping by sequencing (GBS) is used to identify genetic variability more rapidly, and is more cost-effective than whole-genome sequencing. GBS has demonstrated its reliability and flexibility for a number of plant species and populations. It has been applied to genetic mapping, molecular marker discovery, genomic selection, genetic diversity studies, variety identification, conservation biology and evolutionary studies. However, the reduction in sequencing time and cost has created a need for efficient bioinformatics analyses of an ever-expanding amount of sequence data. Bioinformatics pipelines for GBS data analysis serve this purpose. Because the data processing steps are similar, existing pipelines differ mainly in the combination of software packages selected, either to process data for certain organisms or to process data from any organism. However, despite the use of efficient software packages, these pipelines have some disadvantages. For example, process automation is lacking (in some pipelines, each step must be started manually), which significantly reduces the performance of the analysis. Most pipelines cannot automatically install all the necessary software packages, and most of them do not allow unnecessary or completed steps to be switched off. In the present work, we have developed GBS-DP, a bioinformatics pipeline for GBS data analysis. The pipeline can be applied to various species. It is implemented using the Snakemake workflow engine, which makes it possible to fully automate both the computation and the installation of the necessary software packages. Our pipeline is able to analyze large datasets (more than 400 samples).
Leaves of many angiosperm species develop trichomes. These epidermal outgrowths have been exploited to study the determination of cell fate, plant cell differentiation mechanisms and cell morphogenesis in model plant species. It was found that even simply shaped trichomes (leaf hairs) offer protection against both biotic and abiotic stress factors. Currently, the genetic basis of leaf hair formation in monocotyledonous plants is poorly understood. This study sought to establish the genetic control of leaf pubescence formation in bread wheat (Triticum aestivum L.) in terms of leaf hair patterning and growth. A set of cultivars and lines carrying allelic combinations of three pubescence-controlling genes, Hl1, Hl3 and Hl2 ᵃᵉˢᵖ, was used for quantitative phenotyping. It was demonstrated that these genes differ in their effect on leaf hair formation: Hl1 and Hl3 mainly affected leaf hair initiation and growth, while Hl2 ᵃᵉˢᵖ modified leaf hair length. Their action was independent to a large extent. A model of the action of the Hl1, Hl3 and Hl2 ᵃᵉˢᵖ genes is proposed.
Phospholipases A2 (PLA2) hydrolyze the sn-2 position of glycerophospholipids to release fatty acids and lysophospholipids. The PLA2 superfamily enzymes are widespread and present in most mammalian cells and tissues, where they regulate metabolism, remodel the membrane and maintain its homeostasis, produce lipid mediators and activate inflammatory reactions, so disruption of PLA2-regulated lipid metabolism often leads to various diseases. In this study, 29 PLA2 genes in the human genome were systematically collected and described based on literature and sequence analyses. Mapping of the PLA2 genes in the human genome showed that they are located on 12 human chromosomes, some of them forming clusters. Their RVI scores, which estimate gene tolerance to the mutations that accumulate in the human population, demonstrated that the G4-type PLA2 genes belonging to one of the two largest clusters (4 genes) were the most tolerant. In contrast, the genes encoding G6-type PLA2s (G6B, G6F, G6C, G6A), localized outside the clusters, had a reduced tolerance to mutations. Analysis of the associations between PLA2 genes and human diseases found in the literature showed that 24 such genes were associated with 119 diseases belonging to 18 groups. In total, 229 disease/PLA2 gene relationships were described, revealing that G4, G2 and G7-type PLA2 proteins were involved in the largest number of diseases compared to other PLA2 types. Three groups of diseases turned out to be associated with the greatest number of PLA2 types: neoplasms, circulatory and endocrine system diseases. Phylogenetic analysis showed that a common origin can be established only for secretory PLA2s (G1, G2, G3, G5, G10 and G12). The remaining PLA2 types (G4, G6, G7, G8, G15 and G16) could be considered evolutionarily independent. Our study has found that the PLA2 genes most tolerant to mutations in humans (G4, G2, and G7 types) are associated with the largest number of disease groups.
Analysis of hyperspectral images is of great interest in plant studies. This analysis is used more and more widely, so the development of hyperspectral image processing methods is an urgent task. This paper presents a hyperspectral image processing pipeline that includes preprocessing, basic statistical analysis, visualization of a multichannel hyperspectral image, and solving classification and clustering problems using machine learning methods. The current version of the package implements the following methods: construction of a confidence interval of an arbitrary level for the difference of sample means; testing the similarity of intensity distributions of spectral lines for two sets of hyperspectral images using the Mann–Whitney U test and Pearson's goodness-of-fit test; visualization in two-dimensional space using the dimensionality reduction methods PCA, ISOMAP and UMAP; classification using linear or ridge regression, random forest and CatBoost; clustering of samples using the EM algorithm. The software pipeline is implemented in Python using the Pandas, NumPy, OpenCV, SciPy, Sklearn, Umap, CatBoost and Plotly libraries. The source code is available at: https://github.com/igor2704/Hyperspectral_images. The pipeline was applied to identify melanin pigment in the shell of barley grains based on hyperspectral data. Visualization based on the PCA, UMAP and ISOMAP methods, as well as the use of clustering algorithms, showed that a linear separation of grain samples with and without pigmentation could be performed with high accuracy based on hyperspectral data. The analysis revealed statistically significant differences in the distribution of median intensities for samples of images of grains with and without pigmentation. Thus, it was demonstrated that hyperspectral images can be used to determine the presence or absence of melanin in barley grains with high accuracy.
The flexible and convenient tool created in this work will significantly increase the efficiency of hyperspectral image analysis.
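As an illustration of the first statistical method listed above (a confidence interval of an arbitrary level for the difference of sample means), the following is a minimal sketch using only the Python standard library. It assumes a normal approximation with unequal variances; the function name `mean_diff_confidence_interval` and the example intensities are hypothetical and are not taken from the published package, whose exact procedure may differ.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_diff_confidence_interval(a, b, level=0.95):
    """Normal-approximation confidence interval for mean(a) - mean(b).

    Assumption: samples are independent and large enough for the normal
    approximation; variances are not assumed to be equal.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    diff = mean(a) - mean(b)
    # Standard error of the difference of two independent sample means.
    std_err = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff - z * std_err, diff + z * std_err


if __name__ == "__main__":
    # Hypothetical median intensities for two groups of grain images.
    pigmented = [0.21, 0.19, 0.24, 0.22, 0.20]
    unpigmented = [0.35, 0.33, 0.37, 0.36, 0.34]
    print(mean_diff_confidence_interval(pigmented, unpigmented, level=0.99))
```

If the resulting interval excludes zero, the two groups differ in mean intensity at the chosen level under the stated assumptions.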
The pigment composition of the plant seed coat affects important properties such as resistance to pathogens, pre-harvest sprouting, and mechanical hardness. The dark color of barley (Hordeum vulgare L.) grain can be attributed to the synthesis and accumulation of two groups of pigments. Blue and purple grain color is associated with the biosynthesis of anthocyanins. Gray and black grain color is caused by melanin. These pigments may accumulate in the grain shells both individually and together. Therefore, it is difficult to visually distinguish which pigments are responsible for the dark color of the grain. Chemical methods are used to accurately determine the presence/absence of pigments; however, they are expensive and labor-intensive. Therefore, the development of a new method for quickly assessing the presence of pigments in the grain would help in investigating the mechanisms of genetic control of the pigment composition of barley grains. In this work, we developed a method for assessing the presence or absence of anthocyanins and melanin in the barley grain shell based on digital image analysis using computer vision and machine learning algorithms. A protocol was developed to obtain digital RGB images of barley grains. Using this protocol, a total of 972 images were acquired for 108 barley accessions. The seed coat of these accessions may contain anthocyanins, melanins, or pigments of both types. Chemical methods were used to accurately determine the pigment content of the grains. Four models based on computer vision techniques and convolutional neural networks of different architectures were developed to predict grain pigment composition from images. The U-Net network model based on the EfficientNetB0 topology showed the best performance on the holdout set (an accuracy of 0.821).
Proline, an amino acid, plays an important role in plants and is involved in stress resistance and development. Earlier, to study the role of proline in maintaining stress resistance in plants, we obtained transgenic lines of tobacco (Nicotiana tabacum L.) with reduced activity of proline dehydrogenase (PDH, the proline degradation gene) and increased proline content. Transgenic tobacco plants demonstrated greater resistance to high concentrations of NaCl, drought, low temperatures, and heavy metals than control plants. Visual assessment showed that leaf pubescence in transgenic plants varied noticeably. Here we apply automated analysis of tobacco leaf folds to estimate quantitative characteristics of pubescence in genetically modified tobacco plants and the control SR1 line under non-stress conditions. Our results showed differences in the number of trichomes and their length between transgenic and control plants. The trichome number was significantly increased in transgenic plants (1.5- to 3-fold). The largest differences in trichome numbers were observed for trichomes with lengths from 0 to 380 µm. For trichome length, the opposite was observed: in all three transgenic lines, the trichome length was significantly lower than that of the control SR1 line. The data obtained indicate that proline is an important metabolome component affecting the plant phenotype. Our results demonstrate the potential of transgenic tobacco lines as genetic models for studying the role of proline in plant morphogenesis.
Determining the quantitative content of chlorophylls in plant leaves by their reflection spectra is an important task both in monitoring the state of natural and industrial phytocenoses, and in laboratory studies of normal and
pathological processes during plant growth. The use of machine learning methods for these purposes is promising,
since these methods allow inferring the relationships between input and output variables (prediction model), and
in order to improve the quality of the prediction, a researcher may modify predictors and select a set of method
parameters. Here, we present the results of the implementation and evaluation of the random forest algorithm for
predicting the total concentration of chlorophylls a and b from the reflection spectra of plant leaves in the visible and
infrared wavelengths. We used the reflection spectra for 276 leaf samples from 39 plant species obtained from open
sources. 181 samples were from the sycamore maple (Acer pseudoplatanus L.). The reflection spectrum represented
wavelengths from 400 to 2500 nm with a step of 1 nm. The training set consisted of 85 % of the A. pseudoplatanus L.
samples, and the performance was evaluated on the remaining 15 % samples of this species (validation sample).
Six models based on the random forest algorithm with different predictors were evaluated. The selection of control
parameters was performed by five-fold cross-validation. For the first model, the intensity of the reflection
spectra without any transformation was used. Based on the analysis of this model, the optimal ranges of wavelengths
for the remaining five models were selected. The best results were obtained by models that used a two-point estimation of the derivative of the reflection spectrum in the visible wavelength range as input data. We compared one
of these models (the two-point estimation of the derivative of the reflection spectrum in the range of 400–800 nm
with a step of 1 nm) with the model by other authors (which is based on the functional dependence between two
unknown parameters selected by the least squares method and two reflection coefficients, the choice of which is
described in the article). The comparison of the results of predictions of the model based on the random forest algorithm with the model of other authors was carried out both on the validation sample of maple and on the sample
from other plant species. In the first case, the predictions of the method based on a random forest had a lower
estimate of the standard deviation. In the second case, the predictions of this method had a large error for small
values of chlorophyll, while the third-party method had acceptable predictions. The article provides the analysis of
the results, as well as recommendations for using this machine learning method to assess the quantitative content
of chlorophylls in leaves.
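The predictor that performed best above, a two-point estimate of the derivative of the reflection spectrum in the 400–800 nm range, can be sketched as follows. This is a minimal illustration under stated assumptions: the two-point estimate is taken to be a forward difference between neighboring wavelengths on a uniform grid, and the function names are hypothetical; the abstract does not specify the exact formula the authors used. The resulting feature vector would then be passed to a random forest regressor, which is omitted here.

```python
def two_point_derivative(reflectance, step_nm=1.0):
    """Forward-difference estimate (R[i+1] - R[i]) / step for neighboring points.

    Assumption: the spectrum is sampled on a uniform wavelength grid.
    """
    if step_nm <= 0:
        raise ValueError("step_nm must be positive")
    return [
        (nxt - cur) / step_nm
        for cur, nxt in zip(reflectance, reflectance[1:])
    ]


def derivative_features(wavelengths_nm, reflectance, lo_nm=400, hi_nm=800, step_nm=1.0):
    """Restrict a spectrum to [lo_nm, hi_nm] and return its derivative estimates."""
    if len(wavelengths_nm) != len(reflectance):
        raise ValueError("wavelengths and reflectance must have equal length")
    selected = [
        r for w, r in zip(wavelengths_nm, reflectance) if lo_nm <= w <= hi_nm
    ]
    return two_point_derivative(selected, step_nm)


if __name__ == "__main__":
    # Synthetic spectrum on the 400-2500 nm grid with a 1 nm step.
    wavelengths = list(range(400, 2501))
    spectrum = [0.05 + 0.0001 * (w - 400) for w in wavelengths]
    features = derivative_features(wavelengths, spectrum)
    print(len(features))  # one feature per pair of neighboring wavelengths
```

A spectrum with 401 points in the selected range yields 400 derivative features, one per pair of neighboring wavelengths.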
The color of the grain shell of cereals is an important feature that characterizes the pigments and metabolites contained in it. The grain shell is the main barrier between the grain and the environment, so its characteristics are associated with a number of important biological functions: moisture absorption, grain viability, and resistance to pre-harvest sprouting. The presence of pigments in the shell affects various technological properties of the grain. Color characteristics, as well as the appearance of the grain shell, are an important indicator of plant diseases. In addition, the color of the grains serves as a classifying feature of plants. Genetic control of the color formation of both grains and other plant organs is exerted by genes encoding enzymes involved in the biosynthesis of pigments, as well as by regulatory genes. For a number of pigments, these genes are well understood, but for some pigments, such as melanin, which causes the black color of grains in barley, the molecular mechanisms of biosynthesis are still poorly understood. When studying the mechanisms of genetic control of grain color, breeders and geneticists are constantly faced with the need to assess the color characteristics of the grain shell. The technical means of addressing this problem include spectrophotometers, spectrometers, and hyperspectral cameras. However, hyperspectral cameras are expensive, especially those with high spatial and spectral resolution. An alternative is to use digital cameras, which can produce high-quality images with high spatial and color resolution. For this reason, methods for evaluating the color and texture characteristics of cereals based on the analysis of two-dimensional images obtained by digital cameras have recently been intensively developed in the field of plant phenotyping. This mini-review is devoted to the main tasks related to the analysis of color and texture characteristics of cereals, and to methods of describing them based on digital images.
Potato spindle tuber viroid. Kochetov, A. V.; Pronozin, A. Y.; Shatskaya, N. V. ... Vavilovskiĭ zhurnal genetiki i selekt͡s︡ii, 05/2021, Volume 25, Issue 3. Journal article, peer reviewed, open access.
Viroids belong to a very interesting class of molecules attracting researchers in phytopathology and molecular evolution. Here we review recent literature data concerning the genetics of Potato spindle tuber viroid (PSTVd) and the mechanisms related to its pathological effect on the host plants. PSTVd can be transmitted vertically through microspores and macrospores, but not with pollen from another infected plant. The 359-nucleotide-long genomic RNA of PSTVd is highly structured, and its 3D conformation is responsible for interaction with host cellular factors to mediate replication, transport between tissues during systemic infection and the severity of pathological symptoms. RNA replication is prone to errors, and infected plants contain a population of mutated forms of the PSTVd genome. Interestingly, at 7 DAI, only 25 % of the newly synthesized RNAs were identical to the master copy, but this proportion increased to up to 70 % at 14 DAI and remained the same afterwards. PSTVd infection induces the immune response in host plants. There are PSTVd strains with a severe, a moderate or a mild pathological effect. Interestingly, viroid replication itself does not necessarily induce strong morphological or physiological symptoms. In the case of PSTVd, disease symptoms may occur due to RNA interference, which decreases the expression levels of some important cellular regulatory factors, for example, potato StTCP23 from the gibberellic acid pathway with a role in tuber morphogenesis or tomato FRIGIDA-like protein 3 with an early flowering phenotype. This association between disease symptoms and the small segments of viroid genomic RNA complementary to the untranslated regions of cellular mRNAs provides a way for new resistant cultivars to be developed by genetic editing. To conclude, viroids provide a unique model to reveal the fundamental features of living systems, which appeared early in evolution and still remain undiscovered.
Recent results suggest that during evolution certain substitutions at protein sites may occur in a coordinated manner due to interactions between amino acid residues. Information on these coordinated substitutions may be useful for analysis of protein structure and function. CRASP is an Internet-available software tool for the detection and analysis of coordinated substitutions in multiple alignments of protein sequences. The approach is based on estimation of the correlation coefficient between the values of a physicochemical parameter at a pair of positions of a sequence alignment. The program enables the user to detect and analyze pairwise relationships between amino acid substitutions at protein sequence positions, and to estimate the contribution of the coordinated substitutions to the evolutionary invariance or variability in integral protein physicochemical characteristics such as the net charge of protein residues and hydrophobic core volume. The CRASP program is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/crasp/.
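The core idea described above, a correlation coefficient between the values of a physicochemical parameter at a pair of alignment positions, can be sketched as follows. This is an illustration only and not the CRASP implementation: it uses a plain, unweighted Pearson correlation, takes the Kyte-Doolittle hydropathy index as an example parameter, and skips sequences with a gap or unknown residue at either position. The function names are hypothetical.

```python
from math import sqrt

# Kyte-Doolittle hydropathy index, used here as an example physicochemical parameter.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # an invariant column carries no covariation signal
    return sxy / sqrt(sxx * syy)


def position_correlation(alignment, i, j, scale=KYTE_DOOLITTLE):
    """Correlate parameter values at alignment columns i and j (0-based)."""
    pairs = [
        (scale[seq[i]], scale[seq[j]])
        for seq in alignment
        if seq[i] in scale and seq[j] in scale  # skip gaps and unknown residues
    ]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return pearson(xs, ys)


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences of equal length).
    toy = ["AKD", "IKE", "GKS", "L-T", "VKN"]
    print(position_correlation(toy, 0, 2))
```

A coefficient far from zero for a pair of columns suggests that substitutions at the two positions covary with respect to the chosen parameter; a real analysis would also need a significance estimate and a correction for the phylogenetic relatedness of the sequences.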