The importance of 3D protein structure in proteolytic processing is well known. However, despite the plethora of existing methods for predicting proteolytic sites, only a few of them utilize the structural features of potential substrates as predictors. Moreover, to our knowledge, there is currently no method available for predicting the structural susceptibility of protein regions to proteolysis. We developed such a method using data from CutDB, a database that contains experimentally verified proteolytic events. For prediction, we utilized structural features that have been shown to influence proteolysis in earlier studies, such as solvent accessibility, secondary structure, and temperature factor. Additionally, we introduced new structural features, including the length of protruding loops and the flexibility of protein termini. To maximize the prediction quality of the method, we carefully curated the training set, selected an appropriate machine learning method, and sampled negative examples to determine the optimal positive-to-negative class size ratio. We demonstrated that combining our method with models of protease primary specificity can outperform existing bioinformatics methods for the prediction of proteolytic sites. We also discussed the possibility of utilizing this method for bioinformatics prediction of other post-translational modifications.
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) primarily enters the cell through binding of its spike (S) glycoprotein to the angiotensin-converting enzyme 2 receptor on the cell surface, followed by proteolytic cleavage by host proteases. Studies have implicated furin and transmembrane protease serine 2 in the priming and triggering cleavages of the S glycoprotein, which convert it into a fusion-competent form and initiate membrane fusion, respectively. Alternatively, SARS-CoV-2 can enter the cell through the endocytic pathway, where activation is triggered by lysosomal cathepsin L. However, other proteases are also suspected to be involved in both entry routes. In this study, we conducted a genome-wide bioinformatics analysis to explore the capacity of human proteases to hydrolyze peptide bonds of the S glycoprotein. Predictive models of sequence specificity for 169 human proteases were constructed and applied to the S glycoprotein together with the method for predicting the structural susceptibility of protein regions to proteolysis. After validating our approach on the extensively studied S2' and S1/S2 cleavage sites, we applied our method to each peptide bond of the S glycoprotein across all 169 proteases. Our results indicate that various members of the proprotein convertase subtilisin/kexin type, type II transmembrane family serine protease, and kallikrein families, as well as specific coagulation factors, are capable of cleaving the S2' or S1/S2 sites. We have also identified a potential cleavage site of cathepsin L at the K790 position within the S2' loop. Structural analysis suggests that cleavage of this site induces conformational changes similar to those induced by cleavage at the R815 (S2') position, leading to the exposure of the fusion peptide and subsequent fusion with the membrane.
Other potential cleavage sites and the influence of mutations in common SARS-CoV-2 variants on proteolytic efficiency are discussed.
IMPORTANCE: The entry of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) into the cell, activated by host proteases, is considerably more complex in coronaviruses than in most other viruses and is not fully understood. There is evidence that proteases other than the known furin and transmembrane protease serine 2 can activate the spike protein. Another example of uncertainty is the cleavage site for the alternative endocytic route of SARS-CoV-2 entry, which is still unknown. Bioinformatics methods that model protease specificity and estimate the structural susceptibility of protein regions to proteolysis can aid in studying this topic by predicting the proteases involved and their cleavage sites, thereby substantially reducing the amount of experimental work. Elucidating the mechanisms of spike protein activation is crucial for preventing possible future coronavirus pandemics and developing antiviral drugs.
• A method for classification of ANA HEp-2 medical test slide images is presented.
• To increase the robustness of the method, both stand-alone and overlapped cells are used.
• A new method for separating overlapped cells is introduced as a preliminary step.
• Cell-level classification is performed using two SVM models.
• Slide-level classification relies on a voting scheme with cell-source-dependent weights.
Correct diagnosis of autoimmune disorders is important for treatment planning, as many such diseases present similar symptoms. The ANA HEp-2 medical test makes it possible to distinguish among autoimmune diseases such as lupus erythematosus, scleroderma and Sjogren’s syndrome. Manual disease classification based on analysis of the indirect immunofluorescence images obtained in the course of this test is known to be error-prone. Here, we present an automatic method for classification of ANA HEp-2 images that separates individual cells on pre-segmented slide images and then classifies both the cells and the slide image. The method uses morphological properties of the stained patterns located inside the cell nucleus to classify individual cells and a voting scheme to classify the whole ANA HEp-2 slide image.
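The slide-level voting step described above can be sketched as a weighted majority vote. This is only an illustration under stated assumptions: the function name `classify_slide`, the source labels `"standalone"` and `"overlapped"`, and the weight values are hypothetical, since the abstract does not report the actual weights or implementation.

```python
from collections import defaultdict

# Hypothetical weights: the source does not report the actual values.
SOURCE_WEIGHTS = {"standalone": 1.0, "overlapped": 0.5}


def classify_slide(cell_predictions, weights=SOURCE_WEIGHTS):
    """Return a slide-level label by weighted majority vote.

    cell_predictions: iterable of (label, source) pairs, one per cell,
    where source says whether the cell was stand-alone or was separated
    from a group of overlapped cells.
    """
    totals = defaultdict(float)
    for label, source in cell_predictions:
        totals[label] += weights[source]
    if not totals:
        raise ValueError("no cell predictions supplied")
    # Iterating over sorted labels makes ties resolve deterministically
    # (the alphabetically first label wins).
    return max(sorted(totals), key=totals.get)


cells = [
    ("homogeneous", "standalone"),
    ("speckled", "overlapped"),
    ("speckled", "overlapped"),
    ("speckled", "overlapped"),
]
slide_label = classify_slide(cells)
```

Down-weighting cells recovered from overlaps is one plausible reading of "cell-source-dependent weights": such cells contribute evidence, but less than cleanly isolated ones.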
This paper presents the evaluation results of the methods submitted to Challenge US: Biometric Measurements from Fetal Ultrasound Images, a segmentation challenge held at the IEEE International Symposium on Biomedical Imaging 2012. The challenge was set to compare and evaluate current fetal ultrasound image segmentation methods. It consisted of automatically segmenting fetal anatomical structures to measure standard obstetric biometric parameters, from 2D fetal ultrasound images acquired from fetuses at different gestational ages (21 weeks, 28 weeks, and 33 weeks) and with varying image quality to reflect data encountered in real clinical environments. Four independent sub-challenges were proposed, according to the objects of interest measured in clinical practice: abdomen, head, femur, and whole fetus. Five teams participated in the head sub-challenge and two teams in the femur sub-challenge, including one team who tackled both. Nobody attempted the abdomen and whole fetus sub-challenges. The challenge goals were two-fold: the participants were asked to submit the segmentation results as well as the measurements derived from the segmented objects. Extensive quantitative (region-based, distance-based, and Bland-Altman measurements) and qualitative evaluation was performed to compare the results from a representative selection of current methods submitted to the challenge. Several experts (three for the head sub-challenge and two for the femur sub-challenge), with different degrees of expertise, manually delineated the objects of interest to define the ground truth used within the evaluation framework. For the head sub-challenge, several groups produced results that could potentially be used in clinical settings, with performance comparable to manual delineations. Performance in the femur sub-challenge was inferior to that in the head sub-challenge because it is a harder segmentation problem and because the techniques presented relied more on the femur's appearance.
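Two of the quantitative measures named above can be sketched in a few lines. The Bland-Altman part follows the standard definition (bias and 95% limits of agreement of the paired differences). The Dice coefficient is an assumption: the abstract says only "region-based" and does not name the specific metric. The function names and the sample values are hypothetical.

```python
from statistics import mean, stdev


def dice(mask_a, mask_b):
    """Dice overlap between two segmentations given as sets of pixel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def bland_altman(automatic, manual):
    """Return (bias, lower limit, upper limit) for paired measurements.

    The limits of agreement are bias +/- 1.96 sample standard deviations
    of the paired differences.
    """
    diffs = [a - m for a, m in zip(automatic, manual)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical head-circumference values in mm (not from the challenge data).
bias, lower, upper = bland_altman([10.0, 12.0, 11.0], [9.0, 12.0, 12.0])
overlap = dice({(0, 0), (0, 1)}, {(0, 1), (1, 1)})
```

The region measure scores the segmentation itself, while the Bland-Altman statistics score the biometric measurement derived from it, which matches the two kinds of results participants were asked to submit.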
Bioinformatics-based prediction of protease substrates can help to elucidate regulatory proteolytic pathways that control a broad range of biological processes such as apoptosis and blood coagulation. The majority of published predictive models are position weight matrices (PWMs) reflecting the specificity of proteases toward target sequences. These models are typically derived from experimental data on the positions of hydrolyzed peptide bonds and show reasonable predictive power. Emerging techniques that not only register the cleavage position but also measure the catalytic efficiency of proteolysis are expected to improve the quality of predictions or at least substantially reduce the number of tested substrates required for confident predictions. The main goal of this study was to develop new prediction models based on such data and to estimate the performance of the constructed models. We used data on the catalytic efficiency of proteolysis measured for eight major human matrix metalloproteinases to construct predictive models of protease specificity using a variety of regression analysis techniques. The obtained results suggest that efficiency-based (quantitative) models perform comparably to conventional PWM-based algorithms while requiring less training data. The derived list of candidate cleavage sites in human secreted proteins may serve as a starting point for experimental analysis.
• Protease specificity models can be built from quantitative protease profiling data.
• Regression methods are applicable for building such predictive models.
• These models perform at least as well as traditional quantitative PWMs.
• A significantly smaller number of peptide substrates is required for training.
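For readers unfamiliar with the conventional baseline mentioned above, a position weight matrix can be sketched as follows. This is a generic log-odds PWM, not the authors' implementation: the uniform background, the pseudocount, the function names, and the example peptide windows are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds PWM from equal-length windows around known cleavage sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    pwm = []
    for pos in range(length):
        # Pseudocounts keep unseen residues from producing log(0).
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for site in sites:
            counts[site[pos]] += 1.0
        total = sum(counts.values())
        pwm.append(
            {aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS}
        )
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one candidate window."""
    return sum(column[aa] for column, aa in zip(pwm, window))


def scan(pwm, sequence):
    """Score every window of the PWM's width along a protein sequence."""
    width = len(pwm)
    return [
        (start, score_window(pwm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical training windows, not taken from the study's data.
pwm = build_pwm(["PLGL", "PLGI", "PAGL"])
hits = scan(pwm, "AAPLGLAA")
best_start = max(hits, key=lambda hit: hit[1])[0]
```

A PWM of this kind uses only whether a peptide was cleaved. The efficiency-based models described above instead fit a regression to the measured catalytic efficiency, which is why they can need fewer training peptides.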
The ANA HEp-2 medical test is a powerful tool in autoimmune disease diagnostics. The last step of this test, the interpretation of immunofluorescent images by trained experts, represents a potential source of errors and could theoretically be replaced by automated methods. Here we present a fully automatic method for recognizing the types of immunofluorescent images produced by the ANA HEp-2 medical test. The proposed method makes use of differences in the number, size, shape and localization of cell regions targeted by antinuclear antibodies, the humoral components of the immune system that bind human antigens as a result of immune system malfunction. The method extracts morphological properties of stained cell regions using a combination of thresholding-based and thresholding-less approaches and applies a conventional machine-learning algorithm for image classification.
• An automatic method for classification of ANA HEp-2 cell images is proposed.
• The proposed method uses morphological properties of the targeted cell domains for recognition.
• The method applies both binarization and threshold-less approaches for feature extraction.
Although the role of the APOBEC enzymes in human cancers has been established, the mechanisms of this type of mutagenesis remain poorly understood. Theoretically, targeting of single-stranded DNA by the APOBEC enzymes could occur during cellular processes that unwind the DNA double-stranded structure. Some evidence points to the importance of replication in APOBEC mutagenesis, while the role of transcription is still underexplored. Here, we analyzed gene expression and whole genome sequencing data from five types of human cancers with substantial APOBEC activity to estimate the involvement of transcription in APOBEC mutagenesis and compare its impact with that of replication. Using the TCN motif as the mutation signature of the APOBEC enzymes, we observed a correlation of active APOBEC mutagenesis with gene expression, confirmed the increase of APOBEC-induced mutations in early-replicating regions and estimated the relative impact of transcription and replication on APOBEC mutagenesis. We also found that the known effect of higher density of APOBEC-induced mutations on the lagging strand was highest in middle-replicating regions and observed higher APOBEC mutation density on the sense strand, with the latter bias positively correlated with the gene expression level.
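The use of the TCN motif as a mutation signature can be illustrated with a short sketch. It checks whether a mutated base is the cytosine of a TCN motif on either strand (read as NGA on the forward strand for the reverse-strand case) and computes a mutation density per motif site. The function names, the toy sequence and the density definition are assumptions for illustration. The sketch also ignores the substitution type, which a real analysis would need to take into account.

```python
def is_tcn_context(sequence, pos):
    """True if the base at 0-based pos is the C of a TCN motif on either strand."""
    seq = sequence.upper()
    # Both flanking bases must exist so the full trinucleotide is defined.
    if not 0 < pos < len(seq) - 1:
        return False
    base = seq[pos]
    if base == "C":
        # Forward strand: T immediately 5' of the cytosine, any base 3'.
        return seq[pos - 1] == "T"
    if base == "G":
        # Reverse strand: TCN reads as NGA on the forward strand.
        return seq[pos + 1] == "A"
    return False


def tcn_mutation_density(sequence, mutated_positions):
    """Fraction of TCN motif sites in the sequence that carry a mutation."""
    motif_sites = [i for i in range(len(sequence)) if is_tcn_context(sequence, i)]
    if not motif_sites:
        return 0.0
    hits = sum(1 for pos in set(mutated_positions) if is_tcn_context(sequence, pos))
    return hits / len(motif_sites)


# Toy example: motif sites are position 2 (TCA) and position 5 (GGA on forward).
density = tcn_mutation_density("ATCAGGAT", [2, 4])
```

Normalizing by the number of motif sites rather than by region length keeps comparisons between regions from being driven by differences in motif content.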
While somatic mutations are known to be enriched in genome regions with non-canonical DNA secondary structure, the impact of particular mutagens still needs to be elucidated. Here, we demonstrate that in human cancers, APOBEC mutagenesis is not enriched in direct repeats, mirror repeats, short tandem repeats, and G-quadruplexes, and is even decreased below its level in B-DNA for cancer samples with very high APOBEC activity. In contrast, we observe that the APOBEC-induced mutational density is positively associated with APOBEC activity in inverted repeats (cruciform structures), where the impact of cytosine at the 3’-end of the hairpin loop is substantial. Surprisingly, the APOBEC-signature mutation density per TC motif in the single-stranded DNA of a G-quadruplex (G4) is lower than in the four-stranded part of G4 and in B-DNA. APOBEC mutagenesis, as well as UV-mutagenesis in melanoma samples, is absent in Z-DNA regions, owing to the depletion of their mutational signature motifs.
• APOBEC mutagenesis is not enriched in most non-canonical DNA structures
• Inverted repeats (cruciform structures) show increased APOBEC mutagenesis
• G-quadruplex’s unstructured strand has low APOBEC-induced mutation density
• Decrease of APOBEC mutagenesis in non-B DNA possibly associated with PrimPol
Cancer; Cancer mutagenesis.