Protein structures are flexible and often show conformational changes upon binding to other molecules to exert biological functions. As protein structures correlate with characteristic functions, structure comparison allows classification and prediction of proteins of undefined functions. However, most comparison methods treat proteins as rigid bodies and cannot retrieve similarities of proteins with large conformational changes effectively.
In this paper, we propose a novel descriptor, local average distance (LAD), based on either geodesic distances (GDs) or Euclidean distances (EDs), for pairwise flexible protein structure comparison. The proposed method was compared with 7 structural alignment methods and 7 shape descriptors on two datasets comprising hinge bending motions from MolMovDB. The results show that our method outperformed all other methods at retrieving similar structures in terms of precision-recall curve, retrieval success rate, R-precision, mean average precision, and F1-measure.
Both ED- and GD-based LAD descriptors are effective for searching deformed structures and overcome the self-connection problems caused by large bending motions. We have also demonstrated that the ED-based LAD is more robust than the GD-based descriptor. The proposed algorithm provides an alternative approach for blasting structure databases, discovering previously unknown conformational relationships, and reorganizing protein structure classification.
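The retrieval metrics named above have standard definitions, and a minimal sketch of three of them may help when reading the comparison. The function names, the ranked list, and the set of relevant structures below are hypothetical illustrations, not data or code from the study; mean average precision is simply the mean of `average_precision` over all queries.

```python
def r_precision(ranked, relevant):
    """Precision within the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r


def average_precision(ranked, relevant):
    """Mean of the precision values measured at the rank of each relevant hit."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def f1_at_k(ranked, relevant, k):
    """Harmonic mean of precision and recall over the top k results."""
    top = ranked[:k]
    tp = sum(1 for item in top if item in relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(top)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical query result: structure identifiers ordered by descriptor similarity.
ranked = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c"}  # hypothetical true conformers of the query

print(r_precision(ranked, relevant))
print(average_precision(ranked, relevant))
print(f1_at_k(ranked, relevant, k=3))
```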
• A novel transcriptomic data normalization method based on housekeeping genes.
• The housekeeping genes are selected by GO distance and stability analysis.
• Normalization results showed the proposed method outperformed traditional approaches.
• A web-based online system is available for 12 model species.
RNA-seq analysis provides a powerful tool for revealing relationships between gene expression level and biological function of proteins. To identify differentially expressed genes among RNA-seq datasets obtained from different experimental designs, the first challenge is to choose an appropriate normalization method for calibrating multiple experimental datasets. We propose a novel method to help biologists select a set of suitable housekeeping genes for inter-sample normalization. The approach combines user-defined, experimentally related keywords, GO annotations, GO term distance matrices, orthologous housekeeping gene candidates, and stability ranking of housekeeping genes. By identifying the GO terms most distant from the query keywords and selecting housekeeping gene candidates with low coefficients of variation among different spatio-temporal datasets, the proposed method can automatically enumerate a set of functionally irrelevant housekeeping genes for practical normalization. Novel and benchmark RNA-seq datasets were used to demonstrate that different selections of housekeeping genes have a strong impact on differential gene expression analysis, and the comparison showed that our proposed method outperformed other traditional approaches in terms of both sensitivity and specificity. The proposed mechanism for selecting appropriate housekeeping genes for inter-dataset normalization is robust and accurate for differential expression analyses.
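The stability-ranking and normalization steps can be sketched as follows. This is an illustration under stated assumptions rather than the published pipeline: the GO-distance filtering is omitted, the gene names and expression values are made up, all expression values are assumed to be positive, and scaling each sample by the geometric mean of its housekeeping genes is one common choice that the abstract does not confirm.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.mean(values)
    return statistics.pstdev(values) / mean if mean > 0 else float("inf")


def select_housekeeping(expression, candidates, n):
    """Return the n candidate genes with the lowest CV across samples."""
    present = [g for g in candidates if g in expression]
    return sorted(present, key=lambda g: coefficient_of_variation(expression[g]))[:n]


def size_factors(expression, housekeeping):
    """Per-sample factor: geometric mean of the housekeeping genes in that sample."""
    n_samples = len(expression[housekeeping[0]])
    factors = []
    for s in range(n_samples):
        logs = [math.log(expression[g][s]) for g in housekeeping]
        factors.append(math.exp(sum(logs) / len(logs)))
    return factors


def normalize(expression, factors):
    """Divide every gene's value in each sample by that sample's factor."""
    return {g: [v / f for v, f in zip(vals, factors)] for g, vals in expression.items()}


# Hypothetical expression table: gene -> values across three samples.
expression = {
    "HK1": [100.0, 110.0, 95.0],
    "HK2": [50.0, 52.0, 49.0],
    "NOISY": [5.0, 300.0, 40.0],
    "GENE_X": [10.0, 40.0, 10.0],
}
chosen = select_housekeeping(expression, ["HK1", "HK2", "NOISY"], n=2)
normalized = normalize(expression, size_factors(expression, chosen))
print(chosen)
```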
A novel contrast enhancement algorithm is proposed. Based on the histogram specification technique, the proposed approach enhances contrast without losing the original histogram characteristics. It is expected to eliminate annoying side effects effectively by using differential information from the input histogram. The experimental results show that the proposed dynamic histogram specification (DHS) algorithm not only keeps the original histogram shape features but also enhances the contrast effectively. Moreover, owing to its simplicity, the DHS algorithm can be implemented in simple hardware and run in real-time systems.
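For orientation, the classical histogram specification technique that DHS builds on can be sketched as below. This is the textbook CDF-matching procedure only; the way DHS derives its target histogram from differential information is not described here and is not reproduced. The four-level image and both histograms are hypothetical.

```python
def cdf(hist):
    """Cumulative distribution of a histogram, scaled to end at 1.0."""
    total = sum(hist)
    running, out = 0, []
    for count in hist:
        running += count
        out.append(running / total)
    return out


def specification_map(src_hist, target_hist):
    """Map each source level to the first target level whose CDF reaches the source CDF."""
    src_cdf, tgt_cdf = cdf(src_hist), cdf(target_hist)
    mapping, j = [], 0
    for s in src_cdf:
        # Both CDFs are non-decreasing, so j only ever moves forward.
        while j < len(tgt_cdf) - 1 and tgt_cdf[j] < s - 1e-12:
            j += 1
        mapping.append(j)
    return mapping


def apply_map(pixels, mapping):
    """Replace each pixel level by its mapped level."""
    return [mapping[p] for p in pixels]


# Hypothetical 4-level image whose pixels occupy only the two darkest levels.
src_hist = [4, 4, 0, 0]
target_hist = [2, 2, 2, 2]  # hypothetical flat target histogram
mapping = specification_map(src_hist, target_hist)
print(mapping)                           # [1, 3, 3, 3]
print(apply_map([0, 0, 1, 1], mapping))  # [1, 1, 3, 3]
```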
Background: Coronary artery disease (CAD) is one of the most representative cardiovascular diseases. Early and accurate prediction of CAD based on physiological measurements can reduce the risk of heart attack through medicine therapy, healthy diet, and regular physical activity. Methods: Four heart disease datasets from the UC Irvine Machine Learning Repository were combined and re-examined to remove incomplete entries, and a total of 822 cases were utilized in this study. Seven machine learning methods, including Naïve Bayes, artificial neural networks (ANNs), sequential minimal optimization (SMO), k-nearest neighbor (KNN), AdaBoost, J48, and random forest, were adopted to analyze the collected datasets for CAD prediction. By combining co-expressed observations and an ensemble voting mechanism, we designed and evaluated a new medical decision classifier for CAD prediction. The TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) algorithm was applied to determine the best prediction method for CAD diagnosis. Results: Systolic blood pressure, cholesterol, heart rate, and ST depression are the features that differ most significantly between patients with and without CAD. We show that the prediction capability of seven machine learning classifiers can be enhanced by integrating combinations of observed co-expressed features. Finally, compared to the use of any single classifier, the proposed voting mechanism achieved optimal performance according to TOPSIS.
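TOPSIS itself is a standard multi-criteria ranking procedure, and a minimal sketch of it follows. The criteria, weights, and scores in the example are hypothetical and are not the study's results; the criteria and weights the authors actually used are not given here.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness score per alternative (row); higher means closer to the ideal.

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if smaller is better
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points, taken column by column.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom else 0.0)
    return scores


# Hypothetical classifiers scored on (accuracy, sensitivity, training time in s).
names = ["clf_a", "clf_b", "clf_c"]
matrix = [[0.80, 0.75, 2.0], [0.85, 0.80, 5.0], [0.70, 0.60, 1.0]]
scores = topsis(matrix, weights=[0.4, 0.4, 0.2], benefit=[True, True, False])
best = names[scores.index(max(scores))]
print(best, [round(s, 3) for s in scores])
```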
► Insertion of L1 and Alu elements masks upstream similarity of CYP11B1 and CYP11B2.
► A truncated L1 element is oppositely transcribed from the CYP11B1 promoter.
► Alu element acts as an enhancer to CYP11B1 and CYP11B2.
► Conserved Ad5 and SF-1 sites are important for basal expression of CYP11B genes.
► ERRα is the Ad5-binding protein responsible for basal transcription.
CYP11B1 and CYP11B2, responsible for the final steps of cortisol and aldosterone synthesis, respectively, are believed to be duplicate genes with distinctive promoters. Our sequence analysis reveals that these two genes share great homology in the proximal upstream regions, but insertion of Alu and L1 elements has driven the promoters to diverge. Each CYP11B promoter contains two Alu elements embedded in a truncated L1 element, breaking L1 into three disconnected fragments. Alu functions as an enhancer in both genes regardless of orientation and copy number. Insertion of Alu upstream of an SV40 promoter also elevates promoter activity. However, the effect of Alu on CYP11B1 is blocked by a second L1 element (CYP11B1-L1.2) inserted between the first one and the conserved proximal upstream region. Although CYP11B1-L1.2 is 5′-truncated and lacks a functional ORF, replacing it with a fluorescent gene demonstrates that the element can be transcribed from the CYP11B1 core promoter in the opposite direction and at a smaller magnitude compared to CYP11B1. Deletion of CYP11B1-L1.2 greatly increases CYP11B1 promoter activity and restores the enhancing effect of Alu. The Ad5 and SF-1 binding elements conserved in the proximal core promoter play a role in basal expression of both genes. Mutation of the Ad5 site reduces promoter activity to the minimal level. ERRα is the transcription factor interacting with Ad5 during basal expression. The core promoters of both genes are also conserved in mouse and rat despite the fact that the sites corresponding to cre, Ad5, and SF-1 in rodent Cyp11b1 promoters deviate from consensus.
Adaptation of enzymes in a metabolic pathway can occur not only through changes in amino acid sequences but also through variations in transcriptional activation, mRNA splicing and mRNA translation. The heme biosynthesis pathway, a linear pathway comprised of eight consecutive enzymes in animals, provides researchers with ample information for multiple types of evolutionary analyses performed with respect to the position of each enzyme in the pathway. Through bioinformatics analysis, we found that the protein-coding sequences of all enzymes in this pathway are under strong purifying selection, from cnidarians to mammals. However, loose evolutionary constraints are observed for enzymes in which self-catalysis occurs. Through comparative genomics, we found that in animals, the first intron of the enzyme-encoding genes has been co-opted for transcriptional activation of the genes in this pathway. Organisms sense the cellular content of iron, and through iron-responsive elements in the 5' untranslated regions of mRNAs and the intron-exon boundary regions of pathway genes, translational inhibition and exon choice in enzymes may be enabled, respectively. Pathway product (heme)-mediated negative feedback control can affect the transport of pathway enzymes into the mitochondria as well as the ubiquitin-mediated stability of enzymes. Remarkably, the positions of these controls on pathway activity are not ubiquitous but are biased towards the enzymes in the upstream portion of the pathway. We revealed that multiple-level controls on the activity of the heme biosynthesis pathway depend on the linear depth of the enzymes in the pathway, indicating a new strategy for discovering the molecular constraints that shape the evolution of a metabolic pathway.
Proper orientation of a molecular structure in three-dimensional (3D) printing could increase successful printing rates, reduce the amount of supporting material required, and shorten printing time. In traditional approaches, manual adjustment of a target object for its optimal orientation is inefficient and inconsistent and often requires several trials. Hence, manually and visually rotated results for molecular protein structures with complex, intertwined, rugged, and asymmetric surfaces are not satisfactory. In this study, we apply a grid-based principal component analysis (GPCA) method for an automatic object orientation prior to the physical printing stage. First, a down-sampled three-dimensional protein structure is constructed by lattice-space simulations, and then the proposed GPCA technology is applied to identify possible plane candidates with the largest projection area. Second, a vertical flipping operation is performed and evaluated for a smaller buried volume. Finally, the orientation of the rotated object is iteratively inspected and modified with subtle angle changes in order to further reduce the required supporting material. Several testing cases were used to illustrate the superior performance of the proposed algorithm. Specifically, 140 representative protein structures categorized into seven different groups were selected from the well-known Structural Classification of Proteins—extended database. As the results show, the protein structures were theoretically and heuristically rotated to their optimal orientations for the corresponding 3D printings. The proposed automatic orientation procedure could reduce 38.15% of the required supporting material on average. Furthermore, the expected printing time could be reduced by an average of 17.2 min for small-scale protein structures.
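The first step, down-sampling onto a lattice and then using principal component analysis to lay the widest extent of the structure flat on the print bed, can be sketched with NumPy. This is a simplified illustration, not the published GPCA procedure: the helper names, the grid spacing, and the random point cloud are assumptions, and the flipping and iterative angle-refinement stages are not covered.

```python
import numpy as np


def grid_downsample(points, spacing):
    """Replace points by the centres of the occupied lattice cells."""
    pts = np.asarray(points, dtype=float)
    cells = np.unique(np.floor(pts / spacing).astype(int), axis=0)
    return (cells + 0.5) * spacing


def pca_orient(points):
    """Rotate points so the largest variance lies along x and the smallest along z.

    With the thinnest direction pointing up, the xy-plane (the print bed)
    carries the widest spread of the structure.
    """
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    axes = eigvecs[:, ::-1].copy()    # reorder columns: largest variance first
    if np.linalg.det(axes) < 0:
        # Keep a proper rotation; a reflection would mirror a chiral molecule.
        axes[:, 2] *= -1
    return centered @ axes


# Hypothetical point cloud standing in for atom coordinates (in angstroms).
rng = np.random.default_rng(0)
atoms = rng.normal(size=(500, 3)) * np.array([2.0, 12.0, 6.0])
oriented = pca_orient(grid_downsample(atoms, spacing=2.0))
print(oriented.var(axis=0))
```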
Identification of mutations in patients with amyotrophic lateral sclerosis (ALS) in a genome-wide association study can reveal possible biomarkers of such a rapidly progressive and fatal neurodegenerative disease. It was observed that significant single nucleotide polymorphisms vary when the tested population changes from one ethnic group to another. To identify new loci associated with ALS susceptibility in the Taiwanese Han population, we performed a genome-wide association study on 94 patients with sporadic ALS and 376 matched controls. We uncovered two new susceptibility loci at 13q14.3 (rs2785946) and 11q25 (rs11224052). In addition, we analyzed the functions of all the associated genes among 54 significant single nucleotide polymorphisms using Gene Ontology annotations, and the results showed several statistically significant neural- and muscle-related Gene Ontology terms and the associated diseases.
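A per-SNP case-control comparison of the kind a genome-wide association study repeats across the genome can be sketched as an allelic chi-square test on a 2x2 table. Whether the study used this particular test is an assumption, and the allele counts below are invented for illustration; they only respect the stated cohort sizes (94 cases and 376 controls, hence 188 and 752 alleles).

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 d.f.), p-value, and odds ratio for a 2x2 allele table."""
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts for one SNP: 188 case alleles, 752 control alleles.
chi2, p_value, odds_ratio = allelic_chi_square(70, 118, 180, 572)
print(chi2, p_value, odds_ratio)
```

In a genome-wide scan the resulting p-values would still need a multiple-testing correction before any locus is called significant.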
Short tandem repeats (STRs) are abundant in human genomes. Numerous STRs have been shown to be associated with genetic diseases and gene regulatory functions, and have been selected as genetic markers for evolutionary and forensic analyses. High-throughput next generation sequencers have fostered new cutting-edge computing techniques for genome-scale analyses, and cross-genome comparisons have facilitated the efficient identification of polymorphic STR markers for various applications.
An automated and efficient system for detecting human polymorphic STRs at the genome scale is proposed in this study. Assembled contigs from next generation sequencing data were aligned and calibrated according to selected reference sequences. To verify identified polymorphic STRs, human genomes from the 1000 Genomes Project were employed for comprehensive analyses, and STR markers from the Combined DNA Index System (CODIS) and disease-related STR motifs were also applied as cases for evaluation. In addition, we analyzed STR variations for highly conserved homologous genes and human-unique genes. In total, 477 polymorphic STRs were identified from 492 human-unique genes, among which 26 STRs were retrieved and clustered into three different groups for efficient comparison.
We have developed an online system that efficiently identifies polymorphic STRs and provides novel distinguishable STR biomarkers for different levels of specificity. Candidate polymorphic STRs within a personal genome could be easily retrieved and compared to the constructed STR profile through query keywords, gene names, or assembled contigs.
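The core idea of locating STRs and comparing repeat counts between sequences can be sketched with a back-referencing regular expression. This is a toy illustration, not the system described above: the function name, the unit-length and repeat thresholds, and both sequences are hypothetical, and alignment of contigs to a reference is not attempted.

```python
import re


def find_strs(sequence, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for each perfect tandem repeat found.

    The lazy quantifier makes the regex prefer the shortest repeating unit,
    and the back-reference requires that unit to recur consecutively.
    Homopolymer runs such as AAAAAA are reported with unit "AA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical reference and sample fragments carrying the same GATA repeat locus.
reference = "TTGATAGATAGATAGATACC"
sample = "TTGATAGATAGATACC"
ref_hits = find_strs(reference)
sample_hits = find_strs(sample)
print(ref_hits)     # [(2, 'GATA', 4)]
print(sample_hits)  # [(2, 'GATA', 3)]

# The locus is polymorphic if the repeat count differs between the two sequences.
polymorphic = ref_hits[0][1] == sample_hits[0][1] and ref_hits[0][2] != sample_hits[0][2]
print(polymorphic)  # True
```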