Genome-wide association studies (GWAS) play a vital role in identifying important genes that are associated with the phenotypic variation of living organisms. Several statistical methods exist for GWAS, including the linear mixed model (LMM), which is popular for addressing the challenges of hidden population stratification and polygenic effects. However, most of these methods, including LMM, are sensitive to phenotypic outliers, which may lead to misleading results. To overcome this problem, we propose a way to robustify the LMM approach, reducing the influence of outlying observations by using the β-divergence method. The performance of the proposed method was investigated using both synthetic and real data. Simulation results showed that the proposed method performs better than both the linear regression model (LRM) and LMM approaches in terms of power and false discovery rate in the presence of phenotypic outliers. In the absence of outliers, the proposed method performed almost the same as the LMM approach but much better than the LRM approach. In the real data analysis, the proposed method identified 11 SNPs that are significantly associated with rice flowering time. Among the identified candidate SNPs, some were involved in seed development and flowering time pathways, and some were connected with flower and other developmental processes. These candidate SNPs could assist rice breeding programs. Thus, our findings highlight the importance of robust GWAS in identifying candidate genes.
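The down-weighting idea behind the β-divergence method can be illustrated on a much simpler problem than the LMM. The sketch below is not the estimator described in the abstract above: it assumes a univariate Gaussian location-scale model, an illustrative tuning value β = 0.2, and hypothetical phenotype values. Each observation receives a weight exp(-β r² / (2σ²)), where r is its residual, so gross outliers contribute almost nothing to the estimate.

```python
import math
from statistics import median


def beta_weighted_mean(values, beta=0.2, n_iter=50):
    """Minimum beta-divergence style estimate of a Gaussian mean and variance.

    Illustrative only: starts from the median and a MAD-based scale, then
    alternates between computing beta-weights and weighted updates.
    Assumes the MAD of `values` is non-zero.
    """
    mu = median(values)
    mad = median(abs(v - mu) for v in values)
    sigma2 = (1.4826 * mad) ** 2
    weights = [1.0] * len(values)
    for _ in range(n_iter):
        # Observations far from the current mean get exponentially small weights.
        weights = [math.exp(-beta * (v - mu) ** 2 / (2.0 * sigma2)) for v in values]
        total = sum(weights)
        mu = sum(w * v for w, v in zip(weights, values)) / total
        # The (1 + beta) factor corrects the shrinkage caused by the weighting.
        sigma2 = (1.0 + beta) * sum(
            w * (v - mu) ** 2 for w, v in zip(weights, values)
        ) / total
    return mu, sigma2, weights


# Hypothetical phenotype values; the last two are outliers.
phenotypes = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0, 30.0]

plain_mean = sum(phenotypes) / len(phenotypes)
robust_mean, robust_var, final_weights = beta_weighted_mean(phenotypes)

print("plain mean: ", round(plain_mean, 3))
print("robust mean:", round(robust_mean, 3))
print("weights of the two outliers:", final_weights[-2:])
```

With these values the ordinary mean is pulled toward the two outliers, while the β-weighted mean stays close to the bulk of the data. Setting β = 0 gives every observation weight 1, which recovers the ordinary mean and the maximum likelihood variance.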
The COVID-19 pandemic is a severe threat to human life and the global economy. Despite the success of vaccination efforts in reducing the spread of the virus, the situation remains largely uncontrolled due to random mutations in the RNA sequence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which demand different variants of effective drugs. Proteins mediated by disease-causing genes are usually used as receptors to explore effective drug molecules. In this study, we analyzed two different RNA-Seq and one microarray gene expression profile datasets by integrating EdgeR, LIMMA, weighted gene co-expression network and robust rank aggregation approaches, which revealed eight hub-genes (HubGs) associated with SARS-CoV-2 infection, namely REL, AURKA, AURKB, FBXL3, OAS1, STAT4, MMP2 and IL6, as the host genomic biomarkers. Gene Ontology and pathway enrichment analyses showed that the HubGs are significantly enriched in crucial biological processes, molecular functions, cellular components and signaling pathways that are associated with the mechanisms of SARS-CoV-2 infection. Regulatory network analysis identified the five top-ranked TFs (SRF, PBX1, MEIS1, ESR1 and MYC) and five miRNAs (hsa-miR-106b-5p, hsa-miR-20b-5p, hsa-miR-93-5p, hsa-miR-106a-5p and hsa-miR-20a-5p) as the key transcriptional and post-transcriptional regulators of the HubGs. We then conducted a molecular docking analysis to determine potential drug candidates that could interact with HubGs-mediated receptors. This analysis identified ten top-ranked drug agents: Nilotinib, Tegobuvir, Digoxin, Proscillaridin, Olysio, Simeprevir, Hesperidin, Oleanolic Acid, Naltrindole and Danoprevir. Finally, we investigated the binding stability of three top-ranked drug molecules (Nilotinib, Tegobuvir and Proscillaridin) with the three top-ranked proposed receptors (AURKA, AURKB and OAS1) using 100 ns MD-based MM-PBSA simulations and observed stable performance. Therefore, the findings of this study might be useful resources for the diagnosis and therapy of SARS-CoV-2 infections.
Statistical data-mining (DM) and machine learning (ML) are promising tools to assist in the analysis of complex datasets. In recent decades, plant phenomics has become crucial to precision agriculture for high-throughput phenotyping of local crop cultivars. Therefore, an integrated or new analytical approach is needed to deal with these phenomics data. We propose a statistical framework for the analysis of phenomics data that integrates DM and ML methods. Popular supervised ML methods, namely Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machine with linear (SVM-l) and radial basis (SVM-r) kernels, were used to classify and predict plant status (stress/non-stress) in order to validate the proposed approach. Several simulated and real plant phenotype datasets were analyzed. The results showed that the features selected by the proposed approach contributed significantly throughout the analysis. In this study, we showed that the proposed approach reduced the complexity of phenotype data analysis, reduced the computational time of ML algorithms, and increased prediction accuracy.
RNA interference (RNAi) plays key roles at the post-transcriptional and chromatin modification levels and regulates the expression of various eukaryotic genes that are involved in stress responses, development and the maintenance of genome integrity during developmental stages. The RNAi pathway is directly involved in the gene-silencing process through the interaction of the Dicer-Like (DCL), Argonaute (AGO) and RNA-dependent RNA polymerase (RDR) gene families and their regulatory elements. However, these RNAi gene families and their sub-cellular locations, functional pathways and regulatory components have not been extensively investigated in the economically and nutritionally important fruit plant sweet orange (Citrus sinensis L.). Therefore, in silico characterization, gene diversity and regulatory factor analyses of RNA silencing genes in C. sinensis were conducted using integrated bioinformatics approaches. Genome-wide comparison based on a phylogenetic tree approach detected 4 CsDCL, 8 CsAGO and 4 CsRDR genes as RNAi candidate genes in C. sinensis, corresponding to the RNAi genes of the model plant Arabidopsis thaliana. The domain and motif composition and gene structure analyses for all three gene families showed near homogeneity among members of the same group. The Gene Ontology enrichment analysis clearly indicated that the predicted genes are directly involved in gene silencing and other important pathways. The key regulatory transcription factors (TFs) MYB, Dof, ERF, NAC, MIKC_MADS, WRKY and bZIP were identified through their interaction network with the predicted genes. The cis-acting regulatory elements associated with the predicted genes were found to be responsive to light, stress and hormones. Furthermore, the expressed sequence tag (EST) analysis showed that these RNAi candidate genes were highly expressed in fruit and leaves, indicating organ-specific functions. Our genome-wide comparison and integrated bioinformatics analyses provide necessary information about sweet orange RNA silencing components, which lays the groundwork for further investigation of the functional mechanisms of the predicted genes and their regulatory factors.
This study investigates and compares the performance of different machine learning algorithms in predicting cardiovascular disease. Globally, this fatal disease causes a large number of deaths, so machine learning algorithms can play a significant role in early detection, which helps ensure proper treatment for patients and reduces severity in many cases. The University of California, Irvine (UCI) data repository was used for training and testing the models. Twelve machine learning algorithms were studied, and their performance was observed with default hyperparameters (DHP), grid search cross validation (GSCV) and random search cross validation (RSCV). Computational time was also calculated for both GSCV and RSCV. An accuracy of 92% was found for both the hard and soft voting ensemble classifiers (EVCH and EVCS). However, it was observed that the Adaboost algorithm outperforms EVCH and EVCS in terms of precision and specificity. The algorithms are compared extensively in terms of accuracy, precision, sensitivity, specificity, F1 score, and ROC-AUC. Jupyter notebook 6.0.3 was used for simulation.
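The difference between hard and soft voting ensembles can be shown with a few lines of code. The sketch below is not the study's implementation: it assumes three hypothetical base classifiers that each output a probability of disease for a patient, and a decision threshold of 0.5. Hard voting takes the majority of the individual class labels, while soft voting averages the probabilities before thresholding.

```python
def hard_vote(probabilities, threshold=0.5):
    """Majority vote over the class labels of the individual classifiers."""
    votes = [1 if p >= threshold else 0 for p in probabilities]
    return 1 if 2 * sum(votes) > len(votes) else 0


def soft_vote(probabilities, threshold=0.5):
    """Threshold the mean predicted probability across classifiers."""
    mean_probability = sum(probabilities) / len(probabilities)
    return 1 if mean_probability >= threshold else 0


# Hypothetical predicted probabilities of disease from three classifiers.
patient_a = [0.9, 0.4, 0.4]  # one confident "yes", two weak "no"
patient_b = [0.6, 0.6, 0.1]  # two weak "yes", one confident "no"

for name, probs in [("patient_a", patient_a), ("patient_b", patient_b)]:
    print(name, "hard:", hard_vote(probs), "soft:", soft_vote(probs))
```

The two hypothetical patients are chosen so that the schemes disagree: soft voting lets one confident classifier outweigh two uncertain ones, whereas hard voting counts every classifier equally.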
Dicer-like (DCL), Argonaute (AGO), and RNA-dependent RNA polymerase (RDR) are the three major gene families that act as critical components of RNA interference or silencing mechanisms, which regulate the expression of protein-coding genes in eukaryotic organisms through noncoding small RNA molecules (miRNA and siRNA). However, most of their characteristics, including structures, chromosomal locations, subcellular locations, regulatory elements, and gene networks, have not been rigorously studied. Our analysis identified 7 TaDCL, 39 TaAGO, and 16 TaRDR genes as RNA interference (RNAi) genes in the wheat genome. Phylogenetic analysis of the predicted RNAi proteins together with the RNAi proteins of Arabidopsis and rice showed that the predicted proteins of the TaDCL, TaAGO, and TaRDR groups cluster into four, eight, and four subgroups, respectively. Domain, 3D protein structure, motif, and exon-intron structure analyses showed that these proteins share conserved characteristics within groups and differ between groups. The nonsynonymous/synonymous mutation ratio Ka/Ks < 1 suggested that these protein sequences have been subject to purifying selection. Network analysis of the RNAi genes with TFs revealed that ERF, MIKC-MADS, C2H2, BBR-BPC, MYB, and Dof are the key transcriptional regulators of the predicted RNAi-related genes. The cis-regulatory element (CRE) analysis detected some important CREs of RNAi genes that are significantly associated with light, stress, and hormone responses. Expression analysis based on an online database showed that almost all of the predicted RNAi genes are expressed in different tissues and organs. A case-control study at the gene expression level showed that some RNAi genes responded significantly to drought and heat stresses. Overall, the results provide an excellent basis for in-depth molecular investigation of these genes and their regulatory elements for wheat crop improvement against different stressors.
INTRODUCTION: Chronic Kidney Disease refers to the slow, progressive deterioration of kidney function. The impairment is irreversible and imperceptible until the disease reaches one of the later stages, so early detection and initiation of treatment are needed to ensure a good prognosis and prolonged life. In this respect, machine learning algorithms have proven to be promising and point towards the future of disease diagnosis. OBJECTIVES: We aim to apply different machine learning algorithms and to assess and compare their accuracies and other performance parameters for the detection of chronic kidney disease. METHODS: The ‘chronic kidney disease dataset’ from the machine learning repository of the University of California, Irvine, was used, and eight supervised machine learning models were developed in the Python programming language for the detection of the disease. RESULTS: A comparative analysis of the eight machine learning models is presented, evaluating performance parameters such as accuracy, precision, sensitivity, F1 score and ROC-AUC. Among the models, Random Forest displayed the highest accuracy of 99.75%. CONCLUSION: We observed that machine learning algorithms can contribute significantly to the predictive analysis of chronic kidney disease and can assist in developing a robust computer-aided diagnosis system to help healthcare professionals treat patients properly and efficiently.
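The performance parameters named above all derive from the confusion matrix or from the ranking of predicted scores. The following minimal sketch computes them in plain Python; the labels and scores are hypothetical, not taken from the chronic kidney disease dataset, and the 0.5 decision threshold is an assumption. ROC-AUC is computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as one half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, specificity and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = disease) and predicted scores for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.35, 0.6, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("ROC-AUC:", auc)
```

This sketch assumes at least one predicted positive and at least one case of each class; otherwise some denominators are zero and the corresponding metric is undefined.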
Outbreaks of COVID-19 caused by the novel coronavirus SARS-CoV-2 are still a threat to global human health. To understand the biology of SARS-CoV-2 and develop drugs against COVID-19, a vast amount of genomic, proteomic, interactomic, and clinical data is being generated, and bioinformatics researchers have produced databases, webservers and tools to gather these publicly available data and make them available for analysis. However, these bioinformatics resources are scattered, and researchers need to find them separately from different sources. To help researchers find the resources in one place, we have developed an integrated web portal called OverCOVID (
). The publicly available webservers, databases and tools associated with SARS-CoV-2 have been incorporated in the resource page. In addition, a network view of the resources is provided to display the scope of the research. Other information, such as SARS-CoV-2 strains, is visualized, and various layers of interaction resources are listed on distinct pages of the web portal. As an integrative web portal, OverCOVID will help scientists search the resources and accelerate clinical research on SARS-CoV-2.
Biomass is an important phenotypic trait in functional ecology and growth analysis. The typical methods for measuring biomass are destructive, and they require numerous individuals to be cultivated for repeated measurements. With the advent of image-based high-throughput plant phenotyping facilities, non-destructive biomass measuring methods have attempted to overcome this problem. Thus, the estimation of the biomass of individual plants from their digital images is becoming more important. In this paper, we propose an approach to biomass estimation based on image-derived phenotypic traits. Several image-based biomass studies treat plant biomass as only a linear function of the projected plant area in images. However, we modeled the plant volume as a function of plant area, plant compactness, and plant age to generalize the linear biomass model. The obtained results support the proposed model, which explains most of the observed variance in image-derived biomass estimation. Moreover, only a small difference was observed between actual and estimated digital biomass, which indicates that the proposed approach can be used to estimate digital biomass accurately.
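To make the idea of extending an area-only model concrete, the sketch below fits biomass as a function of three image-derived traits by ordinary least squares. It is an illustration under stated assumptions, not the model from the abstract above: the functional form (linear in area, compactness and age), the trait values, and the coefficients used to generate the targets are all hypothetical.

```python
def fit_linear(rows, targets):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    p = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        rest = sum(aug[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (aug[i][p] - rest) / aug[i][i]
    return coef


# Hypothetical traits per plant: (projected area, compactness, age in days).
traits = [
    (10.0, 0.50, 5.0),
    (20.0, 0.60, 10.0),
    (35.0, 0.40, 15.0),
    (50.0, 0.70, 20.0),
    (80.0, 0.55, 25.0),
    (120.0, 0.65, 30.0),
]
# Targets generated from made-up coefficients so the fit can be checked.
true_coef = [1.0, 0.5, 4.0, 0.2]
biomass = [
    true_coef[0] + true_coef[1] * a + true_coef[2] * c + true_coef[3] * d
    for a, c, d in traits
]

coef = fit_linear(traits, biomass)
print("intercept, area, compactness, age:", [round(b, 4) for b in coef])
```

Because the targets are noise-free, the fit recovers the generating coefficients up to rounding error. With measured data, comparing this fit against an area-only fit (for example by explained variance) would indicate how much the extra traits contribute.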