Objective: Anaphylaxis is a severe, life-threatening allergic reaction, and its accurate identification in healthcare databases can harness the potential of "Big Data" for healthcare or public health purposes. Methods: This study used claims data from the CMS database, obtained between October 1, 2015 and February 28, 2019, to examine the utility of machine learning in identifying incident anaphylaxis cases. We created a feature selection pipeline to identify critical features shared across different datasets. We then used a variety of unsupervised and supervised methods (e.g., Sammon mapping and extreme gradient boosting) to train models on datasets of differing data quality, reflecting the varying availability and potential rarity of ground truth data in medical databases. Results: Model accuracies ranged from 47.7% to 94.4% when tested on ground truth data. We also found new features that could help experts enhance existing case-finding algorithms. Discussion: Developing precise algorithms to detect medical outcomes in claims can be a laborious and expensive process, particularly for conditions that present and are coded in diverse ways. We found it beneficial to filter out the highly potent codes used for data curation in order to identify underlying patterns and features. To improve rule-based algorithms where necessary, researchers could use model explainers to determine noteworthy features, which could then be shared with experts and included in the algorithm. Conclusion: Our work suggests that machine learning models can perform at levels similar to a previously published expert case-finding algorithm, while also having the potential to improve performance or streamline algorithm construction by identifying new relevant features. Lay Summary: Electronic health records and medical claims data are a potential treasure trove for uncovering new knowledge and confirming the existing knowledge base.
However, whenever researchers introduce screening criteria during data curation, they can also introduce bias unless they are careful. It is therefore crucial to consider what information goes into machine learning models. In this work, we show how we used feature elimination and feature selection to replicate the success of anaphylaxis identification models defined by human experts. We then used features that were common and essential to both minimally curated and expert-defined datasets to create a new machine learning model that can outperform the human expert-defined algorithms. This process can be repeated and automated to iteratively develop better models and features, which can help healthcare practitioners design more successful case-defining algorithms. Key words: anaphylaxis; machine learning; public health; allergy; electronic health records; Centers for Medicare and Medicaid Services.
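The Methods and Discussion describe filtering out the highly potent codes used for data curation, then keeping features that are shared between a minimally curated and an expert-defined dataset. The Python below is a minimal sketch of that idea under stated assumptions: the prevalence-difference score, the helper names `rank_features` and `shared_top_features`, and the toy records are all hypothetical and invented for illustration. It is not the authors' pipeline, which relied on methods such as Sammon mapping, extreme gradient boosting, and model explainers.

```python
from collections import Counter

# Hypothetical record layout: (set of claim codes for one patient, label),
# where label 1 = anaphylaxis case and 0 = non-case.
Record = tuple[frozenset[str], int]


def rank_features(records: list[Record], excluded: set[str]) -> list[tuple[str, float]]:
    """Rank codes by case prevalence minus non-case prevalence, after dropping
    the codes that were used to curate the dataset (`excluded`)."""
    cases = [codes for codes, label in records if label == 1]
    controls = [codes for codes, label in records if label == 0]
    case_counts = Counter(c for codes in cases for c in codes if c not in excluded)
    ctrl_counts = Counter(c for codes in controls for c in codes if c not in excluded)
    scores = {}
    for code in set(case_counts) | set(ctrl_counts):
        p_case = case_counts[code] / len(cases) if cases else 0.0
        p_ctrl = ctrl_counts[code] / len(controls) if controls else 0.0
        scores[code] = p_case - p_ctrl
    # Highest score first; ties broken alphabetically so the order is reproducible.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def shared_top_features(ranking_a, ranking_b, k: int) -> set[str]:
    """Codes that appear among the top k of both rankings."""
    return {code for code, _ in ranking_a[:k]} & {code for code, _ in ranking_b[:k]}


# Invented toy data: "curation_code" stands in for a highly potent code used to build the dataset.
minimal = [
    (frozenset({"curation_code", "code_a", "code_b"}), 1),
    (frozenset({"curation_code", "code_a"}), 1),
    (frozenset({"code_b", "code_c"}), 0),
    (frozenset({"code_c"}), 0),
]
expert = [
    (frozenset({"curation_code", "code_a", "code_c"}), 1),
    (frozenset({"code_a", "code_b"}), 1),
    (frozenset({"code_c"}), 0),
]
excluded = {"curation_code"}
ranked_minimal = rank_features(minimal, excluded)
ranked_expert = rank_features(expert, excluded)
shared = shared_top_features(ranked_minimal, ranked_expert, k=2)
```

In this toy run, `code_a` and `code_b` rank in the top two of both datasets once `curation_code` is excluded. A fuller pipeline could swap the simple prevalence gap for a trained model's feature importances or explainer output.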
The AGMK1-9T7 cell line has been used to study neoplasia in tissue culture. Through passage in cell culture, these cells evolved to become tumorigenic and metastatic in immunodeficient mice at passage 40. Of the 20 × 10⁶ kidney cells originally plated, fewer than 2% formed the colonies that evolved into this cell line. These cells could be the progeny of some type of kidney progenitor cell. To characterize these cells, we documented their renal lineage by their expression of PAX-2 and MIOX, detected by indirect immunofluorescence. When assessed by flow cytometry, these cells expressed high levels of CD44, CD73, CD105, Sca-1, and GLI1 across all passages tested; these markers have been reported to be expressed by renal progenitor cells. GLI1 expression was confirmed by immunofluorescence and western blot analysis. Cells from passages 13 to 23 were able to differentiate into adipocytes, osteoblasts, and chondrocytes; after passage 23, this ability was lost. These data indicate that the cells that formed the AGMK1-9T7 cell line were GLI1+ perivascular kidney progenitor cells.
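For scale, a simple calculation from the figures above (not a count reported by the authors): fewer than 2% of the 20 × 10⁶ plated cells corresponds to fewer than 400,000 colony-forming cells.

```latex
0.02 \times \left(20 \times 10^{6}\right) = 4 \times 10^{5}
```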
Human association studies of common genetic polymorphisms have identified many loci associated with risk of complex diseases, although individual loci typically have small effects. However, by envisaging genetic associations in terms of cellular pathways rather than any specific polymorphism, the combined effects of many biologically relevant alleles can be detected. These effects are likely to be most apparent in investigations of phenotypically homogeneous subtypes of complex diseases. We report findings from a case-control genetic association study of relationships between 2925 single nucleotide polymorphisms (SNPs) and 2 subtypes of a common chronic facial pain condition, temporomandibular disorder (TMD): 1) localized TMD and 2) TMD with widespread pain. Compared with healthy controls, cases with localized TMD differed in the allelic frequency of SNPs that mapped to a serotonergic receptor pathway (P = 0.0012), while cases of TMD with widespread pain differed in the allelic frequency of SNPs that mapped to a T-cell receptor pathway (P = 0.0014). A risk index representing the combined effects of 6 SNPs from the serotonergic pathway was associated with greater odds of localized TMD (odds ratio 2.7, P = 1.3 × 10⁻⁹), and the result was reproduced in a replication case-control cohort study of 639 people (odds ratio 1.6, P = 0.014). A risk index representing the combined effects of 8 SNPs from the T-cell receptor pathway was associated with greater odds of TMD with widespread pain (P = 1.9 × 10⁻⁸), although the result was not significant in the replication cohort. These findings illustrate the potential for clinical classification of chronic pain based on distinct molecular profiles and genetic background.
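The abstract does not specify how the SNP risk indexes were constructed. The sketch below assumes the simplest option, an unweighted count of risk alleles across SNPs, and shows how an odds ratio with a Woolf (log-scale) 95% confidence interval is computed from a 2×2 table of high-index versus low-index participants. The function names and all counts are hypothetical, not data from the study.

```python
import math


def risk_index(genotypes: list[int]) -> int:
    """Hypothetical unweighted index: sum of risk-allele counts (0, 1, or 2) per SNP."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be a risk-allele count of 0, 1, or 2")
    return sum(genotypes)


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, tuple[float, float]]:
    """Odds ratio for a 2x2 table with a Woolf (log-scale) 95% confidence interval.

    a = high-index cases, b = high-index controls,
    c = low-index cases,  d = low-index controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_value) - 1.96 * se_log)
    upper = math.exp(math.log(or_value) + 1.96 * se_log)
    return or_value, (lower, upper)


# Toy usage with invented numbers (not data from the study).
index = risk_index([1, 0, 2, 1, 0, 1])  # six SNPs -> 5 risk alleles
or_value, ci = odds_ratio(40, 20, 60, 80)  # (40 * 80) / (20 * 60) = 8/3
```

A weighted index (for example, weighting each SNP by its own effect size) is an equally plausible design; the unweighted count is used here only because it is the simplest to read.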
The pathogenesis of multiple sclerosis (MS) involves alterations to multiple pathways and processes, which represents a significant challenge for developing more effective therapies. Systems biology approaches that study pathway dysregulation should offer benefits by integrating molecular networks and dynamic models with current biological knowledge to understand disease heterogeneity and response to therapy. In MS, abnormalities have been identified in several cytokine-signaling pathways, as well as in those of other immune receptors. Among the downstream molecules implicated are Jak/Stat, NF-κB, ERK1/3, p38, and Jun/Fos. Together, these data suggest that MS is likely to be associated with abnormalities in apoptosis/cell death, microglia activation, blood-brain barrier function, immune responses, cytokine production, and/or oxidative stress, although which pathways contribute to the cascade of damage and can be modulated remains an open question. While current MS drugs target some of these pathways, others remain untouched. Here, we propose a pragmatic systems analysis approach that involves the large-scale extraction of processes and pathways relevant to MS. These data serve as a scaffold on which computational modeling can be performed to identify disease subgroups based on the contributions of different processes. Such an analysis, targeting these relevant MS signaling pathways, offers the opportunity to accelerate the development of novel individual or combination therapies.
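As a rough illustration of identifying disease subgroups based on the contributions of different processes, the sketch below assumes each patient has measured levels for a few molecules, scores each pathway as the mean z-score of its member molecules, and assigns each patient to the highest-scoring pathway. The molecule names (`m1` to `m4`), pathway memberships, function names, and values are all invented; the computational modeling proposed in the abstract is not reproduced by this toy scoring.

```python
from statistics import mean, pstdev

Profiles = dict[str, dict[str, float]]  # sample id -> molecule -> measured level


def zscore(samples: Profiles) -> Profiles:
    """Standardize each molecule across samples (population standard deviation)."""
    molecules = next(iter(samples.values())).keys()
    params = {}
    for m in molecules:
        values = [profile[m] for profile in samples.values()]
        sd = pstdev(values)
        params[m] = (mean(values), sd if sd > 0 else 1.0)  # avoid dividing by zero
    return {
        sid: {m: (v - params[m][0]) / params[m][1] for m, v in profile.items()}
        for sid, profile in samples.items()
    }


def assign_subgroups(samples: Profiles, pathways: dict[str, list[str]]) -> dict[str, str]:
    """Score each pathway as the mean z-score of its member molecules and
    assign each sample to its highest-scoring pathway."""
    groups = {}
    for sid, profile in zscore(samples).items():
        scores = {p: mean(profile[m] for m in members) for p, members in pathways.items()}
        groups[sid] = max(scores, key=scores.get)
    return groups


# Hypothetical pathways and measurements.
pathways = {"cytokine_production": ["m1", "m2"], "oxidative_stress": ["m3", "m4"]}
samples = {
    "p1": {"m1": 5.0, "m2": 4.0, "m3": 1.0, "m4": 1.0},
    "p2": {"m1": 1.0, "m2": 1.0, "m3": 5.0, "m4": 4.0},
    "p3": {"m1": 3.0, "m2": 2.5, "m3": 3.0, "m4": 2.5},
}
groups = assign_subgroups(samples, pathways)
```

Here `p1` falls into the cytokine-production group and `p2` into the oxidative-stress group; `p3` sits at the mean of every molecule, so its scores tie at zero and `max` returns the first pathway listed.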
Genetic suppressor elements (GSEs) are short biologically active gene fragments that encode dominantly acting peptides or inhibitory antisense RNAs. GSEs can be isolated from a single gene or from a multigene complex by constructing a library of short random fragments of the target gene(s) in an expression vector, followed by expression selection for the desired phenotype in a suitable cellular system. GSE selection from a single gene allows one to develop efficient and specific inhibitors of the gene function and to identify functional protein domains. GSE selection from a multigene complex, such as a normalized (uniform abundance) cDNA population from mammalian cells, makes it possible to identify genes that are involved in selectable cellular phenotypes. The potential of GSE selection for uncovering novel gene functions was first demonstrated using bacteriophage lambda as a model system. GSE selection in retroviral expression vectors has been applied in mammalian cells to identify genes responsible for sensitivity to etoposide and other chemotherapeutic drugs. GSE selection is also useful for cloning and analysis of tumor suppressor genes and can be applied to identifying tumor-specific targets for future anticancer drugs. Investigators should find this experimental strategy applicable to many different areas of medical and biological research.
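The library-construction step can be illustrated in silico. The sketch below draws short random fragments of a hypothetical target sequence and inserts each one in a random orientation, since sense fragments may encode dominant peptides and antisense fragments may yield inhibitory RNAs. The sequence, fragment lengths, and function names are assumptions for illustration; the code models only the sampling of fragments, not cloning or expression selection.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def random_fragment_library(seq: str, n: int, min_len: int, max_len: int, seed: int = 0):
    """Draw n random fragments of `seq`, each in a random orientation.

    Returns (start, orientation, fragment) tuples; antisense fragments are
    stored as the reverse complement of the sampled region.
    """
    if not 0 < min_len <= max_len <= len(seq):
        raise ValueError("need 0 < min_len <= max_len <= len(seq)")
    rng = random.Random(seed)  # seeded so the toy library is reproducible
    library = []
    for _ in range(n):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(seq) - length)
        fragment = seq[start:start + length]
        orientation = rng.choice(["sense", "antisense"])
        if orientation == "antisense":
            fragment = reverse_complement(fragment)
        library.append((start, orientation, fragment))
    return library


# Hypothetical target sequence, used only for illustration.
TARGET = "ATGGCCAAGCTTGAATTCGGATCCTAA"
library = random_fragment_library(TARGET, n=10, min_len=5, max_len=8)
```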
Molecular networks in microarray analysis
Sivachenko, Andrey Y; Yuryev, Anton; Daraselia, Nikolai ...
Journal of Bioinformatics and Computational Biology, 04/2007, Volume 5, Issue 2B
Journal Article
Peer reviewed
Microarray-based characterization of tissues, cellular and disease states, and environmental conditions and treatment responses provides genome-wide snapshots containing large amounts of invaluable information. However, the lack of inherent structure within the data and strong noise make it difficult to extract and interpret this information and to formulate and prioritize domain-relevant hypotheses. Integration with other types of biological data is required to place the expression measurements in a biologically meaningful context. A few approaches to microarray data interpretation are discussed, with an emphasis on the use of molecular network information. Statistical procedures are demonstrated that superimpose expression data onto a transcription regulation network mined from the scientific literature, with the aim of selecting transcription regulators that show significant patterns of expression changes downstream. Tests are proposed that take into account network topology and the signs of transcription regulation effects. The approaches are illustrated using two different expression datasets, their performance is compared, and the biological relevance of the predictions is discussed.
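One simple way to use the signs of regulation effects, offered here as an assumption rather than the procedure used in the paper, is to count how many of a regulator's downstream targets change in the direction its activation would predict (up for activated targets, down for repressed ones) and compare that count with a 50% chance expectation using a one-sided exact binomial test. The function names, gene names, signs, and fold changes below are hypothetical.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): one-sided exact tail probability."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def regulator_consistency(targets: dict[str, int], log_fc: dict[str, float]):
    """Sign-aware test for one regulator.

    `targets` maps each downstream gene to +1 (activation) or -1 (repression).
    Counts measured, changed targets whose direction agrees with the regulator
    being activated, and returns (agreeing, tested, one-sided p-value).
    """
    tested = [g for g in targets if g in log_fc and log_fc[g] != 0]
    agree = sum(1 for g in tested if (log_fc[g] > 0) == (targets[g] > 0))
    return agree, len(tested), binomial_sf(agree, len(tested))


# Hypothetical regulator with five literature-derived targets.
targets = {"g1": 1, "g2": 1, "g3": -1, "g4": -1, "g5": 1}
log_fc = {"g1": 1.2, "g2": 0.8, "g3": -0.5, "g4": -1.1, "g5": 0.3}
result = regulator_consistency(targets, log_fc)  # all 5 agree: p = 0.5**5
```

A regulator inferred to be inhibited can be tested the same way by flipping every sign in `targets`. This toy test looks only at direct targets and does not account for wider network topology.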
Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, with a poor response to chemotherapy and a low survival rate. This unfavorable treatment response is likely to derive from both late diagnosis and complex, incompletely understood biology and heterogeneity among NSCLC subtypes. To define the relative contributions of major cellular pathways to the biogenesis of NSCLC and highlight major differences between NSCLC subtypes, we studied the molecular signatures of lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC) by analyzing gene expression and comparing tumor samples with normal lung tissue. Our results suggest the existence of specific molecular networks and subtype-specific differences between lung ADC and SCC, found mostly in cell cycle, DNA repair, and metabolic pathways. However, we also observed similarities across major gene interaction networks and pathways in ADC and SCC. These data provide new insight into the biology of ADC and SCC and can be used to explore novel therapeutic interventions in lung cancer chemoprevention and treatment.
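Pathway-level differences such as those reported in cell cycle, DNA repair, and metabolic pathways are often assessed with an over-representation test. The sketch below computes a one-sided hypergeometric p-value for observing at least `k` pathway genes among `n` differentially expressed genes drawn from `N` measured genes, `K` of which belong to the pathway. It is a common technique offered for illustration, not necessarily the analysis the authors used; the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N = genes measured, K = genes in the pathway,
    n = differentially expressed genes, k = pathway genes among them.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # math.comb returns 0 when asked for more items than available, so
    # impossible terms contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


# Toy example with invented counts: all 3 selected genes fall in a 4-gene pathway
# out of 10 measured genes, so p = C(4,3) / C(10,3) = 4/120, about 0.033.
p_toy = hypergeom_enrichment_p(N=10, K=4, n=3, k=3)
```

In practice, p-values from many pathways would also need a multiple-testing correction before any pathway is called enriched.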