Early detection of cancer has been widely explored because of its paramount importance in biomedicine. Among the different types of data used to address this problem, T cell receptors (TCRs) have recently come under the spotlight owing to the growing appreciation of the role of the host immune system in tumor biology. However, the one-to-many correspondence between a patient and multiple TCR sequences prevents researchers from simply adopting classical statistical or machine learning methods. Recent work has modeled this type of data in the context of multiple instance learning (MIL). Despite the novel application of MIL to cancer detection using TCR sequences and its adequate performance in several tumor types, there is still room for improvement, especially for certain cancer types. Furthermore, explainable neural network models have not been fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to enhance both performance and explainability in cancer detection. The sparse attention structure drops out uninformative instances in each bag, achieving both interpretability and better predictive performance in combination with the skip connection. Our experiments show that MINN-SA yields the highest area under the ROC curve, averaged across 10 different types of cancer, compared with existing MIL approaches. Moreover, we observe from the estimated attention weights that MINN-SA can identify the TCRs that are specific for tumor antigens in the same T cell repertoire.
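To illustrate how a sparse attention layer can drop uninformative instances from a bag, the following is a minimal sketch, not the MINN-SA implementation. It assumes sparsemax as the sparsifying transform (the abstract does not state which transform is used), takes the per-instance attention scores as given rather than computing them with a network, and omits the skip connection. The function names `sparsemax` and `attention_pool` and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def sparsemax(scores: Sequence[float]) -> List[float]:
    """Euclidean projection of the scores onto the probability simplex.

    Unlike softmax, low-scoring entries receive a weight of exactly zero.
    """
    z_sorted = sorted(scores, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumulative += z
        # The support is the longest prefix of sorted scores satisfying this test.
        if 1.0 + k * z > cumulative:
            tau = (cumulative - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def attention_pool(
    instances: Sequence[Sequence[float]], scores: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool instance embeddings into one bag embedding with sparse weights."""
    weights = sparsemax(scores)
    dim = len(instances[0])
    pooled = [
        sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)
    ]
    return pooled, weights


# Hypothetical bag of three instance embeddings (e.g., encoded TCR sequences)
# with hypothetical attention scores; the third instance is dropped entirely.
bag = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
pooled, weights = attention_pool(bag, scores=[1.0, 0.5, -2.0])
```

Instances whose weight is exactly zero contribute nothing to the bag representation, which is what makes the nonzero weights readable as a shortlist of informative instances.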
The relationship between tumor immune responses and tumor neoantigens is one of the most fundamental and unsolved questions in tumor immunology, and is the key to understanding the inefficiency of immunotherapy observed in many cancer patients. However, the properties of neoantigens that can elicit immune responses remain unclear. This biological problem can be represented and solved under a multiple instance learning framework, which seeks to model multiple instances (neoantigens) within each bag (patient specimen) with the continuous response (T cell infiltration) observed for each bag. To this end, we develop a Bayesian multiple instance regression method, named BMIR, using a Gaussian distribution to address continuous responses and latent binary variables to model primary instances in bags. By means of such Bayesian modeling, BMIR can learn a function for predicting the bag-level responses and for identifying the primary instances within bags, as well as give access to Bayesian statistical inference, which are elusive in existing works. We demonstrate the superiority of BMIR over previously proposed optimization-based methods for multiple instance regression through simulation and real data analyses. Our method is implemented in an R package entitled “BayesianMIR” and is available at https://github.com/inmybrain/BayesianMIR.
Despite remarkable success in the prevention and treatment of tuberculosis (TB), it remains one of the most devastating infectious diseases worldwide. Management of TB requires an efficient and timely diagnostic strategy. In this study, we comprehensively characterized the plasma lipidome of TB patients, then selected candidate lipid and lipid-related gene biomarkers using a data-driven, knowledge-based framework. Among 93 lipids that were identified as potential biomarker candidates, ether-linked phosphatidylcholine (PC O–) and phosphatidylcholine (PC) were generally upregulated, while free fatty acids and triglycerides with longer fatty acyl chains were downregulated in the TB group. Lipid-related gene enrichment analysis revealed significantly altered metabolic pathways (e.g., ether lipid, linolenic acid, and cholesterol) and immune response signaling pathways. Based on these potential biomarkers, TB patients could be differentiated from controls in the internal validation (random forest model, area under the curve [AUC] 0.936, 95% confidence interval [CI] 0.865–0.992). PC(O-40:4), PC(O-42:5), PC(36:0), and PC(34:4) were robust biomarkers able to distinguish TB patients from individuals with latent infection and healthy controls, as shown in the external validation. Small changes in expression were identified for 162 significant lipid-related genes in the comparison of TB patients vs. controls; in the random forest model, their utilities were demonstrated by AUCs that ranged from 0.829 to 0.956 in three cohorts. In conclusion, this study introduced a potential framework that can be used to identify and validate metabolism-centric biomarkers.
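For readers unfamiliar with the AUC values reported above, the sketch below shows one standard way to compute an area under the ROC curve: as the fraction of (case, control) pairs in which the case receives the higher classifier score, with ties counted as one half. It is not the study's pipeline; the function name `roc_auc`, the labels, and the scores are all hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


# Hypothetical example: 1 = TB patient, 0 = control; scores are model outputs.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.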
Insight into the metabolic biosignature of tuberculosis (TB) may inform clinical care, reduce adverse effects, and facilitate metabolism-informed therapeutic development. However, studies often yield inconsistent findings regarding the metabolic profiles of TB. Herein, we conducted an untargeted metabolomics study using plasma from 63 Korean TB patients and 50 controls. Metabolic features were integrated with the data of another cohort from China (35 TB patients and 35 controls) for a global functional meta-analysis. Specifically, all features were matched to a known biological network to identify potential endogenous metabolites. Next, a pathway-level analysis based on gene set enrichment analysis was conducted for each study, and the resulting pathway p-values from the two studies were combined. The meta-analysis revealed both known metabolic alterations and novel processes. For instance, retinol metabolism and cholecalciferol metabolism, which are associated with TB risk and outcome, were altered in plasma from TB patients; proinflammatory lipid mediators were significantly enriched. Furthermore, metabolic processes linked to the innate immune responses and possible interactions between the host and the bacillus showed altered signals. In conclusion, our proof-of-concept study indicated that a pathway-level meta-analysis directly from metabolic features enables accurate interpretation of TB molecular profiles.
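The step of combining pathway-level p-values from two studies can be sketched as follows. The abstract does not say which combination rule was used; this example assumes Fisher's method, which requires the studies to be independent. The function name `fisher_combine` and the input p-values are hypothetical.

```python
import math
from typing import Sequence


def fisher_combine(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, whose survival function has the
    closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one pathway in two independent cohorts.
combined_p = fisher_combine([0.05, 0.05])
```

With two studies the formula reduces to p1 * p2 * (1 - ln(p1 * p2)), so two borderline results of 0.05 combine to roughly 0.017.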
The advancement of bioinformatics and machine learning has facilitated the discovery and validation of omics-based biomarkers. This study employed a novel approach combining multi-platform transcriptomics and cutting-edge algorithms to introduce novel signatures for accurate diagnosis of colorectal cancer (CRC). Different random forests (RF)-based feature selection methods, including the area under the curve (AUC)-RF, Boruta, and Vita, were used, and the diagnostic performance of the proposed biosignatures was benchmarked using RF, logistic regression, naïve Bayes, and k-nearest neighbors models. All models showed satisfactory performance, with RF appearing to be the best. For instance, regarding the RF model, the following were observed: mean accuracy 0.998 (standard deviation (SD) < 0.003), mean specificity 0.999 (SD < 0.003), and mean sensitivity 0.998 (SD < 0.004). Moreover, the proposed biomarker signatures were highly associated with multifaceted hallmarks in cancer. Some biomarkers were found to be enriched in epithelial cell signaling in
infection and inflammatory processes. The overexpression of
and
was associated with poor disease-free survival while the down-regulation of
,
, and
was linked to worse overall survival of the patients. In conclusion, novel transcriptome signatures to improve the diagnostic accuracy in CRC are introduced for further validations in various clinical settings.
Background
The optimal diagnosis and treatment of tuberculosis (TB) are challenging due to underdiagnosis and inadequate treatment monitoring. Lipid-related genes are crucial components of the host immune response in TB. However, their dynamic expression and potential usefulness for monitoring response to anti-TB treatment are unclear.
Methodology
In the present study, we used a targeted, knowledge-based approach to investigate the expression of lipid-related genes during anti-TB treatment and their potential use as biomarkers of treatment response.
Results and discussion
The expression levels of 10 genes (ARPC5, ACSL4, PLD4, LIPA, CHMP2B, RAB5A, GABARAPL2, PLA2G4A, MBOAT2, and MBOAT1) were significantly altered during standard anti-TB treatment. We evaluated the potential usefulness of this 10-lipid-gene signature for TB diagnosis and treatment monitoring in various clinical scenarios across multiple populations. We also compared this signature with other transcriptomic signatures. The 10-lipid-gene signature could distinguish patients with TB from those with latent tuberculosis infection and non-TB controls (area under the receiver operating characteristic curve > 0.7 for most cases); it could also be useful for monitoring response to anti-TB treatment. Although the performance of the new signature was not better than that of previous signatures (i.e., RISK6, Sambarey10, Long10), our results suggest the usefulness of metabolism-centric biomarkers.
Conclusions
Lipid-related genes play significant roles in TB pathophysiology and host immune responses. Furthermore, transcriptomic signatures related to the immune response and lipid-related genes may be useful for TB diagnosis and treatment monitoring.
Black ginseng has various pharmacological activities, but only a few studies have compared its pharmacological effects with those of red ginseng. We conducted an integrative systematic literature evaluation and developed a non-inferiority test based on a multivariate modeling approach to compare the pharmacological effects of red ginseng and black ginseng. We searched reported studies on the pharmaceutical effects and composition of ginsenosides and assigned numeric scores using nonlinear principal component analysis, based on discretization measures for the included publications. Downstream weighted linear regression models were constructed to study the eight major biological activities that are generally known to be exhibited by red ginseng. Our statistical model, based on available ordinal information gathered from previous literature, helped in comparing the overlapping effects of black ginseng. Black ginseng showed antioxidant effects comparable to those of red ginseng; however, this variant was inferior to red ginseng in enhancing immunity, relieving fatigue, alleviating depression/anxiety, decreasing body fat, and reducing blood pressure. We have shown a cost-efficient method to indirectly evaluate the biological effects of ginseng products using data from published articles. This method can also be used to compare the nutritional and medicinal value of herbal medicines that share similar compositions of bioactive compounds.
Substantial alterations at the multi-omics level of pancreatic cancer (PC) hinder the diagnosis and treatment of patients at early stages. Herein, we conducted an integrative omics-based translational analysis, utilizing next-generation sequencing, transcriptome meta-analysis, and immunohistochemistry, combined with statistical learning, to validate multiplex biomarker candidates for the diagnosis, prognosis, and management of PC. Experiment-based validation was conducted, and supportive evidence for the essentiality of the candidates in PC was found at the gene expression or protein level by practical biochemical methods. Remarkably, the random forests (RF) model exhibited an excellent diagnostic performance and
,
,
, and
greatly influenced its decisions. An explanation approach for the RF model was successfully constructed. Moreover, protein expression of LAMC2, ANXA2, ADAM9, and APLP2 was found to be correlated and significantly higher in PC patients in independent cohorts. Survival analysis revealed that patients with high expression of
(Hazard ratio (HR)
= 2.2,
p-value < 0.001),
(HR
= 2.1,
p-value < 0.001), and
(HR
= 1.8,
p-value = 0.012) exhibited poorer survival rates. In conclusion, we successfully explored hidden biological insights from large-scale omics data and suggest that
,
,
, and
are robust biomarkers for early diagnosis, prognosis, and management for PC.
Many extensions of the multivariate normal distribution to heavy-tailed distributions have been proposed in the literature, including the scale Gaussian mixture, elliptical, generalized elliptical, and transelliptical distributions. Inference for each family of distributions is well studied. However, the extensions overlap with or resemble one another, and it is hard to differentiate one from another. For this reason, in practice, researchers simply pick one of the many extensions and apply it to their analysis. In this paper, to guide practitioners, who should choose statistical procedures based not on their preferences but on what the data look like, we comparatively review various extensions and their estimators. We also fully investigate the inclusion and exclusion relations among the extensions using Venn diagrams and examples. Moreover, in the numerical study, we illustrate visual differences among the extensions using bivariate plots and analyze different scatter matrix estimators on microarray data.
Identifying and translating hepatocellular carcinoma (HCC) biomarkers from bench to bedside using mass spectrometry-based metabolomics and lipidomics is hampered by inconsistent findings. Here, we investigated HCC at systemic and metabolism-centric multiomics levels by conducting a meta-analysis of quantitative evidence from 68 cohorts. Blood transcript biomarkers linked to the HCC metabolic phenotype were externally validated and prioritized. In the studies under investigation, about 600 metabolites were reported as putative HCC-associated biomarkers; 39, 20, and 10 metabolites and 52, 12, and 12 lipids were reported in three or more studies in HCC vs. Control, HCC vs. liver cirrhosis (LC), and LC vs. Control groups, respectively. Amino acids, fatty acids (increased 18:1), bile acids, and lysophosphatidylcholine were the most frequently reported biomarkers in HCC. BAX and RAC1 showed a good correlation and were associated with poor prognosis. Our study proposes robust HCC biomarkers across diverse cohorts using a data-driven knowledge-based approach that is versatile and affordable for studying other diseases.