COPD is a leading cause of mortality.
We hypothesized that applying machine learning to clinical and quantitative CT imaging features would improve mortality prediction in COPD.
We selected 30 clinical, spirometric, and imaging features as inputs for a random survival forest. We used the top features in a Cox regression to create a machine learning mortality prediction (MLMP) in COPD model and also assessed the performance of other statistical and machine learning models. We trained the models in a subset of participants with moderate to severe COPD from the Genetic Epidemiology of COPD (COPDGene) study and tested prediction performance in the remaining individuals with moderate to severe COPD in COPDGene and in the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) study. We compared our model with the BMI, airflow obstruction, dyspnea, exercise capacity (BODE) index; BODE modifications; and the age, dyspnea, and airflow obstruction index.
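The BODE index used as a comparator adds up points for its four components. The sketch below assumes the commonly published BODE point table; these cutoffs are not given in this abstract and should be checked against the original BODE definition. `bode_index` is a hypothetical name, and the BODE modifications compared in the study are not shown.

```python
def bode_index(fev1_pct_pred: float, mmrc: int, walk_m: float, bmi: float) -> int:
    """Return a BODE score (0-10) from its four components.

    ASSUMPTION: thresholds follow the commonly published BODE point table;
    verify against the original definition before any real use.
    """
    # B: body-mass index (kg/m^2): 0 points if > 21, otherwise 1
    b = 0 if bmi > 21 else 1

    # O: airflow obstruction, FEV1 % predicted
    if fev1_pct_pred >= 65:
        o = 0
    elif fev1_pct_pred >= 50:
        o = 1
    elif fev1_pct_pred >= 36:
        o = 2
    else:
        o = 3

    # D: dyspnea on the modified MRC scale (0-4); grades 0 and 1 score 0
    if not 0 <= mmrc <= 4:
        raise ValueError("mMRC grade must be between 0 and 4")
    d = max(mmrc - 1, 0)

    # E: exercise capacity, 6-min walk distance in metres
    if walk_m >= 350:
        e = 0
    elif walk_m >= 250:
        e = 1
    elif walk_m >= 150:
        e = 2
    else:
        e = 3

    return b + o + d + e
```

On this 0 to 10 scale, the highest BODE group mentioned in the results corresponds to scores of 7 to 10.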
We included 2,632 participants from COPDGene and 1,268 participants from ECLIPSE. The top predictors of mortality were 6-min walk distance, FEV1 % predicted, and age. The top imaging predictor was pulmonary artery-to-aorta ratio. The MLMP-COPD model achieved a C index ≥ 0.7 in both COPDGene and ECLIPSE (6.4- and 7.2-year median follow-ups, respectively), significantly better than all tested mortality indexes (P < .05). The MLMP-COPD model performed similarly to the other models tested while using fewer predictors. The group with the highest BODE scores (7-10) had 56% mortality, whereas the highest mortality group defined by the MLMP-COPD model had 62% mortality (P = .046).
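The C index measures how well predicted risk ranks observed deaths: among pairs of subjects in which the one with the shorter follow-up died, it is the fraction in which that subject was assigned the higher risk. A minimal sketch of Harrell's C index follows; it is not the study's evaluation code, `harrell_c_index` is a hypothetical name, and as a simplification pairs with tied follow-up times are skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per subject
    events: 1 if death was observed, 0 if censored
    risk_scores: higher value = higher predicted risk of death
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the earlier time is an observed death;
        # tied follow-up times are skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported C index ≥ 0.7 in context.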
An MLMP-COPD model outperformed four existing models for predicting all-cause mortality across two COPD cohorts. Performance of machine learning was similar to that of traditional statistical methods. The model is available online at: https://cdnm.shinyapps.io/cgmortalityapp/.
Abstract
Motivation
Biological networks can provide a system-level understanding of underlying processes. In many contexts, networks have a high degree of modularity, i.e. they consist of subsets of nodes, often known as subnetworks or modules, which are highly interconnected and may perform separate functions. To investigate the association between an identified module and a variable of interest in subsequent analyses, a module summarization that best explains the module's information while reducing dimensionality is often needed. Conventional approaches for obtaining a network representation typically rely only on the profiles of the nodes within the network while disregarding the inherent network topological information.
Results
In this article, we propose NetSHy, a hybrid approach that reduces the dimension of a network while incorporating topological properties to aid the interpretation of downstream analyses. In particular, NetSHy applies principal component analysis (PCA) to a combination of the node profiles and the well-known Laplacian matrix derived directly from the network similarity matrix to extract a subject-level summarization. Simulation scenarios based on random and empirical networks at varying network sizes and sparsity levels show that NetSHy outperforms the conventional PCA approach applied directly to node profiles, both in recovering the true correlation with a phenotype of interest and in retaining a higher proportion of explained variation in the data when networks are relatively sparse. The robustness of NetSHy is also demonstrated by a more consistent correlation with the observed phenotype as the sample size decreases. Lastly, a genome-wide association study is performed as an application of a downstream analysis, in which NetSHy summarization scores on the biological networks identify more significant single nucleotide polymorphisms than the conventional network representation.
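The abstract says NetSHy applies PCA to "a combination" of node profiles and the network Laplacian but does not give the formula. The sketch below assumes, for illustration only, that the combination is the matrix product X·L of the subjects-by-nodes profile matrix X and the unnormalized Laplacian L = D − A of the similarity matrix A, and it extracts first-PC scores per subject by power iteration. This may not match NetSHy's actual construction; all function names are hypothetical, and the authors' R implementation is the reference.

```python
import math
import random


def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A of a symmetric similarity matrix A."""
    p = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(p)]
            for i in range(p)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def first_pc_scores(x, iters=500, seed=0):
    """Subject-level scores on the first principal component, via power iteration."""
    n, p = len(x), len(x[0])
    means = [sum(r[j] for r in x) / n for j in range(p)]
    xc = [[r[j] - means[j] for j in range(p)] for r in x]  # centre each column
    cov = [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(p)]
           for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # random positive start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break  # degenerate case: no variance along the start vector
        v = [c / norm for c in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in xc]


def netshy_like_scores(profiles, adj):
    """HYPOTHETICAL NetSHy-style summary: first-PC scores of profiles @ Laplacian."""
    return first_pc_scores(matmul(profiles, laplacian(adj)))
```

The sign of a principal component is arbitrary, so only the relative ordering and magnitude of the scores carry meaning.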
Availability and implementation
R code implementation of NetSHy is available at https://github.com/thaovu1/NetSHy
Supplementary information
Supplementary data are available at Bioinformatics online.
Bacterial viruses (bacteriophages) have a key role in shaping the development and functional outputs of host microbiomes. Although metagenomic approaches have greatly expanded our understanding of the prokaryotic virosphere, additional tools are required for the phage-oriented dissection of metagenomic data sets and for host-range affiliation of recovered sequences. Here we demonstrate the application of a genome signature-based approach to interrogate conventional whole-community metagenomes and access the subliminal, phylogenetically targeted phage sequences present within them. We describe a portion of the biological dark matter extant in the human gut virome, and bring to light a population of potentially gut-specific Bacteroidales-like phage, poorly represented in existing virus-like particle (VLP)-derived viral metagenomes. These predominantly temperate phage were shown to encode functions of direct relevance to human health in the form of antibiotic resistance genes, and provided evidence for the existence of putative 'viral-enterotypes' among this fraction of the human gut virome.
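The abstract does not specify which genome signature or distance measure is used. One common form of genome signature is the tetranucleotide (4-mer) frequency profile; the sketch below assumes that signature and a Euclidean distance, purely to illustrate how metagenomic sequences could be ranked by similarity to reference phage signatures. Function names are hypothetical, and refinements such as reverse-complement handling are omitted.

```python
import math
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(t) for t in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq):
    """Relative frequencies of the 256 tetranucleotides in a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made entirely of A/C/G/T are counted (N etc. are skipped).
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [counts[t] / total for t in TETRAMERS]


def signature_distance(seq_a, seq_b):
    """Euclidean distance between two tetranucleotide profiles."""
    pa = tetranucleotide_profile(seq_a)
    pb = tetranucleotide_profile(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


def rank_references(query, references):
    """Reference names sorted by signature distance to the query (closest first)."""
    return sorted(references, key=lambda name: signature_distance(query, references[name]))
```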
Background
Electronic cigarettes (e-cigarettes) are battery-operated nicotine-delivery devices used by some smokers as a cessation tool as well as by never smokers.
Objective
To determine the usage of e-cigarettes in older adults at risk for or with chronic obstructive pulmonary disease (COPD).
Design
Prospective cohorts.
Participants
COPDGene (N = 3536) and SPIROMICS (N = 1060) subjects who were current or former smokers aged 45–80.
Main Measures
Participants were surveyed to determine whether e-cigarette use was associated with longitudinal changes in COPD progression or smoking habits.
Key Results
From 2010 to 2016, the proportion of participants who had ever used e-cigarettes steadily increased to 12–16%, but from 2014 to 2016 current use was stable at ~5%. E-cigarette use was similar in African-Americans (AA) and whites; however, AA were 1.8–2.9 times as likely to use menthol-flavored e-cigarettes. Current users of both e-cigarettes and conventional cigarettes had higher nicotine dependence and consumed more nicotine than those who smoked only conventional cigarettes. E-cigarette users had a heavier conventional cigarette smoking history and worse respiratory health, were less likely to reduce or quit conventional cigarette smoking, had higher nicotine dependence, and were more likely to report chronic bronchitis and exacerbations. Ever e-cigarette users had a more rapid decline in lung function, but this trend did not persist after adjustment for persistent conventional cigarette smoking.
Conclusions
E-cigarette use, which is common in adults with or at risk for COPD, was associated with worse pulmonary-related health outcomes, but not with cessation of smoking conventional cigarettes. Although this was an observational study, we find no evidence supporting the use of e-cigarettes as a harm reduction strategy among current smokers with or at risk for COPD.
Novel proteomics platforms, such as the aptamer‐based SOMAscan platform, can quantify large numbers of proteins efficiently and cost‐effectively and are rapidly growing in popularity. However, comparisons to conventional immunoassays remain underexplored, leaving investigators unsure when cross‐assay comparisons are appropriate. Here, the correlation of immunoassay results with relative protein quantification by SOMAscan is explored. For 63 proteins assessed with Myriad Rules Based Medicine multiplex immunoassays and SOMAscan in two chronic obstructive pulmonary disease (COPD) cohorts, the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) and COPDGene, Spearman correlation coefficients range from −0.13 to 0.97, with a median correlation coefficient of ≈0.5 and consistent results across cohorts. A similar range is observed for immunoassays in the population‐based Multi‐Ethnic Study of Atherosclerosis and for other assays in COPDGene and SPIROMICS. Comparisons of relative quantification from the antibody‐based Olink platform and SOMAscan in a small cohort of myocardial infarction patients also show a wide correlation range. Finally, cis pQTL data, mass spectrometry aptamer confirmation, and other publicly available data are integrated to assess relationships with observed correlations. Correlation between proteomics assays shows a wide range and should be carefully considered when comparing and meta‐analyzing proteomics data across assays and studies.
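Spearman's coefficient, the measure of cross-assay agreement reported here, is the Pearson correlation of the ranks of each protein's values across the same samples, so it captures monotone rather than strictly linear agreement between platforms. A minimal plain-Python sketch with average ranks for ties follows; function names are illustrative, and in practice a library routine such as SciPy's `spearmanr` would normally be used.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, two assays on different scales (for example, concentrations versus relative fluorescence units) can still reach a coefficient of 1 if they order the samples identically.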
Emerging technologies are increasingly employed in environmental citizen science projects. This integration offers benefits and opportunities for scientists and participants alike. Citizen science can support large-scale, long-term monitoring of species occurrences, behaviour and interactions. At the same time, technologies can foster participant engagement, regardless of pre-existing taxonomic expertise or experience, and permit new types of data to be collected. Yet, technologies may also create challenges by potentially increasing financial costs, necessitating technological expertise or demanding training of participants. Technology could also reduce people's direct involvement and engagement with nature. In this perspective, we discuss how current technologies have spurred an increase in citizen science projects and how the implementation of emerging technologies in citizen science may enhance scientific impact and public engagement. We show how technology can act as (i) a facilitator of current citizen science and monitoring efforts, (ii) an enabler of new research opportunities, and (iii) a transformer of science, policy and public participation, but could also become (iv) an inhibitor of participation, equity and scientific rigour. Technology is developing fast and promises to provide many exciting opportunities for citizen science and insect monitoring, but while we seize these opportunities, we must remain vigilant against potential risks. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
There is notable heterogeneity in the clinical presentation of patients with COPD. To characterise this heterogeneity, we sought to identify subgroups of smokers by applying cluster analysis to data from the COPDGene study.
We applied a clustering method, k-means, to data from 10 192 smokers in the COPDGene study. After splitting the sample into a training and validation set, we evaluated three sets of input features across a range of k (user-specified number of clusters). Stable solutions were tested for association with four COPD-related measures and five genetic variants previously associated with COPD at genome-wide significance. The results were confirmed in the validation set.
We identified four clusters that can be characterised as (1) relatively resistant smokers (ie, no/mild obstruction and minimal emphysema despite heavy smoking), (2) mild upper zone emphysema-predominant, (3) airway disease-predominant and (4) severe emphysema. All clusters are strongly associated with COPD-related clinical characteristics, including exacerbations and dyspnoea (p<0.001). We found strong genetic associations between the mild upper zone emphysema group and rs1980057 near HHIP, and between the severe emphysema group and rs8034191 in the chromosome 15q region (p<0.001). All significant associations were replicated at p<0.05 in the validation sample (12/12 associations with clinical measures and 2/2 genetic associations).
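The k-means method used here alternates two steps: assign each subject to the nearest cluster centre, then move each centre to the mean of its assigned subjects, until assignments stop changing. A minimal sketch of this standard (Lloyd's) algorithm is shown below; it is not the study's code, the function names are hypothetical, and the study's choice of input features, training/validation split, range of k, and stability testing are not reproduced.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means for a list of equal-length numeric tuples.

    Returns (labels, centroids). Centroids start at k points drawn at random;
    real analyses typically standardize features and use many restarts.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because k must be specified in advance and results depend on the starting centres, evaluating a range of k and checking solution stability, as described above, is what makes the reported four-cluster solution defensible.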
Cluster analysis identifies four subgroups of smokers that show robust associations with clinical characteristics of COPD and known COPD-associated genetic variants.
Chronic tobacco smoke exposure results in a broad range of lung pathologies, including emphysema, airway disease, and parenchymal fibrosis, as well as a multitude of extra-pulmonary comorbidities. Prior work using CT imaging has identified several clinically relevant subgroups of smoking-related lung disease, but these investigations have generally lacked organ-specific molecular correlates.
Can CT imaging be used to identify clinical phenotypes of smoking-related lung disease that have specific bronchial epithelial gene expression patterns to better understand disease pathogenesis?
Using K-means clustering, we clustered participants from the COPDGene study (n = 5,273) based on CT imaging characteristics and then evaluated their clinical phenotypes. These clusters were replicated in the Detection of Early Lung Cancer Among Military Personnel (DECAMP) cohort (n = 360), and were further characterized using bronchial epithelial gene expression.
Three clusters (preserved, interstitial predominant and emphysema predominant) were identified. Compared to the preserved cluster, the interstitial and emphysema clusters had worse lung function, exercise capacity and quality of life. In longitudinal follow-up, individuals from the emphysema group had greater declines in exercise capacity and lung function, more emphysema, more exacerbations, and higher mortality. Similarly, genes involved in inflammatory pathways (tumor necrosis factor-α, interferon-β) were more highly expressed in bronchial epithelial cells from individuals in the emphysema cluster, while genes associated with T-cell related biology were decreased in these samples. Samples from individuals in the interstitial cluster generally had intermediate levels of expression of these genes.
Using quantitative CT imaging, we identified three groups of older ever-smokers that replicated in two cohorts. Airway gene expression differences among the three groups suggest increased levels of inflammation in the most severe clinical phenotype, possibly mediated by the tumor necrosis factor-α and interferon-β pathways.
COPDGene (NCT00608764), DECAMP-1 (NCT01785342), DECAMP-2 (NCT02504697)
Implementing precision medicine for complex diseases such as chronic obstructive pulmonary disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis of two large cohorts of current and former smokers with and without COPD (SPIROMICS, N = 750; COPDGene, N = 590) was used to identify single nucleotide polymorphisms (SNPs) associated with measurements of 88 blood proteins (protein quantitative trait loci; pQTLs). pQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Causal relations among pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) were explored using conditional independence tests. We identified 527 highly significant (p < 8 × 10⁻¹⁰) pQTLs in 38 (43%) of the blood proteins tested. Most pQTL SNPs were novel, with low overlap with eQTL SNPs. The pQTL SNPs explained >10% of the measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10⁻³⁹²) explaining 71%–75% of the measured variation in vitamin D binding protein (VDBP; gene = GC). Some of these pQTLs, e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C, have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema: the explained variance (R²) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL–disease relationships for airflow obstruction and emphysema.
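The "measured variation explained" by a pQTL SNP can be illustrated with an additive genetic model, in which the protein level is regressed on the SNP dosage (0, 1, or 2 copies of the coded allele) and R² is the fraction of protein variance accounted for by the fit. The sketch below assumes that single-SNP additive model; the study's covariate adjustment, cross-cohort meta-analysis, and multiple-testing threshold are not shown, and `simple_regression_r2` is a hypothetical name.

```python
def simple_regression_r2(dosage, protein):
    """Least-squares fit protein ~ a + b * dosage; returns (a, b, r2).

    ASSUMPTION: additive model with a single SNP and no covariates.
    """
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(protein) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, protein))
    b = sxy / sxx          # change in protein level per additional allele
    a = my - b * mx        # intercept: expected level with zero alleles
    ss_tot = sum((y - my) ** 2 for y in protein)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(dosage, protein))
    return a, b, 1 - ss_res / ss_tot
```

Under this framing, a SNP such as rs7041 explaining 71%–75% of VDBP variation corresponds to an R² of roughly 0.71 to 0.75 for that protein.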
In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTLs, and the differences observed between pQTL and eQTL SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group.
IMPORTANCE: Airflow obstruction on spirometry is universally used to define chronic obstructive pulmonary disease (COPD), and current or former smokers without airflow obstruction may assume that they are disease-free. OBJECTIVE: To identify clinical and radiologic evidence of smoking-related disease in a cohort of current and former smokers who did not meet spirometric criteria for COPD, for whom we adopted the discarded label of Global Initiative for Obstructive Lung Disease (GOLD) 0. DESIGN, SETTING, AND PARTICIPANTS: Individuals from the Genetic Epidemiology of COPD (COPDGene) cross-sectional observational study completed spirometry, chest computed tomography (CT) scans, a 6-minute walk, and questionnaires. Participants were recruited from local communities at 21 sites across the United States. The GOLD 0 group (n = 4388; ratio of forced expiratory volume in the first second of expiration [FEV1] to forced vital capacity [FVC] > 0.7 and FEV1 ≥ 80% predicted) from the COPDGene study was compared with a GOLD 1 group (n = 794), COPD groups (n = 3690), and a group of never smokers (n = 108). Recruitment began in January 2008 and ended in July 2011. MAIN OUTCOMES AND MEASURES: Physical function impairments, respiratory symptoms, CT abnormalities, use of respiratory medications, and reduced respiratory-specific quality of life. RESULTS: One or more respiratory-related impairments were found in 54.1% (2375 of 4388) of the GOLD 0 group. The GOLD 0 group had worse quality of life (mean [SD] St George’s Respiratory Questionnaire total score, 17.0 [18.0] vs 3.8 [6.8] for the never smokers; P < .001) and a lower 6-minute walk distance, and 42.3% (127 of 300) of the GOLD 0 group had CT evidence of emphysema or airway thickening. The FEV1 percent predicted distribution and mean for the GOLD 0 group were lower but still within the normal range for the population. Current smoking was associated with more respiratory symptoms, but former smokers had greater emphysema and gas trapping.
Advancing age was associated with smoking cessation and with more CT findings of disease. Individuals with respiratory impairments were more likely to use respiratory medications, and the use of these medications was associated with worse disease. CONCLUSIONS AND RELEVANCE: Lung disease and impairments were common in smokers without spirometric COPD. Based on these results, we project that there are 35 million current and former smokers older than 55 years in the United States who may have unrecognized disease or impairment. The effect of chronic smoking on the lungs and the individual is substantially underestimated when using spirometry alone.
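The groups compared in this study reduce to simple spirometric cutoffs. The sketch below assumes the GOLD 0 criteria stated in the text (FEV1/FVC > 0.7 and FEV1 ≥ 80% predicted) plus the standard GOLD grading of obstructed participants (FEV1/FVC < 0.70, with grades 1 to 4 split at 80%, 50%, and 30% of predicted FEV1); the abstract itself does not restate the cutoffs for grades 2 to 4. The function name is hypothetical, and combinations these rules do not cover (for example, a normal ratio with FEV1 < 80% predicted) return None.

```python
def spirometric_group(fev1_fvc: float, fev1_pct_pred: float):
    """Assign a spirometric group from FEV1/FVC and FEV1 % predicted.

    Returns "GOLD 0" through "GOLD 4", or None for combinations
    not covered by the cutoffs assumed here.
    """
    if fev1_fvc < 0.70:
        # Airflow obstruction: grade severity by FEV1 % predicted.
        if fev1_pct_pred >= 80:
            return "GOLD 1"
        if fev1_pct_pred >= 50:
            return "GOLD 2"
        if fev1_pct_pred >= 30:
            return "GOLD 3"
        return "GOLD 4"
    if fev1_fvc > 0.70 and fev1_pct_pred >= 80:
        return "GOLD 0"  # normal spirometry, as defined for this study
    return None  # e.g. normal ratio with reduced FEV1: not classified here
```

This makes the study's central point concrete: a smoker returned as "GOLD 0" by these rules can still have the CT abnormalities and impairments reported above, which spirometry alone does not capture.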