Objective Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems.
Materials and Methods We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites.
Results As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%).
Discussion These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others.
Conclusion By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
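The PPV figures quoted above come from comparing algorithm output against manual chart review. As a minimal sketch of that arithmetic, assuming each site reports how many algorithm-flagged records reviewers confirmed or rejected, the per-site PPV and its median could be computed as follows. The site names and counts are hypothetical and are not PheKB data.

```python
from statistics import median


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: share of algorithm-flagged records confirmed by review."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged records to evaluate")
    return true_positives / flagged


# Hypothetical chart-review tallies per site: (confirmed, not confirmed).
site_reviews = {"site_a": (48, 2), "site_b": (45, 5), "site_c": (50, 0)}

# PPV for each site, then the median across sites.
site_ppvs = {site: ppv(tp, fp) for site, (tp, fp) in site_reviews.items()}
median_ppv = median(site_ppvs.values())
```

Reporting the median across sites, as the abstract does, keeps one poorly performing implementation from dominating the summary.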
Because polygenic risk scores (PRSs) for coronary heart disease (CHD) are derived from mainly European ancestry (EA) cohorts, their validity in African ancestry (AA) and Hispanic ethnicity (HE) individuals is unclear. We investigated associations of “restricted” and genome-wide PRSs with CHD in three major racial and ethnic groups in the U.S. The eMERGE cohort (mean age 48 ± 14 years, 58% female) included 45,645 EA, 7,597 AA, and 2,493 HE individuals. We assessed two restricted PRSs (PRSTikkanen and PRSTada; 28 and 50 variants, respectively) and two genome-wide PRSs (PRSmetaGRS and PRSLDPred; 1.7 M and 6.6 M variants, respectively) derived from EA cohorts. Over a median follow-up of 11.1 years, 2,652 incident CHD events occurred. Hazard and odds ratios for the association of PRSs with CHD were similar in EA and HE cohorts but lower in AA cohorts. Genome-wide PRSs were more strongly associated with CHD than restricted PRSs were. PRSmetaGRS, the best performing PRS, was associated with CHD in all three cohorts; hazard ratios (95% CI) per 1 SD increase were 1.53 (1.46–1.60), 1.53 (1.23–1.90), and 1.27 (1.13–1.43) for incident CHD in EA, HE, and AA individuals, respectively. The hazard ratios were comparable in the EA and HE cohorts (p for interaction = 0.77) but were significantly attenuated in AA individuals (p for interaction = 2.9 × 10⁻³). These results highlight the potential clinical utility of PRSs for CHD as well as the need to assemble diverse cohorts to generate ancestry- and ethnicity-specific PRSs.
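The attenuation in AA individuals can be roughly checked from the published summary statistics alone. The sketch below is an approximation and not the authors' analysis, which would typically fit an interaction term in the regression model. It recovers each log hazard ratio and its standard error from the reported 95% CI, then applies a two-sided z-test for the difference between two independent estimates. The function names are illustrative.

```python
import math


def log_hr_and_se(hr, lower, upper, z_crit=1.959964):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z_crit)


def heterogeneity_test(group_a, group_b):
    """Two-sided z-test for a difference between two independent log hazard ratios."""
    beta_a, se_a = log_hr_and_se(*group_a)
    beta_b, se_b = log_hr_and_se(*group_b)
    z = (beta_a - beta_b) / math.sqrt(se_a**2 + se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hazard ratios per 1 SD of PRSmetaGRS with 95% CIs, as quoted in the abstract.
ea = (1.53, 1.46, 1.60)
aa = (1.27, 1.13, 1.43)

z, p = heterogeneity_test(ea, aa)
```

With these rounded inputs the statistic comes out near z = 2.9, with a p-value of a few times 10⁻³, the same order of magnitude as the reported interaction p-value. Exact agreement is not expected, because the inputs are rounded and the method differs.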
Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver illness with a genetically heterogeneous background that can be accompanied by considerable morbidity and attendant health care costs. The pathogenesis and progression of NAFLD are complex, with many unanswered questions. We conducted genome-wide association studies (GWASs) using both adult and pediatric participants from the Electronic Medical Records and Genomics (eMERGE) Network to identify novel genetic contributors to this condition.
First, a natural language processing (NLP) algorithm was developed, tested, and deployed at each site to identify 1106 NAFLD cases and 8571 controls and histological data from liver tissue in 235 available participants. These include 1242 pediatric participants (396 cases, 846 controls). The algorithm included billing codes, text queries, laboratory values, and medication records. Next, GWASs were performed on NAFLD cases and controls and case-only analyses using histologic scores and liver function tests adjusting for age, sex, site, ancestry, PC, and body mass index (BMI).
Consistent with previous results, a robust association was detected for the PNPLA3 gene cluster in participants with European ancestry. At the PNPLA3-SAMM50 region, three SNPs, rs738409, rs738408, and rs3747207, showed the strongest association (best SNP rs738409, p = 1.70 × 10^[…]). This effect was consistent in both pediatric (p = 9.92 × 10^[…]) and adult (p = 9.73 × 10^[…]) cohorts. Additionally, this variant was associated with disease severity and NAFLD Activity Score (NAS) (p = 3.94 × 10^[…], beta = 0.85). PheWAS analysis linked this locus to a spectrum of liver diseases beyond NAFLD, with a novel negative correlation with gout (p = 1.09 × 10^[…]). We also identified novel loci for NAFLD disease severity, including one novel locus for NAS near IL17RA (rs5748926, p = 3.80 × 10^[…]) and another near ZFP90-CDH1 for fibrosis (rs698718, p = 2.74 × 10^[…]). Post-GWAS and gene-based analyses identified more than 300 genes that were used for functional and pathway enrichment analyses.
In summary, this study demonstrates clear confirmation of a previously described NAFLD risk locus and several novel associations. Further collaborative studies including an ethnically diverse population with well-characterized liver histologic features of NAFLD are needed to further validate the novel findings.
Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats.
To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies.
The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University.
By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results.
Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
Clinical data in electronic medical records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network (eMERGE) investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73 to 98% and negative predictive values of 98 to 100%. Most EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
Despite significant advances in knowledge of the genetic architecture of asthma, specific contributors to the variability in the burden between populations remain to be uncovered.
To identify additional genetic susceptibility factors of asthma in European American and African American populations.
A phenotyping algorithm mining electronic medical records was developed and validated to recruit cases with asthma and control subjects from the Electronic Medical Records and Genomics network. Genome-wide association analyses were performed in pediatric and adult asthma cases and control subjects with European American and African American ancestry followed by metaanalysis. Nominally significant results were reanalyzed conditioning on allergy status.
The validation of the algorithm yielded an average positive predictive value of 95.8% for both cases and control subjects. The algorithm accrued 21,644 subjects (65.83% European American and 34.17% African American). We identified four novel population-specific associations with asthma after metaanalyses: loci 6p21.31, 9p21.2, and 10q21.3 in the European American population, and the PTGES gene in African Americans. TEK at 9p21.2, which encodes TIE2, has been shown to be involved in remodeling the airway wall in asthma, and the association remained significant after conditioning on allergy status. PTGES, which encodes prostaglandin E synthase, has also been linked to asthma, where deficient prostaglandin E2 synthesis has been associated with airway remodeling.
This study adds to understanding of the genetic architecture of asthma in European Americans and African Americans and reinforces the need to study populations of diverse ethnic backgrounds to identify shared and unique genetic predictors of asthma.
• PdxAuy/C catalysts were synthesized and tested for sorbitol electro-oxidation.
• The catalysts' activity was attributed to strong bimetallic interactions.
• Pd60Au40/C gave up to 209 mA mg⁻¹ at 0.1 V, almost 10-fold higher than Au/C.
• Pd60Au40/C and Pd40Au60/C maintained 100% stability after 1000 cycles in 3 M KOH.
• The SOR gives value-added by-products such as glycerol and glyceraldehyde.
In the present work, the sorbitol electro-oxidation reaction (SOR) was studied in alkaline medium employing bimetallic catalysts of palladium and gold (PdxAuy/C). The activity for SOR was mainly correlated to bimetallic interactions, because the catalysts presented similar characteristics such as morphology, average particle sizes (∼10 nm), and metallic mass contents (∼20 wt%). The bimetallic interactions induced changes such as shifts in the lattice parameters, in Pd binding energies (up to −0.25 eV), and in the reduction peak of Pd oxides during the electrochemical characterization. Pd40Au60/C and Pd60Au40/C presented the best activities at 3 M KOH, achieving onset potentials of −0.43 V vs. NHE and current densities of 128 and 209 mA mg⁻¹ at a fixed potential of 0.1 V, respectively. Pd60Au40/C presented almost 3.5-fold more current density than Pd/C, and 10-fold more than Au/C under the same conditions. These materials also displayed superior stability, maintaining their maximum current densities after 1000 cycles in 0.5 M sorbitol + 3 M KOH. The SOR by-products were determined by HPLC and GC/MS, revealing that these catalysts break sorbitol C–C bonds by attacking intermediate carbons (C3 and C4) and/or by subsequent removal of chain-end carbons. Among the detected by-products, value-added molecules such as glycerol, ethylene glycol, formic acid, and γ-butyrolactone were found. In summary, the stability and activity of Pd-based materials make them suitable for the oxidation of sorbitol as a promising alcohol for fuel cell technology.
Ambient air pollution is produced by sources including vehicular traffic, coal-fired power plants, hydraulic fracturing, agricultural production, and forest fires. It consists of primary pollutants generated by combustion and secondary pollutants formed in the atmosphere from precursor gases. Air pollution causes and exacerbates climate change, and climate change worsens health effects of air pollution. Infants and children are uniquely sensitive to air pollution, because their organs are developing and they have a higher air intake per unit of body weight. Health effects linked to air pollution include not only exacerbations of respiratory diseases but also reduced lung function development and increased asthma incidence. Additional outcomes of concern include preterm birth, low birth weight, neurodevelopmental disorders, IQ loss, pediatric cancers, and increased risks for adult chronic diseases. These effects are mediated by oxidative stress, chronic inflammation, endocrine disruption, and genetic and epigenetic mechanisms across the life span. Natural experiments demonstrate that with initiatives such as increased use of public transportation, both air quality and community health improve. Similarly, the Clean Air Act has improved air quality, although exposure inequities persist. Other effective strategies for reducing air pollution include ending reliance on coal, oil, and gas; regulating industrial emissions; reducing exposure with attention to proximity of residences, schools, and child care facilities to traffic; and a greater awareness of the Air Quality Index. This policy reviews both short- and long-term health consequences of ambient air pollution, especially in relation to developmental exposures. It examines individual, community, and legislative strategies to mitigate air pollution.
Mosquito-borne viruses are a growing global threat. Initial viral inoculation occurs in the skin via the mosquito 'bite', eliciting immune responses that shape the establishment of infection and pathogenesis. Here we assess the cutaneous innate and adaptive immune responses to controlled Aedes aegypti feedings in humans living in Aedes-endemic areas. In this single-arm, cross-sectional interventional study (trial registration #NCT04350905), we enroll 30 healthy adult participants aged 18 to 45 years from Cambodia between October 2020 and January 2021. We perform 3-mm skin biopsies at baseline as well as 30 min, 4 h, and 48 h after a controlled feeding by uninfected Aedes aegypti mosquitos. The primary endpoints are measurement of changes in early and late innate responses in bitten vs unbitten skin by gene expression profiling, immunophenotyping, and cytokine profiling. The results reveal induction of neutrophil degranulation and recruitment of skin-resident dendritic cells and M2 macrophages. As the immune reaction progresses, T cell priming and regulatory pathways are upregulated along with a shift to TH2-driven responses and CD8+ T cell activation. Stimulation of participants' bitten skin cells with Aedes aegypti salivary gland extract results in reduced pro-inflammatory cytokine production. These results identify key immune genes, cell types, and pathways in the human response to mosquito bites and can be leveraged to inform and develop novel therapeutics and vector-targeted vaccine candidates to interfere with vector-mediated disease.
Blood lead concentrations have decreased dramatically in US children over the past 4 decades, but too many children still live in housing with deteriorated lead-based paint and are at risk for lead exposure with resulting lead-associated cognitive impairment and behavioral problems. Evidence continues to accrue that commonly encountered blood lead concentrations, even those below 5 µg/dL (50 ppb), impair cognition; there is no identified threshold or safe level of lead in blood. From 2007 to 2010, approximately 2.6% of preschool children in the United States had a blood lead concentration ≥5 µg/dL (≥50 ppb), which represents about 535 000 US children 1 to 5 years of age. Evidence-based guidance is available for managing increased lead exposure in children, and reducing sources of lead in the environment, including lead in housing, soil, water, and consumer products, has been shown to be cost-beneficial. Primary prevention should be the focus of policy on childhood lead toxicity.