Widespread sharing of data from electronic health records and patient-reported outcomes can strengthen the national capacity for conducting cost-effective clinical trials and allow research to be embedded within routine care delivery. While pragmatic clinical trials (PCTs) have been performed for decades, they can now draw on rich sources of clinical and operational data that are continuously fed back to inform research and practice. The Health Care Systems Collaboratory program, initiated by the NIH Common Fund in 2012, engages healthcare systems as partners in discussing and promoting activities, tools, and strategies that support active participation in PCTs. The NIH Collaboratory consists of seven demonstration projects and seven problem-specific working group 'Cores', aimed at leveraging data captured in heterogeneous 'real-world' environments for research, thereby improving the efficiency, relevance, and generalizability of trials. Here, we introduce the Collaboratory, focusing on its Phenotype, Data Standards, and Data Quality Core, and present early observations from researchers implementing PCTs within large healthcare systems. We also identify gaps in knowledge and present an informatics research agenda that includes methods for defining phenotypes and applying them appropriately in diverse healthcare settings, and methods for validating both the definition and the execution of electronic health record-based phenotypes.
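As an illustration of what defining, executing, and validating an EHR-based phenotype can involve, the sketch below applies a hypothetical rule (at least two type 2 diabetes diagnosis codes, ICD-10 prefix `E11`, recorded on distinct dates) to toy diagnosis records, then computes positive predictive value against hypothetical chart-review labels. The rule, the record layout, and the function names are assumptions made for illustration only; they are not Collaboratory phenotype definitions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    patient_id: str
    code: str           # ICD-10 code as recorded in the EHR
    service_date: date


def phenotype_positive(diagnoses: list[Diagnosis], prefix: str = "E11",
                       min_dates: int = 2) -> set[str]:
    """Hypothetical rule: codes starting with `prefix` on >= `min_dates` distinct dates."""
    dates_by_patient: dict[str, set[date]] = {}
    for dx in diagnoses:
        if dx.code.startswith(prefix):
            dates_by_patient.setdefault(dx.patient_id, set()).add(dx.service_date)
    return {pid for pid, dates in dates_by_patient.items() if len(dates) >= min_dates}


def positive_predictive_value(flagged: set[str], gold_positive: set[str]) -> float:
    """Share of phenotype-positive patients confirmed by a reference standard."""
    if not flagged:
        return float("nan")
    return len(flagged & gold_positive) / len(flagged)


# Toy records (hypothetical)
records = [
    Diagnosis("p1", "E11.9", date(2020, 1, 5)),
    Diagnosis("p1", "E11.65", date(2020, 3, 2)),
    Diagnosis("p2", "E11.9", date(2020, 2, 1)),
    Diagnosis("p2", "E11.9", date(2020, 2, 1)),   # same date, so p2 has one distinct date
    Diagnosis("p3", "E11.9", date(2019, 6, 1)),
    Diagnosis("p3", "E11.9", date(2019, 9, 1)),
]
flagged = phenotype_positive(records)
print(sorted(flagged))                              # ['p1', 'p3']
print(positive_predictive_value(flagged, {"p1"}))   # 0.5
```

Separating the rule (definition) from the reference-standard comparison (validation) mirrors the distinction drawn above between validating a phenotype's definition and its execution.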
The Kaiser Permanente Research Bank (KPRB) is collecting biospecimens and surveys linked to electronic health records (EHR) from approximately 400,000 adult KP members. Within the KPRB, we developed a Cancer Cohort to address issues related to cancer survival and to understand how genetic, lifestyle, and environmental factors affect cancer treatment, treatment sequelae, and prognosis. We describe the Cancer Cohort design and implementation, report cohort characteristics after 5 years of enrollment, and discuss future directions.
Cancer cases are identified using rapid case ascertainment algorithms, linkage to regional or central tumor registries, and direct outreach to KP members with a history of cancer. Enrollment is primarily through email invitation. Participants complete a consent form and a survey and donate a blood or saliva sample. All cancer types are included.
As of December 31, 2020, the cohort included 65,225 cases (56% female, 44% male) verified in tumor registries. The largest age group was diagnosed at 60-69 years (31%), and most participants are non-Hispanic White (83%); however, 10,076 (16%) were diagnosed at ages 18-49 years, 4208 (7%) are Hispanic, 3393 (5%) are Asian, and 2389 (4%) are Black. The median survival time is 14 years. Biospecimens are available for 98% of the cohort.
The KPRB Cancer Cohort is designed to improve our understanding of treatment efficacy and of factors that contribute to long-term cancer survival. The cohort's diversity with respect to age, race/ethnicity, and geographic location will facilitate research on factors that contribute to disparities in cancer survival.
To compare rule-based data quality (DQ) assessment approaches across multiple national clinical data-sharing organizations.
Six organizations with established data quality assessment (DQA) programs provided documentation or source code describing their current DQ checks. DQ checks were mapped to the categories within the data verification context of the harmonized DQA terminology. To ensure that all DQ checks were mapped consistently, mapping conventions were developed and four iterations of mapping were performed. Difficult-to-map DQ checks were discussed with research team members until consensus was reached.
Participating organizations provided 11,026 DQ checks, of which 99.97 percent were successfully mapped to a DQA category. Of the mapped DQ checks (N=11,023), 214 (1.94 percent) mapped to multiple DQA categories. The majority of DQ checks mapped to Atemporal Plausibility (49.60 percent), Value Conformance (17.84 percent), and Atemporal Completeness (12.98 percent) categories.
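The reported percentages follow directly from the counts: coverage uses all submitted checks as the denominator, while the multi-category share uses the mapped checks. The sketch below reproduces both figures and adds a hypothetical helper, `category_shares`, for tabulating a category distribution from check-to-category mappings. The abstract does not state how checks mapped to several categories enter the category percentages, so the counting rule in the helper is an assumption.

```python
from collections import Counter

# Counts reported in the abstract
TOTAL_CHECKS = 11_026
MAPPED_CHECKS = 11_023
MULTI_CATEGORY_CHECKS = 214

coverage_pct = 100 * MAPPED_CHECKS / TOTAL_CHECKS
# The multi-category share uses mapped checks (N=11,023) as its denominator
multi_pct = 100 * MULTI_CATEGORY_CHECKS / MAPPED_CHECKS


def category_shares(mappings: dict[str, list[str]]) -> dict[str, float]:
    """Percent of mapped checks assigned to each DQA category (hypothetical helper).

    Assumption: a check mapped to several categories is counted once in each,
    and the denominator is the number of mapped checks, so shares can sum to
    more than 100 percent. Unmapped checks (empty lists) are excluded.
    """
    mapped = {cid: cats for cid, cats in mappings.items() if cats}
    if not mapped:
        return {}
    counts = Counter(cat for cats in mapped.values() for cat in set(cats))
    return {cat: 100 * n / len(mapped) for cat, n in counts.items()}


print(f"{coverage_pct:.2f}")  # 99.97
print(f"{multi_pct:.2f}")     # 1.94
```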
Using the common DQA terminology, we achieved near-complete (99.97 percent) coverage across a wide range of DQA programs and specifications. Comparing the distributions of mapped DQ checks revealed important differences between participating organizations. This variation may be related to each organization's stakeholder requirements, primary analytical focus, or DQA program maturity. Although outside the scope of this study, mapping checks within the data validation context of the terminology may provide additional insights into differences in DQA practice.
A common DQA terminology helps organizations and researchers understand the coverage of their current DQA efforts and highlights potential areas for additional DQA development. Sharing DQ checks between organizations could help expand the scope of DQA across clinical data networks.
Objective: Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized into a comprehensive unified terminology with definitions and examples and organized into a conceptual framework to support a common approach to determining whether EHR data are fit for specific uses.
Materials and Methods: DQ publications and operational manuals from several mature EHR-based research networks were reviewed, and informatics and analytics experts and managers of established DQ programs were consulted, to identify potential DQ terms and categories. Two face-to-face stakeholder meetings were used to vet an initial set of DQ terms and definitions, which were grouped into an overall conceptual framework. Feedback from data producers and users was used to construct a draft set of harmonized DQ terms and categories. Multiple rounds of iterative refinement resulted in a set of terms and an organizing framework consisting of DQ categories, subcategories, terms, definitions, and examples. The inclusiveness of the harmonized terminology and logical framework was evaluated against ten published DQ terminologies.
Results: Existing DQ terms were harmonized and organized into a framework by defining three DQ categories: (1) Conformance, (2) Completeness, and (3) Plausibility; and two DQ assessment contexts: (1) Verification and (2) Validation. The Conformance and Plausibility categories were further divided into subcategories. Each category and subcategory was defined with respect to whether the data may be verified with organizational data or validated against an accepted gold standard, depending on the proposed context and uses. The coverage of the harmonized DQ terminology was validated by successfully aligning it to multiple published DQ terminologies.
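To make the framework concrete, the sketch below tags a few rule-based checks with a category, an optional subcategory, and an assessment context, then runs them over toy records. The check names, the permitted sex-code value set, the height bounds, and the record layout are all hypothetical. Only the category and context labels, and the two subcategory names "Value Conformance" and "Atemporal Plausibility", come from the terminology as described in this document. All example checks are verification checks, since they compare data against local expectations rather than an external gold standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

Record = dict[str, object]


class Category(Enum):
    CONFORMANCE = "Conformance"
    COMPLETENESS = "Completeness"
    PLAUSIBILITY = "Plausibility"


class Context(Enum):
    VERIFICATION = "Verification"  # checked against organizational data or expectations
    VALIDATION = "Validation"      # checked against an accepted external gold standard


@dataclass(frozen=True)
class DQCheck:
    name: str
    category: Category
    subcategory: Optional[str]        # None for checks without a subcategory
    context: Context
    passes: Callable[[Record], bool]  # True when a record satisfies the rule


def run_checks(checks: list[DQCheck], records: list[Record]) -> dict[str, list[int]]:
    """Return, for each check, the indices of records that fail it."""
    return {c.name: [i for i, r in enumerate(records) if not c.passes(r)] for c in checks}


ALLOWED_SEX_CODES = {"F", "M", "U"}  # hypothetical local value set


def height_plausible(r: Record) -> bool:
    h = r.get("height_cm")
    return isinstance(h, (int, float)) and 30 <= h <= 250  # hypothetical bounds in cm


checks = [
    DQCheck("sex_in_value_set", Category.CONFORMANCE, "Value Conformance",
            Context.VERIFICATION, lambda r: r.get("sex") in ALLOWED_SEX_CODES),
    DQCheck("birth_date_present", Category.COMPLETENESS, None,
            Context.VERIFICATION, lambda r: r.get("birth_date") is not None),
    DQCheck("height_cm_plausible", Category.PLAUSIBILITY, "Atemporal Plausibility",
            Context.VERIFICATION, height_plausible),
]

records: list[Record] = [
    {"sex": "F", "birth_date": "1970-02-01", "height_cm": 165},
    {"sex": "X", "birth_date": None, "height_cm": 1650},
    {"sex": "m", "birth_date": "1955-07-30", "height_cm": 180.5},  # lowercase code fails
]
failures = run_checks(checks, records)
print(failures)
```

Keeping the category and context as data on each check, rather than in separate code paths, is what makes the kind of cross-organization comparison of check distributions described earlier straightforward to tabulate.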
Discussion: Existing DQ concepts, community input, and expert review informed the development of a distinct set of terms, organized into categories and subcategories. The resulting DQ terms successfully encompassed a wide range of disparate DQ terminologies. Operational definitions were developed to provide guidance for implementing DQ assessment procedures. The resulting structure is an inclusive DQ framework for standardizing DQ assessment and reporting. While our analysis focused on the DQ issues often found in EHR data, the new terminology may be applicable to a wide range of electronic health data such as administrative, research, and patient-reported data.
Conclusion: A consistent, common DQ terminology, organized into a logical framework, is an initial step in enabling data owners and users, patients, and policy makers to evaluate and communicate data quality findings in a well-defined manner with a shared vocabulary. Future work will leverage the framework and terminology to develop reusable data quality assessment and reporting methods.
Objectives: The Weight Loss Maintenance Trial (WLM) was a multicenter, randomized trial that compared two weight loss maintenance interventions with a self-directed control group among overweight or obese individuals at high cardiovascular risk: a personal contact (PC) program with primarily telephone-based monthly contacts, and an Internet-based program (interactive technology, IT). This study describes the implementation costs of both interventions as well as IT development costs.
Methods: Resources were micro-costed in 2006 dollars from the primary perspective of a sponsoring healthcare system considering adopting an existing intervention rather than developing its own. Costs were discounted at 3 percent annually. Trial participation lasted 30 months (participants were randomized between February and November 2004). IT development costs were assessed over 36 months. Univariate and multivariate sensitivity analyses, including probabilistic analyses, were performed.
Results: Total discounted IT development costs over 36 months were $839,949 ($2,414 per IT participant). Discounted 30-month implementation costs were $537,242 ($1,571 per participant) for the 342 PC participants and $214,879 ($617 per participant) for the 348 IT participants. Under all plausible scenarios, PC implementation costs exceeded IT implementation costs.
Conclusions: The costs of implementing and operating an Internet-based intervention for weight loss maintenance were substantially lower than the analogous costs of an intervention using standard phone and in-person contacts, and are of a magnitude that would be attractive to many health systems, subject to a demonstration of cost-effectiveness.
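The per-participant figures are the discounted totals divided by the number of participants in each group. The sketch below checks that arithmetic and shows one way to apply 3 percent annual discounting to a monthly cost stream. The abstract does not specify how costs were timed within the year, so the month-level discounting convention and the function names here are assumptions.

```python
def discounted_total(monthly_costs: list[float], annual_rate: float = 0.03) -> float:
    """Present value, at month 0, of a series of monthly costs.

    Assumption: the cost in month t (t = 0, 1, 2, ...) is divided by
    (1 + annual_rate) ** (t / 12), i.e. monthly discounting equivalent to an
    effective annual rate. The trial's actual convention may differ.
    """
    return sum(cost / (1 + annual_rate) ** (t / 12) for t, cost in enumerate(monthly_costs))


def per_participant(total_cost: float, participants: int) -> float:
    return total_cost / participants


# Discounted totals and denominators reported in the abstract (2006 dollars)
reported = {
    "IT development (36 months)": (839_949, 348),
    "PC implementation (30 months)": (537_242, 342),
    "IT implementation (30 months)": (214_879, 348),
}
for label, (total, n) in reported.items():
    print(f"{label}: ${per_participant(total, n):,.0f} per participant")
# IT development (36 months): $2,414 per participant
# PC implementation (30 months): $1,571 per participant
# IT implementation (30 months): $617 per participant
```

Under this convention a cost incurred in month 0 is undiscounted and a cost incurred exactly one year later is divided by 1.03.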