With the increasing availability of data in the public domain, there has been growing interest in exploiting information from external sources to improve the analysis of smaller-scale studies. An emerging challenge in the era of big data is that the subject‐level data are high dimensional, but the external information is at an aggregate level and of a lower dimension. Moreover, heterogeneity and uncertainty in the auxiliary information are often not accounted for in information synthesis. In this paper, we propose a unified framework to summarize various forms of aggregated information via estimating equations and develop a penalized empirical likelihood approach to incorporate such information in logistic regression. When the homogeneity assumption is violated, we extend the method to account for population heterogeneity among different sources of information. When the uncertainty in the external information is not negligible, we propose a variance estimator that adjusts for this uncertainty. The proposed estimators are asymptotically more efficient than the conventional penalized maximum likelihood estimator and enjoy the oracle property even with a diverging number of predictors. Simulation studies show that the proposed approaches achieve higher variable-selection accuracy than competing methods. We illustrate the proposed methodologies with a pediatric kidney transplant study.
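To make the empirical likelihood idea in the abstract above concrete, here is a minimal sketch of one ingredient only: turning a single, hypothetical form of external aggregate information into an estimating equation and computing the empirical likelihood weights that satisfy it. It assumes the external source reports logistic-regression coefficients `theta_ext` for a reduced covariate set, so that E[X_r (Y - expit(X_r θ_ext))] = 0 under homogeneity. The names `auxiliary_scores` and `el_weights` and the simulated data are illustrative assumptions. The sketch omits the penalty, the joint estimation of the full-model coefficients, and the heterogeneity and uncertainty adjustments, so it should not be read as the authors' estimator.

```python
import numpy as np


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def auxiliary_scores(X_r, y, theta_ext):
    """Hypothetical estimating function built from an external reduced-model fit.

    Assumes the external source reports logistic-regression coefficients theta_ext
    for a reduced covariate set X_r (first column = intercept). Under homogeneity,
    E[X_r * (Y - expit(X_r @ theta_ext))] = 0.
    """
    return X_r * (y - expit(X_r @ theta_ext))[:, None]


def el_weights(g, n_iter=50, tol=1e-10):
    """Empirical likelihood weights subject to sum_i p_i g_i = 0.

    p_i = 1 / (n * (1 + lam @ g_i)), where lam maximizes the concave dual
    sum_i log(1 + lam @ g_i); solved here by damped Newton iterations.
    """
    n, k = g.shape
    lam = np.zeros(k)
    for _ in range(n_iter):
        denom = 1.0 + g @ lam
        grad = (g / denom[:, None]).sum(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        hess = -(g / denom[:, None] ** 2).T @ g
        step = np.linalg.solve(hess, -grad)
        # Halve the step until every 1 + lam @ g_i exceeds 1/n (valid weights).
        t = 1.0
        while np.any(1.0 + g @ (lam + t * step) <= 1.0 / n):
            t *= 0.5
        lam = lam + t * step
    return 1.0 / (n * (1.0 + g @ lam)), lam


# Usage on simulated internal data (all values hypothetical).
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.binomial(1, expit(-0.5 + 1.0 * x1 + 0.8 * x2))
X_r = np.column_stack([np.ones(n), x1])
theta_ext = np.array([-0.5, 1.0])
p, lam = el_weights(auxiliary_scores(X_r, y, theta_ext))
```

At the solution the weights sum to one and the weighted estimating equation is zero; in a full method these weights would enter the (penalized) likelihood for the regression coefficients rather than being computed at a fixed `theta_ext` alone.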
Interest has grown in synthesizing participant-level data from a study with relevant external aggregate information. Several efficient and flexible procedures have been developed under the assumption that the internal study and the external sources concern the same population. Although commonly imposed, this homogeneity condition is hard to check because the external information is typically available only in limited, aggregate form. Bias may be introduced when the assumption is violated. In this article, we propose a penalized likelihood approach that avoids undesirable bias by simultaneously selecting and synthesizing consistent external aggregate information. The proposed approach provides a general framework that incorporates consistent external information from heterogeneous study populations, as long as the conditional distribution of the dependent variable under investigation is the same and differences in the independent-variable distributions are properly accounted for via a semi‐parametric density ratio model. The proposed approach also properly accounts for sampling errors in the external information. A two‐step estimator and an optimization algorithm are proposed for computation. We establish the selection and estimation consistency and the asymptotic normality of the two‐step estimator. The proposed approach is illustrated with an analysis of gestational weight gain management studies.
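The density ratio model mentioned above can be illustrated with a minimal sketch, assuming the external source reports only the mean of the covariates. Under the model f_ext(x) / f_int(x) = exp(α + γᵀx), the external mean must equal the internal mean of x reweighted by exp(α + γᵀx), which identifies γ by exponential tilting of the internal sample. The function name `drm_tilt`, the choice of the covariate mean as the external summary, and the simulated data are assumptions for illustration; this is not the article's two-step estimator.

```python
import numpy as np


def drm_tilt(X, m_ext, n_iter=100, tol=1e-12):
    """Fit a density ratio model f_ext(x) / f_int(x) = exp(alpha + gamma @ x).

    Assumes the external source reports only the covariate mean m_ext. gamma
    minimizes the convex dual log(mean_i exp(gamma @ (x_i - m_ext))), whose
    optimum makes the tilted internal sample reproduce m_ext exactly.
    """
    Z = X - m_ext
    gamma = np.zeros(X.shape[1])

    def tilted_weights(gamma):
        s = Z @ gamma
        w = np.exp(s - s.max())  # shift for numerical stability
        return w / w.sum()

    for _ in range(n_iter):
        w = tilted_weights(gamma)
        grad = w @ Z  # tilted mean of x minus m_ext
        if np.linalg.norm(grad) < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)  # tilted covariance
        gamma = gamma - np.linalg.solve(hess, grad)
    w = tilted_weights(gamma)
    # alpha normalizes the ratio so that mean_i exp(alpha + gamma @ x_i) = 1.
    alpha = -np.log(np.mean(np.exp(X @ gamma)))
    return alpha, gamma, w


rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))       # internal covariates (simulated)
m_ext = np.array([0.3, -0.2])        # hypothetical external covariate means
alpha, gamma, w = drm_tilt(X, m_ext)
```

For standard normal internal covariates, tilting by exp(γᵀx) shifts the mean to γ, so the fitted γ should land close to `m_ext`; richer external summaries would add further moment constraints in the same way.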
Synthesizing external aggregated information has proven useful for improving estimation efficiency when only a limited amount of data is available for statistical analysis. In this paper, we develop a unified framework for combining information from high‐dimensional individual‐level data and potentially low‐dimensional external aggregate data under the Cox model. We summarize various forms of external aggregated information by population estimating equations and propose a penalized empirical likelihood approach to borrow information from these estimating equations. The proposed methods are flexible enough to handle the case where the individual‐level data and the external aggregate data come from heterogeneous populations. Specifically, a penalized empirical likelihood ratio test is developed to check for potential heterogeneity, and a semiparametric density ratio model is postulated to account for it. Moreover, we study the impact of uncertainty in the auxiliary information on the efficiency gain and propose a modified variance estimator that adjusts for this uncertainty. The proposed estimators enjoy the oracle property and are asymptotically more efficient than the penalized partial likelihood estimator that does not exploit the external aggregated information. Simulation studies show improvements in both estimation efficiency and variable selection over competing methods. The proposed approaches are applied to the analysis of a pediatric kidney transplant study for illustration.
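The baseline in the abstract above is a penalized partial likelihood estimator that ignores external information. As a reference point, here is a minimal sketch of the Cox log partial likelihood it maximizes, assuming no tied event times. The lasso penalty in `lasso_cox_objective` is an assumed example, since the abstract does not name the penalty, and neither function incorporates the external estimating equations.

```python
import numpy as np


def cox_partial_loglik(beta, X, time, event):
    """Cox log partial likelihood, assuming no tied event times.

    l(beta) = sum over subjects with an event of
              x_i @ beta - log(sum_{j: t_j >= t_i} exp(x_j @ beta)).
    """
    order = np.argsort(-time)  # decreasing time: risk set of position i = positions 0..i
    eta = X[order] @ beta
    log_risk = np.logaddexp.accumulate(eta)  # running log-sum-exp over each risk set
    return float(np.sum(event[order] * (eta - log_risk)))


def lasso_cox_objective(beta, X, time, event, penalty):
    """Hypothetical L1-penalized objective: -l(beta)/n + penalty * ||beta||_1."""
    n = X.shape[0]
    return -cox_partial_loglik(beta, X, time, event) / n + penalty * np.abs(beta).sum()
```

Sorting by decreasing time lets a single cumulative log-sum-exp produce every risk-set denominator in O(n log n), instead of rescanning the risk set for each event.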
Leaf morphology is one of the most important agronomic traits in rice breeding because of its contribution to crop yield. The drooping leaf (dr) mutant was developed from the Ilpum rice cultivar by ethyl methanesulfonate (EMS) mutagenesis. Compared with the wild type, dr plants exhibited drooping leaves accompanied by a small midrib, short panicles, and reduced plant height. The dr phenotype was caused by a mutation in a single recessive gene on chromosome 2, dr (LOC_Os02g15230), which encodes a GDSL esterase. Comparison of the wild-type and dr sequences revealed that the dr allele carries a single nucleotide substitution that changes a glycine to aspartic acid. RNAi targeting LOC_Os02g15230 reproduced the dr mutant phenotype, confirming LOC_Os02g15230 as the dr gene. Microscopic observations and plant nutrient analysis of SiO2 revealed that silica was less abundant in dr leaves than in wild-type leaves. This study suggests that the dr gene is involved in the regulation of silica deposition and that disruption of silica-related processes leads to the drooping leaf phenotype.
As individuals may respond differently to treatment, estimating subgroup effects is important for understanding the characteristics of individuals who may benefit. Factors that define subgroups may be correlated, which complicates the evaluation of subgroup effects, especially in observational studies that require control of confounding variables. We address this problem when propensity score methods are used for confounding control. A common practice is to evaluate candidate subgroup identifiers one at a time without adjusting for the other candidate identifiers. We show that this practice can be misleading if the treatment effect modification attributed to a candidate identifier is in truth due to other, correlated true effect modifiers. Jointly analyzing multiple identifiers provides estimates of the desired subgroup effects adjusted for the effects of the other identifiers, but it requires the propensity scores to adequately reflect the underlying treatment selection processes and to balance the covariates within each subgroup of interest. Satisfying this requirement is hard in practice because the number of strata can grow quickly while the per-stratum sample size shrinks dramatically. A practically helpful approach is to use the whole cohort for propensity score estimation, modeling interaction terms to reflect potentially different treatment selection processes across strata. We empirically examine the performance of this whole-cohort approach on its own and with variable selection applied to the interaction terms. Our results from both simulations and a real data analysis suggest that the whole-cohort approach should explore the inclusion of high-order interactions in the propensity score model to ensure adequate covariate balance across strata, and that variable selection is of limited utility.
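Checking "adequate covariate balance across strata" is commonly done with a standardized mean difference (SMD) computed separately within each subgroup. The following is a minimal sketch, assuming inverse probability of treatment weights (1/ps for treated, 1/(1 - ps) for controls) and a pooled weighted standard deviation in the denominator. The function names are hypothetical, and the study may use a different weighting scheme or balance metric. A widely used rule of thumb flags |SMD| above 0.1 as imbalance.

```python
import numpy as np


def weighted_smd(x, treat, w):
    """Standardized mean difference of covariate x between arms under weights w."""
    t, c = treat == 1, treat == 0
    m1 = np.average(x[t], weights=w[t])
    m0 = np.average(x[c], weights=w[c])
    v1 = np.average((x[t] - m1) ** 2, weights=w[t])
    v0 = np.average((x[c] - m0) ** 2, weights=w[c])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)


def smd_by_stratum(x, treat, ps, stratum):
    """IPW-weighted SMD of x within each subgroup stratum.

    ps would come from a whole-cohort propensity model, possibly including
    covariate-by-stratum interaction terms.
    """
    w = np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    return {s: weighted_smd(x[stratum == s], treat[stratum == s], w[stratum == s])
            for s in np.unique(stratum)}


# Toy example: stratum 0 is imbalanced, stratum 1 is balanced.
x = np.array([2.0, 4.0, 0.0, 2.0, 1.0, 3.0, 1.0, 3.0])
treat = np.array([1, 1, 0, 0, 1, 1, 0, 0])
ps = np.full(8, 0.5)
stratum = np.array([0, 0, 0, 0, 1, 1, 1, 1])
balance = smd_by_stratum(x, treat, ps, stratum)
```

Computing the diagnostic per stratum, rather than once for the whole cohort, is what exposes a propensity model that balances covariates overall but not within the subgroups being compared.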
We report the identification of novel highly pathogenic avian influenza viruses of subtype H5N6, clade 2.3.4.4, that presumably originated from China. In addition, reassortant strains with Eurasian lineage low pathogenic avian influenza viruses were isolated in wild birds and poultry in South Korea. The emergence of these novel H5N6 viruses and their circulation among bird populations are of great concern because of the potential for virus dissemination through intercontinental wild bird migration.
• Novel highly pathogenic avian influenza viruses of subtype H5N6 were detected in South Korea in 2016.
• The Korean H5N6 viruses likely originated from H5N6 viruses circulating in Guangdong Province, China.
• Their reassortants with Eurasian lineage low pathogenic avian influenza viruses were isolated in wild birds and poultry.
In this multi-center, assessor-blinded pilot study, the diagnostic efficacy of cCeLL-Ex vivo, a second-generation confocal laser endomicroscopy (CLE) system, was compared against the gold standard, frozen section analysis, for intraoperative brain tumor diagnosis. The study was conducted across three tertiary medical institutions in the Republic of Korea. Biopsy samples from patients with newly diagnosed brain tumors were categorized by location and divided for permanent section analysis, frozen section analysis, and cCeLL-Ex vivo imaging. Of the 74 samples from 55 patients, the majority were from the tumor core (74.3%). cCeLL-Ex vivo showed higher diagnostic accuracy (89.2%) than frozen section analysis (86.5%), with both methods showing a sensitivity of 92.2%. cCeLL-Ex vivo also demonstrated higher specificity (70% vs. 50%), positive predictive value (PPV) (95.2% vs. 92.2%), and negative predictive value (NPV) (58.3% vs. 50%). Furthermore, the time from sample preparation to diagnosis was notably shorter with cCeLL-Ex vivo (13 min 17 s) than with frozen section analysis (28 min 28 s) (p < 0.005). These findings underscore the potential of cCeLL-Ex vivo as a supplementary tool for intraoperative brain tumor diagnosis, and future studies are expected to further validate its clinical utility.
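The reported percentages follow from standard 2×2 confusion-table formulas. As a consistency check, the sketch below uses counts inferred from those percentages: 64 positive and 10 negative samples, with 59 true positives for both methods, 7 versus 5 true negatives. These counts are a reconstruction that reproduces every reported figure; they are not stated in the abstract.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, PPV, and NPV from a 2x2 confusion table."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Inferred counts (74 samples: 64 positive, 10 negative); a reconstruction, not reported data.
cell = diagnostic_metrics(tp=59, fn=5, tn=7, fp=3)
frozen = diagnostic_metrics(tp=59, fn=5, tn=5, fp=5)
for name, m in [("cCeLL-Ex vivo", cell), ("frozen section", frozen)]:
    print(name, {k: round(100 * v, 1) for k, v in m.items()})
```

With only 10 negative samples, a single reclassified negative shifts specificity by 10 percentage points. This is worth keeping in mind when reading the 70% vs. 50% comparison in a pilot study of this size.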