In many applications, covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. One important example arises in genetic association studies, where genes may have several variants capable of contributing to disease. An ideal penalized regression approach would select variables by balancing the direct evidence of a feature's importance against the indirect evidence offered by the grouping structure. This work proposes a new approach, which we call the group exponential lasso (GEL), featuring a decay parameter that controls the degree to which feature selection is coupled together within groups. We demonstrate that the GEL has a number of statistical and computational advantages over previously proposed group penalties such as the group lasso, group bridge, and composite MCP. Finally, we apply these methods to the problem of detecting rare variants in a genetic association study.
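The role of the decay parameter can be illustrated with a minimal sketch. The abstract does not state the penalty, so the functional form below is an assumption: an exponential penalty on each group's L1 norm, with hypothetical names `lam` (regularization level) and `tau` (decay parameter). Under this assumed form, the rate of penalization applied to every member of a group decays exponentially as the group's coefficients grow, which is one way selection can be coupled within groups.

```python
import math

def gel_penalty(beta_group, lam, tau):
    """Assumed GEL penalty for a single group (not taken from the abstract):
    (lam^2 / tau) * (1 - exp(-tau * ||beta||_1 / lam))."""
    l1 = sum(abs(b) for b in beta_group)
    # -expm1(-x) equals 1 - exp(-x), but stays accurate for very small x.
    return (lam ** 2 / tau) * -math.expm1(-tau * l1 / lam)

def gel_member_rate(beta_group, lam, tau):
    """Derivative of the assumed penalty with respect to the group L1 norm.
    This is the penalization rate felt by each member of the group."""
    l1 = sum(abs(b) for b in beta_group)
    return lam * math.exp(-tau * l1 / lam)

if __name__ == "__main__":
    lam, tau = 0.5, 1.0 / 3.0  # hypothetical values for illustration
    for group in ([0.0, 0.0], [0.5, 0.0], [2.0, -1.0]):
        print(group, gel_penalty(group, lam, tau), gel_member_rate(group, lam, tau))
```

In this sketch, a group with all-zero coefficients is penalized at the full rate `lam`, as in the lasso, while members of a group that already contains large coefficients are penalized less. As `tau` approaches zero the assumed penalty reduces to the ordinary lasso penalty `lam * ||beta||_1`, so selection becomes uncoupled.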
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study.
SUMMARY
Penalized regression methods are an attractive tool for high-dimensional data analysis, but their widespread adoption has been hampered by the difficulty of applying inferential tools. In particular, the question "How reliable is the selection of those features?" has proved difficult to address. In part, this difficulty arises from defining false discoveries in the classical, fully conditional sense, which is possible in low dimensions but does not scale well to high-dimensional settings. Here, we consider the analysis of marginal false discovery rates (mFDRs) for penalized regression methods. Restricting attention to the mFDR permits straightforward estimation of the number of selections that would likely have occurred by chance alone, and therefore provides a useful summary of selection reliability. Theoretical analysis and simulation studies demonstrate that this approach is quite accurate when the correlation among predictors is mild, and only slightly conservative when the correlation is stronger. Finally, the practical utility of the proposed method and its considerable advantages over other approaches are illustrated using gene expression data from The Cancer Genome Atlas and genome-wide association study data from the Myocardial Applied Genomics Network.
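The idea of estimating how many selections would have occurred by chance alone can be sketched as follows. The summary gives no formula, so this sketch rests on stated assumptions: a lasso-penalized linear regression with `n` observations and `p` standardized features, a residual standard deviation `sigma`, and a feature entering the model when its scaled correlation with the residuals exceeds the penalty level `lam`. Under those assumptions, a feature with no marginal association is selected with probability about 2Φ(−√n·λ/σ). All function and parameter names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def expected_false_discoveries(n, p, lam, sigma):
    """Expected number of chance selections, assuming every one of the p
    features is marginally unrelated to the outcome (a conservative bound)."""
    return 2.0 * p * normal_cdf(-math.sqrt(n) * lam / sigma)

def estimated_mfdr(n, p, lam, sigma, n_selected):
    """Ratio of expected chance selections to actual selections, capped at 1."""
    if n_selected == 0:
        return 0.0
    return min(1.0, expected_false_discoveries(n, p, lam, sigma) / n_selected)

if __name__ == "__main__":
    # Hypothetical inputs: 100 observations, 1000 features, 20 selected.
    print(estimated_mfdr(n=100, p=1000, lam=0.3, sigma=1.0, n_selected=20))
```

Using all `p` features in the numerator, rather than only the truly null ones, makes the estimate err on the conservative side.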
A growing body of evidence indicates that in utero and early life growth has profound, long-term consequences for an individual's health throughout the life course; however, data are limited for preterm infants, a vulnerable population at risk for growth abnormalities.
To address the gap in knowledge concerning early growth and its determinants in preterm infants.
A retrospective cohort study was performed using a population of preterm (< 37 weeks gestation) infants obtained from an electronic medical record database. Weight z-scores were acquired from discharge until roughly two years corrected age. Linear mixed effects modeling, with random slopes and intercepts, was employed to estimate growth trajectories.
Thirteen variables, including maternal race, hypertension during pregnancy, preeclampsia, first trimester body mass index, multiple status, gestational age, birth weight, birth length, head circumference, year of birth, length of birth hospitalization stay, total parenteral nutrition, and dextrose treatment, were significantly associated with growth rates of preterm infants in univariate analyses. A small percentage (1.32% - 2.07%) of the variation in the growth of preterm infants can be explained in a joint model of these perinatal factors. In extremely preterm infants, additional variation in growth trajectories can be explained by conditions whose risk differs by degree of prematurity. Specifically, infants with periventricular leukomalacia or retinopathy of prematurity experienced decelerated rates of growth compared to infants without such conditions.
Factors found to influence growth over time in children born at term also affect growth of preterm infants. The strength of association and the magnitude of the effect varied by gestational age, revealing that significant heterogeneity in growth and its determinants exists within the preterm population.
Maternal lipid profiles during pregnancy are associated with risk for preterm birth. This study investigates the association between maternal dyslipidemia and subsequent preterm birth among pregnant women in the state of California. Births were identified from California birth certificate and hospital discharge records from 2007-2012 (N = 2,865,987). Preterm birth was defined as <37 weeks completed gestation and dyslipidemia was defined by diagnostic codes. Subtypes of preterm birth were classified as preterm premature rupture of membranes (PPROM), spontaneous labor, and medically indicated, according to birth certificate data and diagnostic codes. The association between dyslipidemia and preterm birth was tested with logistic regression. Models were adjusted for maternal age at delivery, race/ethnicity, hypertension, pre-pregnancy body mass index, insurance type, and education. Maternal dyslipidemia was significantly associated with increased odds of preterm birth (adjusted OR: 1.49, 95% CI: 1.39, 1.59). This finding was consistent across all subtypes of preterm birth, including PPROM (adjusted OR: 1.54, 95% CI: 1.34, 1.76), spontaneous (adjusted OR: 1.51, 95% CI: 1.39, 1.65), and medically indicated (adjusted OR: 1.454, 95% CI: 1.282, 1.649). This study suggests that maternal dyslipidemia is associated with increased risk for all types of preterm birth.
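For readers less familiar with how adjusted odds ratios are obtained from a logistic regression, the following is a minimal sketch of the standard conversion: the odds ratio is the exponential of the fitted coefficient, and a Wald-type 95% confidence interval is the exponential of the coefficient plus or minus 1.96 standard errors. The coefficient and standard error used here are hypothetical illustration values, not quantities from the study, and the study may have computed its intervals differently.

```python
import math

def odds_ratio_with_ci(coef, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval (z=1.96 gives ~95%)."""
    return (
        math.exp(coef),           # point estimate
        math.exp(coef - z * se),  # lower limit
        math.exp(coef + z * se),  # upper limit
    )

if __name__ == "__main__":
    # Hypothetical coefficient and standard error, for illustration only.
    odds_ratio, lower, upper = odds_ratio_with_ci(coef=0.40, se=0.05)
    print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the odds ratio itself, which is why reported upper limits sit farther from the point estimate than lower limits.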
Implementation of dietary and lifestyle interventions prior to and early in pregnancy in high-risk women has been shown to reduce the risk of gestational diabetes mellitus (GDM) development later in pregnancy. Although numerous risk factors for GDM have been identified, the ability to accurately identify women before or early in pregnancy who could benefit most from these interventions remains limited. As nulliparous women are an under-screened population with risk profiles that differ from their multiparous counterparts, development of a prediction model tailored to nulliparous women may facilitate timely preventive intervention and improve maternal and infant outcomes. We aimed to develop and validate a model for preconception and early pregnancy prediction of GDM based on clinical risk factors for nulliparous women. A risk prediction model was built within a large California birth cohort including singleton live birth records from 2007-2012. Model accuracy was assessed, using discrimination and calibration, both internally and externally, the latter within a cohort of women who delivered at University of Iowa Hospitals and Clinics between 2009 and 2017. Differences in predictive accuracy of the model were assessed within specific racial/ethnic groups. The prediction model included five risk factors: race/ethnicity, age at delivery, pre-pregnancy body mass index, family history of diabetes, and pre-existing hypertension. The area under the curve (AUC) was 0.732 (95% confidence interval (CI) 0.728, 0.735) for the California internal validation cohort and 0.710 (95% CI 0.672, 0.749) for the Iowa external validation cohort. The model performed particularly well in Hispanic (AUC 0.739) and Black women (AUC 0.719).
Our findings suggest that estimating a woman's risk for GDM through model-based incorporation of risk factors accurately identifies those at high risk (i.e., predicted risk >6%) who could benefit from preventive intervention, encouraging prompt incorporation of this tool into preconception and prenatal care.
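The discrimination measure reported above, the AUC, can be read as the probability that a randomly chosen woman who developed GDM received a higher predicted risk than a randomly chosen woman who did not. The following minimal sketch computes it by direct pairwise comparison. The predicted risks and outcomes are made-up illustration values, and the function name is hypothetical; it is not the software used in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.
    scores: predicted risks; labels: 1 for cases, 0 for non-cases.
    Ties between a case and a non-case count as half a concordant pair."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                concordant += 1.0
            elif sc == sn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

if __name__ == "__main__":
    # Made-up predicted risks and outcomes, for illustration only.
    risks = [0.02, 0.08, 0.05, 0.12]
    outcomes = [0, 0, 1, 1]
    print(auc(risks, outcomes))
```

The pairwise loop is quadratic in cohort size, so for cohorts as large as those described above a rank-based computation would be the practical choice; the quantity computed is the same.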
Cytoreductive surgery for neuroendocrine tumor liver metastases improves survival and symptomatic control. However, the feasibility of adequate cytoreduction in patients with many liver metastases remains uncertain. We compared patient outcomes based on the number of lesions treated to better define the efficacy of cytoreductive surgery for numerous neuroendocrine tumor liver metastases.
Patients undergoing hepatic cytoreductive surgery for gastroenteropancreatic neuroendocrine tumors were identified in our institutional surgical neuroendocrine tumor database. Imaging studies were reviewed to determine the liver tumor burden and percent cytoreduced. Overall survival and progression-free survival were compared, using the number of lesions treated, percent tumor debulked, and additional clinicopathologic characteristics.
A total of 188 hepatic cytoreductive procedures were identified and stratified into groups according to the number of metastases treated: 1–5, 6–10, and >10. Median overall survival and progression-free survival were 89.4 and 22.5 months, respectively, and did not differ significantly between groups. Greater than 70% cytoreduction was associated with significantly better overall survival than <70% cytoreduction (134 months versus 38 months).
In patients with gastroenteropancreatic neuroendocrine tumors and liver metastases, >70% cytoreduction led to improved overall survival and progression-free survival and was achieved reliably in patients undergoing debulking of >10 lesions. These data support an aggressive approach to patients with numerous neuroendocrine tumor liver metastases to achieve >70% cytoreduction.
Discovering important genes that account for the phenotype of interest has long been a challenge in genome-wide expression analysis. Analyses such as gene set enrichment analysis (GSEA) that incorporate pathway information have become widespread in hypothesis testing, but pathway-based approaches have been largely absent from regression methods due to the challenges of dealing with overlapping pathways and the resulting lack of available software. The R package grpreg is widely used to fit group lasso and other group-penalized regression models; in this study, we develop an extension, grpregOverlap, to allow for overlapping group structure using a latent variable approach. We compare this approach to the ordinary lasso and to GSEA using both simulated and real data. We find that incorporation of prior pathway information can substantially improve the accuracy of gene expression classifiers, and we shed light on several ways in which hypothesis-testing approaches such as GSEA differ from regression approaches with respect to the analysis of pathway data.
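One common way to realize a latent variable approach to overlapping groups is to duplicate each gene's column once per pathway it belongs to, fit a model with non-overlapping groups on the expanded matrix, and then sum the latent coefficients back to the gene level. The sketch below illustrates that idea only. It is written in Python rather than R, it is not the grpregOverlap interface, and all function names are hypothetical; whether the package follows exactly these steps is an assumption.

```python
def expand_overlapping_groups(X, groups):
    """X: list of rows (one per sample); groups: dict mapping a pathway name
    to the column indices of its genes (indices may appear in several pathways).
    Returns the expanded matrix, the group label of each expanded column,
    and the original column index each expanded column was copied from."""
    labels, source = [], []
    for name, cols in groups.items():
        for c in cols:
            labels.append(name)
            source.append(c)
    X_expanded = [[row[c] for c in source] for row in X]
    return X_expanded, labels, source

def collapse_coefficients(latent_coefs, source, n_features):
    """Sum latent coefficients over duplicated columns to recover one
    coefficient per original gene."""
    beta = [0.0] * n_features
    for coef, c in zip(latent_coefs, source):
        beta[c] += coef
    return beta

if __name__ == "__main__":
    X = [[1, 2, 3], [4, 5, 6]]
    pathways = {"A": [0, 1], "B": [1, 2]}  # gene 1 belongs to both pathways
    X_exp, labels, source = expand_overlapping_groups(X, pathways)
    print(X_exp, labels, source)
    # Hypothetical latent coefficients, as if returned by a group-penalized fit.
    print(collapse_coefficients([0.5, 1.0, -0.25, 0.0], source, n_features=3))
```

After expansion the groups no longer overlap, so any solver for ordinary group penalties can be applied to the expanded matrix without modification.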