Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study.
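As an illustration of how a group penalty selects or drops variables together, the sketch below implements group soft-thresholding, the standard building block of group LASSO algorithms. The abstract does not describe any particular algorithm, so the function name and the single `threshold` parameter are assumptions made for this example, and the sketch ignores group-size weights.

```python
import math

def group_soft_threshold(beta_group, threshold):
    """Shrink a whole coefficient group toward zero using its Euclidean norm.

    Hypothetical helper for illustration: `beta_group` is a list of
    coefficients belonging to one group, `threshold` is the penalty level.
    """
    norm = math.sqrt(sum(b * b for b in beta_group))
    # If the group's norm does not exceed the threshold, the whole group is zeroed.
    if norm <= threshold:
        return [0.0] * len(beta_group)
    # Otherwise every coefficient in the group is shrunk by the same factor.
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta_group]
```

Calling `group_soft_threshold([0.3, 0.4], 1.0)` zeroes the entire group, whereas a group with a larger norm is kept and shrunk proportionally; this all-in or all-out behaviour is what distinguishes group selection from coefficient-wise selection.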
Clustering is a critical component of single-cell RNA sequencing (scRNA-seq) data analysis and can help reveal cell types and infer cell lineages. Despite considerable successes, there are few methods tailored to investigating cluster-specific genes contributing to cell heterogeneity, which can promote biological understanding of cell heterogeneity. In this study, we propose a zero-inflated negative binomial mixture model (ZINBMM) that simultaneously achieves effective scRNA-seq data clustering and gene selection. ZINBMM conducts a systemic analysis on raw counts, accommodating both batch effects and dropout events. Simulations and the analysis of five scRNA-seq datasets demonstrate the practical applicability of ZINBMM.
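To make the term "zero-inflated negative binomial" concrete, here is a minimal sketch of the probability mass function of a single zero-inflated negative binomial count. It is not the ZINBMM model itself, which additionally involves mixture components, batch effects and gene selection; the mean/size parameterization and the names `mu`, `r` and `pi` are assumptions chosen for illustration.

```python
import math

def nb_log_pmf(y, mu, r):
    """Log-probability of count y under a negative binomial with mean mu > 0 and size r > 0."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))

def zinb_pmf(y, mu, r, pi):
    """Probability of count y when a zero occurs with extra probability pi (dropout)."""
    nb = math.exp(nb_log_pmf(y, mu, r))
    if y == 0:
        # Zeros come either from the dropout component or from the count model.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb
```

The extra mass `pi` at zero is what lets such a model accommodate dropout events in raw counts without forcing the negative binomial mean downward.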
We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348-1360] and Fan and Peng [Ann. Statist. 32 (2004) 928-961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.
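For readers unfamiliar with the term, a bridge estimator is usually defined as the minimizer of a least-squares criterion with an $\ell_\gamma$ penalty. The abstract does not write out the objective, so the notation below ($Y_i$, $x_i$, $\lambda_n$, $\gamma$, $p_n$) is an assumption following the common convention; the second display is the usual marginal version, which penalizes each covariate's univariate fit.

```latex
\hat{\beta}_n
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n} \bigl(Y_i - x_i^{\top}\beta\bigr)^2
    + \lambda_n \sum_{j=1}^{p_n} |\beta_j|^{\gamma},
  \qquad \gamma > 0,

\hat{\beta}_n^{\text{marg}}
  = \arg\min_{\beta}\;
    \sum_{j=1}^{p_n} \sum_{i=1}^{n} \bigl(Y_i - x_{ij}\beta_j\bigr)^2
    + \lambda_n \sum_{j=1}^{p_n} |\beta_j|^{\gamma}.
```

Under this convention $\gamma = 1$ gives the LASSO and $\gamma = 2$ gives ridge regression, while $0 < \gamma < 1$ yields a concave penalty of the kind typically associated with variable selection.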
To develop a reference of population-based gestational age-specific birth weight percentiles for contemporary Chinese.
Birth weight data was collected by the China National Population-based Birth Defects Surveillance System. A total of 1,105,214 live singleton births aged ≥28 weeks of gestation without birth defects during 2006-2010 were included. The lambda-mu-sigma method was utilized to generate percentiles and curves.
Gestational age-specific birth weight percentiles for male and female infants were constructed separately. Significant differences were observed between the current reference and other references developed for Chinese or non-Chinese infants.
There have been moderate increases in birth weight percentiles for Chinese infants of both sexes and at most gestational ages since the 1980s, suggesting the importance of utilizing an updated national reference for both clinical and research purposes.
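The lambda-mu-sigma (LMS) method mentioned above summarizes the distribution at each gestational age by a Box-Cox power (L), a median (M) and a coefficient of variation (S), from which any percentile can be computed. The following is a minimal sketch of that last step only; the function name and the numeric values in the usage note are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist

def lms_percentile(L, M, S, p):
    """Return the p-th quantile (0 < p < 1) implied by LMS parameters L, M, S."""
    # Standard normal deviate corresponding to the requested percentile.
    z = NormalDist().inv_cdf(p)
    if L == 0:
        # Limiting case of the Box-Cox transform: a log-normal distribution.
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)
```

For example, with illustrative values `L=1.0`, `M=3000.0` and `S=0.1`, the 50th percentile equals the median `M`, and lower or higher percentiles fall below or above it accordingly.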
High-throughput technologies have been used to generate a large amount of omics data. In the past, single-level analysis has been extensively conducted where the omics measurements at different levels, including mRNA, microRNA, CNV and DNA methylation, are analyzed separately. As the molecular complexity of disease etiology exists at all different levels, integrative analysis offers an effective way to borrow strength across multi-level omics data and can be more powerful than single-level analysis. In this article, we focus on reviewing existing multi-omics integration studies by paying special attention to variable selection methods. We first summarize published reviews on integrating multi-level omics data. Next, after a brief overview of variable selection methods, we review existing supervised, semi-supervised and unsupervised integrative analyses within parallel and hierarchical integration studies, respectively. The strengths and limitations of the methods are discussed in detail. No existing integration method can dominate the rest. The computational aspects are also investigated. The review concludes with possible limitations and future directions for multi-level omics data integration.
The effects of thyroid-stimulating hormone (TSH) and thyroid hormones on the development of human papillary thyroid cancer (PTC) remain poorly understood.
The study population consisted of 741 (341 women, 400 men) histologically confirmed PTC cases and 741 matched controls with prediagnostic serum samples stored in the Department of Defense Serum Repository. Concentrations of TSH, total T3, total T4, and free T4 were measured in serum samples. Conditional logistic regression models were used to calculate ORs and 95% confidence intervals (CI).
The median time between blood draw and PTC diagnosis was 1,454 days. Compared with the middle tertile of TSH levels within the normal range, serum TSH levels below the normal range were associated with an elevated risk of PTC among women (OR, 3.74; 95% CI, 1.53-9.19) but not men. TSH levels above the normal range were associated with an increased risk of PTC among men (OR, 1.96; 95% CI, 1.04-3.66) but not women. The risk of PTC decreased with increasing TSH levels within the normal range among both men and women (P for trend = 0.0005 and 0.041, respectively).
We found a significantly increased risk of PTC associated with TSH levels below the normal range among women and with TSH levels above the normal range among men. An inverse association between PTC and TSH levels within the normal range was observed among both men and women.
These results could have significant clinical implications for physicians who are managing patients with abnormal thyroid functions and those with thyroidectomy.
In survival analysis, when a subset of subjects has extremely long survival, the two-part cure rate model has been commonly adopted. In the two-part model, the first part is for a binary response and describes the probability of cure. The second part is for a survival response and describes the probability of survival. Despite their intuitive interconnections, most of the existing works estimate the two parts without any constraint. The existing works on proportionality promote similarity in magnitudes (i.e. quantitative similarity) and can be too restrictive. In this study, for the two-part cure rate model, we propose imposing a sign-based penalty to promote similarity in signs (i.e. qualitative similarity). The proposed strategy can be more informative than those that neglect the two-part interconnections and less restrictive than the existing proportionality works. A penalty is also imposed to select relevant variables and accommodate high-dimensional data. Numerical studies, including simulation and two data analyses, demonstrate the advantageous performance of the proposed approach.
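The two-part structure described above is commonly written as a mixture of a cured fraction and a survival distribution for the uncured. The display below is a sketch under that assumption; the symbols $\pi$, $S_u$, $x$, $z$ and $\alpha$, and the logistic link, are illustrative choices not given in the abstract, and the form of the sign-based penalty is not shown because the abstract does not specify it.

```latex
S_{\text{pop}}(t \mid x, z)
  = \pi(z) + \bigl(1 - \pi(z)\bigr)\, S_u(t \mid x),
\qquad
\pi(z) = \frac{\exp(z^{\top}\alpha)}{1 + \exp(z^{\top}\alpha)}.
```

Here $\pi(z)$ is the probability of cure (the binary part) and $S_u(t \mid x)$ is the survival function of uncured subjects (the survival part), so $S_{\text{pop}}(t)$ levels off at $\pi(z)$ rather than at zero as $t$ grows.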
We report on whole-exome sequencing (WES) of 213 melanomas. Our analysis established NF1, encoding a negative regulator of RAS, as the third most frequently mutated gene in melanoma, after BRAF and NRAS. Inactivating NF1 mutations were present in 46% of melanomas expressing wild-type BRAF and RAS, occurred in older patients and showed a distinct pattern of co-mutation with other RASopathy genes, particularly RASA2. Functional studies showed that NF1 suppression led to increased RAS activation in most, but not all, melanoma cases. In addition, loss of NF1 did not predict sensitivity to MEK or ERK inhibitors. The rebound pathway, as seen by the induction of phosphorylated MEK, occurred in cells both sensitive and resistant to the studied drugs. We conclude that NF1 is a key tumor suppressor lost in melanomas, and that concurrent RASopathy gene mutations may enhance its role in melanomagenesis.
Testicular cancer (TC) is the most common malignancy in young adult men, and in many countries the incidence rates of testicular cancer have been increasing since the middle of the twentieth century. Since disease presentation and tumor progression patterns are often heterogeneous across racial groups, there may be important racial differences in recent TC trends.
In this study, Surveillance, Epidemiology, and End Results (SEER) data on TC patients diagnosed between 1973 and 2015 were analyzed, including the following racial/ethnic groups: non-Hispanic whites (NHW), Hispanic whites (HW), blacks, and Asians and Pacific Islanders (API). Patient characteristics, age-adjusted incidence rates, and survival were compared across racial groups. A multivariate Cox model was used to analyze the survival data of TC patients, in order to evaluate racial differences across several relevant factors, including marital status, age group, histologic type, treatment, stage, and tumor location.
NHWs had the highest incidence rates, followed by blacks, HWs, and APIs. There were significant survival differences among the racial groups, with NHWs having the highest survival rates and blacks having the lowest.
An analysis of SEER data showed that racial differences existed among TC patients in the United States with respect to patient characteristics, incidence, and survival. The results can be useful to stakeholders interested in reducing the burden of TC morbidity and mortality.
In an analysis of more than 119,000 patients with acute MI admitted to over 1800 hospitals, patients treated in high-performing hospitals (with low 30-day risk-standardized mortality) had longer life expectancies than those treated in low-performing hospitals.
Public reporting has become a mainstay of national efforts to improve the quality of care delivered in U.S. hospitals.[1] Increasingly, risk-standardized mortality rates are used to benchmark quality and gauge hospital performance because they reflect meaningful and widely interpretable results of hospital care.[2,3] Since 2007, the Centers for Medicare and Medicaid Services (CMS) has reported hospital-specific 30-day risk-standardized mortality rates for several common conditions, and more recently, risk-standardized mortality rates have been incorporated into payment policies.[4–6] Although several studies have evaluated the association of condition-specific risk-standardized mortality rates with other short-term quality metrics,[7–16] it is not known . . .