In this article, we propose a new projection test for linear hypotheses on regression coefficient matrices in linear models with high-dimensional responses. We systematically study the theoretical ...properties of the proposed test. We first derive the optimal projection matrix for any given projection dimension to achieve the best power and provide an upper bound for the optimal dimension of projection matrix. We further provide insights into how to construct the optimal projection matrix. One- and two-sample mean problems can be formulated as special cases of linear hypotheses studied in this article. We both theoretically and empirically demonstrate that the proposed test can outperform the existing ones for one- and two-sample mean problems. We conduct Monte Carlo simulation to examine the finite sample performance and illustrate the proposed test by a real data example.
This article is concerned with screening features in ultrahigh-dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence ...screening procedure based on the distance correlation (DC-SIS). The DC-SIS can be implemented as easily as the sure independence screening (SIS) procedure based on the Pearson correlation proposed by Fan and Lv. However, the DC-SIS can significantly improve the SIS. Fan and Lv established the sure screening property for the SIS based on linear models, but the sure screening property is valid for the DC-SIS under more general settings, including linear models. Furthermore, the implementation of the DC-SIS does not require model specification (e.g., linear model or generalized linear model) for responses or predictors. This is a very appealing property in ultrahigh-dimensional data analysis. Moreover, the DC-SIS can be used directly to screen grouped predictor variables and multivariate response variables. We establish the sure screening property for the DC-SIS, and conduct simulations to examine its finite sample performance. A numerical comparison indicates that the DC-SIS performs much better than the SIS in various models. We also illustrate the DC-SIS through a real-data example.
Based on a deep generative model of variational graph autoencoder (VGAE), we develop a new method, DeepLinc (deep learning framework for Landscapes of Interacting Cells), for the de novo ...reconstruction of cell interaction networks from single-cell spatial transcriptomic data. DeepLinc demonstrates high efficiency in learning from imperfect and incomplete spatial transcriptome data, filtering false interactions, and imputing missing distal and proximal interactions. The latent representations learned by DeepLinc are also used for inferring the signature genes contributing to the cell interaction landscapes, and for reclustering the cells based on the spatially coded cell heterogeneity in complex tissues at single-cell resolution.
This work is concerned with marginal sure independence feature screening for ultrahigh dimensional discriminant analysis. The response variable is categorical in discriminant analysis. This enables ...us to use the conditional distribution function to construct a new index for feature screening. In this article, we propose a marginal feature screening procedure based on empirical conditional distribution function. We establish the sure screening and ranking consistency properties for the proposed procedure without assuming any moment condition on the predictors. The proposed procedure enjoys several appealing merits. First, it is model-free in that its implementation does not require specification of a regression model. Second, it is robust to heavy-tailed distributions of predictors and the presence of potential outliers. Third, it allows the categorical response having a diverging number of classes in the order of O(n
κ
) with some κ ⩾ 0. We assess the finite sample property of the proposed procedure by Monte Carlo simulation studies and numerical comparison. We further illustrate the proposed methodology by empirical analyses of two real-life datasets. Supplementary materials for this article are available online.
This work is concerned with testing the population mean vector of nonnormal high-dimensional multivariate data. Several tests for high-dimensional mean vector, based on modifying the classical ...Hotelling T ² test, have been proposed in the literature. Despite their usefulness, they tend to have unsatisfactory power performance for heavy-tailed multivariate data, which frequently arise in genomics and quantitative finance. This article proposes a novel high-dimensional nonparametric test for the population mean vector for a general class of multivariate distributions. With the aid of new tools in modern probability theory, we proved that the limiting null distribution of the proposed test is normal under mild conditions when p is substantially larger than n . We further study the local power of the proposed test and compare its relative efficiency with a modified Hotelling T ² test for high-dimensional data. An interesting finding is that the newly proposed test can have even more substantial power gain with large p than the traditional nonparametric multivariate test does with finite fixed p . We study the finite sample performance of the proposed test via Monte Carlo simulations. We further illustrate its application by an empirical analysis of a genomics dataset. Supplementary materials for this article are available online.
The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and ...variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve nice efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and is no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.
Nanofluid minimum quantity lubrication (NMQL) technique has many technological and economic advantages in grinding operation. NMQL can improve grinding performance in terms of cooling and lubrication ...and is ecofriendly because it consumes a small amount of grinding fluid. Ultrasonic machining can improve grinding performance owing to its reciprocating vibration mechanism and furrow widening. Consequently, the simultaneous utilization of these techniques is anticipated to improve the surface quality, especially for hard brittle materials. In this research, multiangle two-dimensional (2D) ultrasonic vibration is utilized in zirconia ceramic grinding. Results reveal that the adhesion and material peeling phenomenon on the workpiece surface is obviously reduced compared with dry grinding without ultrasonic vibration. The synergistic effect of multiangle 2D ultrasonic and NMQL is also studied. With increased angle, the roughness value is found to initially increase (from 45° to 90°) and then decreases (from 90° to 135°). Moreover, the lubricating effect under 90° is the poorest, with the highest
R
a and
RS
m values of 0.703 μm and 0.106 mm, respectively; conversely, the minimum
R
a value (0.585 μm) is obtained under 45°, and the lowest
RS
m value (0.076 mm) is obtained under 135°.
High-dimensional data have frequently been collected in many scientific areas including genomewide association study, biomedical imaging, tomography, tumor classifications, and finance. Analysis of ...highdimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data.Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While the penalized variable selection methods have been successfully applied in many highdimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension of data may grow exponentially with the sample size. This has been called ultrahigh-dimensional data in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening on specific models and motivation for the need of model-free feature screening procedures.