Low-rank modeling has many important applications in computer vision and machine learning. While the matrix rank is often approximated by the convex nuclear norm, the use of nonconvex low-rank regularizers has demonstrated better empirical performance. However, the resulting optimization problem is much more challenging. Recent state-of-the-art methods require an expensive full SVD in each iteration. In this paper, we show that for many commonly used nonconvex low-rank regularizers, the singular values obtained from the proximal operator can be automatically thresholded. This allows the proximal operator to be efficiently approximated by the power method. We then develop a fast proximal algorithm and its accelerated variant with an inexact proximal step. We guarantee that the squared distance between consecutive iterates converges at a rate of O(1/T), where T is the number of iterations. Furthermore, we show that the proposed algorithm can be parallelized, and the resulting algorithm achieves nearly linear speedup w.r.t. the number of threads. Extensive experiments are performed on matrix completion and robust principal component analysis, where significant speedup over the state of the art is observed.
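As a generic illustration (not the paper's exact algorithm), the idea of approximating a singular-value-thresholding proximal step with the power method can be sketched in Python. The function names `power_method_top_k` and `soft_threshold_prox` are hypothetical, and the thresholding rule shown is the nuclear-norm prox, one of the simplest low-rank regularizers:

```python
import numpy as np

def power_method_top_k(A, k, n_iter=50, seed=0):
    """Approximate the top-k singular triplets of A via block power iteration."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Q = rng.standard_normal((n, k))
    for _ in range(n_iter):
        # Alternate multiplication by A and A^T, re-orthogonalizing each pass.
        Q, _ = np.linalg.qr(A @ Q)      # m x k
        Q, _ = np.linalg.qr(A.T @ Q)    # n x k
    B = A @ Q                           # project A onto the estimated row space
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U, s, Vt @ Q.T               # A ~ U diag(s) (Vt Q^T)

def soft_threshold_prox(A, lam, k):
    """Nuclear-norm proximal step computed from only the top-k singular values."""
    U, s, Vt = power_method_top_k(A, k)
    s = np.maximum(s - lam, 0.0)        # singular values below lam vanish
    return U @ np.diag(s) @ Vt
```

Because the small singular values are thresholded to zero anyway, only the top-k triplets are needed, which is what makes the cheap power method a viable substitute for a full SVD.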
Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Of the two main branches of pre-trained language models in the general language domain, i.e., BERT (and its variants) and GPT (and its variants), the first has been extensively studied in the biomedical domain, e.g., BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, their lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. In particular, we achieve F1 scores of 44.98%, 38.42% and 40.76% on the BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, setting a new record. Our case study on text generation further demonstrates the advantage of BioGPT in generating fluent descriptions of biomedical terms from biomedical literature.
In this paper, we propose and demonstrate a gas pressure fiber probe whose high sensitivity is magnified by the Vernier effect. The probe is composed of two cascaded Fabry-Perot interferometers based on an SMF-SOHST-OFC structure (SMF: single-mode fiber; SOHST: side-opened hollow silica tube; OFC: optical fiber column). The high-frequency CO2 laser drilling method for the hollow silica tube can effectively maintain the transient balance of the air pressure inside and outside the cavity without destroying the reflective ends of the optical fibers. Experimental results show that the prepared fiber probe, with an SOHST length of 375.2 μm and a column length of 247.3 μm, has a high gas pressure sensitivity of 80.3 pm/kPa when demodulating the Vernier envelope, and a relatively low temperature cross-sensitivity of -1.33 kPa/°C. The sensor is highly sensitive and compact, and can be applied not only in gas pressure sensing but also, potentially, in microfluidic detection.
Immunotherapies such as the adoptive transfer of gene-engineered T cells and immune checkpoint inhibitors are novel therapeutic modalities for advanced cancers. However, some patients are refractory or resistant to these therapies, and the mechanisms underlying tumor immune resistance have not been fully elucidated. Immunosuppressive cells such as myeloid-derived suppressor cells, tumor-associated macrophages, tumor-associated neutrophils, regulatory T cells (Tregs), and tumor-associated dendritic cells are critical factors correlated with immune resistance. In addition, cytokines and factors secreted by tumor cells or by these immunosuppressive cells mediate tumor progression and the immune escape of cancers. Thus, targeting these immunosuppressive cells and the related signals is a promising strategy to improve the efficacy of immunotherapies and reverse immune resistance. However, despite some success in preclinical studies or in specific types of cancer, much about these immunosuppressive cells remains unknown, and the related therapies have yielded unsatisfactory outcomes in clinical patients. In this review, we comprehensively summarize the phenotypes, functions, and potential therapeutic targets of these immunosuppressive cells in the tumor microenvironment.
Residue co-evolution has become the primary principle for estimating the inter-residue distances of a protein, which are crucially important for predicting protein structure. Most existing approaches adopt an indirect strategy, i.e., inferring residue co-evolution from hand-crafted features, such as a covariance matrix, calculated from the multiple sequence alignment (MSA) of the target protein. This indirect strategy, however, cannot fully exploit the information carried by the MSA. Here, we report an end-to-end deep neural network, CopulaNet, that estimates residue co-evolution directly from the MSA. The key elements of CopulaNet are (i) an encoder to model the context-specific mutation of each residue and (ii) an aggregator to model residue co-evolution and thereafter estimate inter-residue distances. Using CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrate that CopulaNet can predict protein structure with improved accuracy and efficiency. This study represents a step toward improved end-to-end prediction of inter-residue distances and protein tertiary structures.
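For context, the hand-crafted covariance-matrix feature used by the indirect strategy can be sketched as follows. This is a minimal illustration with an assumed integer encoding of amino acids (the function name `msa_covariance` is hypothetical), not CopulaNet itself:

```python
import numpy as np

def msa_covariance(msa, n_states=21):
    """One-hot encode an MSA (rows = sequences, columns = residue positions,
    entries = integer amino-acid codes) and return the covariance features
    that indirect co-evolution methods hand to a downstream predictor."""
    n_seq, n_pos = msa.shape
    # One-hot: (n_seq, n_pos, n_states) flattened to (n_seq, n_pos * n_states).
    onehot = np.eye(n_states)[msa].reshape(n_seq, n_pos * n_states)
    # Covariance over sequences; block (i, j) captures co-variation of
    # amino-acid usage at positions i and j.
    return np.cov(onehot, rowvar=False)
```

An end-to-end model like the one described instead consumes the MSA directly, avoiding the information loss incurred by summarizing it into such a fixed statistic.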
Recently, to improve reactive fault tolerance techniques in large-scale storage systems, researchers have proposed various statistical and machine learning methods based on SMART attributes. Most of these studies have focused on predicting hard drive failures, i.e., labeling the status of a hard drive as "good" or not. However, in real-world storage systems, hard drives often deteriorate gradually rather than suddenly, and correspondingly their SMART attributes change continuously toward failure. Inspired by this observation, we introduce a novel method based on Recurrent Neural Networks (RNNs) that assesses the health status of hard drives from their gradually changing sequential SMART attributes. Compared to simple failure prediction, health status assessment is more valuable in practice because it enables technicians to schedule the recovery of different hard drives according to the level of urgency. Experiments on real-world datasets of disks of different brands and scales demonstrate that our proposed method not only achieves reasonably accurate health status assessment but also outperforms previous work on failure prediction.
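A minimal sketch of the kind of recurrent model described, assuming a vanilla RNN with externally supplied (e.g., trained) weights and a softmax over discrete health levels; the function name and shapes are illustrative only, not the authors' architecture:

```python
import numpy as np

def rnn_health_scores(smart_seq, Wx, Wh, Wo, bh, bo):
    """Run a vanilla RNN over a sequence of SMART attribute vectors and
    return a probability distribution over discrete health levels,
    computed from the hidden state at the final time step."""
    h = np.zeros(Wh.shape[0])
    for x_t in smart_seq:                  # one SMART snapshot per time step
        h = np.tanh(Wx @ x_t + Wh @ h + bh)
    logits = Wo @ h + bo
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()
```

The point of the sequential formulation is that the hidden state accumulates the drive's degradation trend over time, rather than classifying each SMART snapshot in isolation.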
Deep learning methods have been increasingly adopted to study jets in particle physics. Since symmetry-preserving behavior has been shown to be an important factor in improving the performance of deep learning in many applications, Lorentz group equivariance (a fundamental spacetime symmetry for elementary particles) has recently been incorporated into a deep learning model for jet tagging. However, that design is computationally costly due to the analytic construction of high-order tensors. In this article, we introduce LorentzNet, a new symmetry-preserving deep learning model for jet tagging. The message passing of LorentzNet relies on an efficient Minkowski dot product attention. Experiments on two representative jet tagging benchmarks show that LorentzNet achieves the best tagging performance and improves significantly over existing state-of-the-art algorithms. The preservation of Lorentz symmetry also greatly improves the efficiency and generalization power of the model, allowing LorentzNet to reach highly competitive performance when trained on only a few thousand jets.
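The Minkowski dot product underlying such attention, and its invariance under a Lorentz boost, can be sketched as follows. This is a generic illustration of the invariant quantity, not LorentzNet's actual message-passing layer:

```python
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric, signature (+,-,-,-)

def minkowski_dot(p, q):
    """<p, q> = E_p E_q - px qx - py qy - pz qz for 4-vectors (E, px, py, pz)."""
    return p @ ETA @ q

def boost_x(beta):
    """Lorentz boost along the x-axis with velocity beta (units with c = 1)."""
    g = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = -g * beta
    return L
```

Because the dot product is unchanged under any Lorentz transformation, attention weights built from it are automatically frame-independent, which is the symmetry-preservation property the abstract credits for the model's sample efficiency.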
Non-autoregressive (NAR) generation, first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both the machine learning and natural language processing communities. While NAR generation can significantly accelerate inference for machine translation, the speedup comes at the cost of reduced translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been proposed to bridge the accuracy gap between NAR and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize NAT efforts into several groups, including data manipulation, modeling methods, training criteria, decoding algorithms, and benefits from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as grammatical error correction, text summarization, text style transfer, dialogue, semantic parsing, and automatic speech recognition. In addition, we discuss potential directions for future exploration, including removing the dependency on knowledge distillation (KD), designing reasonable training objectives, pre-training for NAR, and wider applications. We hope this survey helps researchers capture the latest progress in NAR generation, inspires the design of advanced NAR models and algorithms, and enables industry practitioners to choose appropriate solutions for their applications.
Drug-drug interaction (DDI) prediction identifies interactions within drug combinations, where the adverse side effects caused by physicochemical incompatibility have attracted much attention. Previous studies usually model drug information from single or dual views of whole drug molecules but ignore the detailed interactions among atoms, which leads to incomplete and noisy information and limits the accuracy of DDI prediction. In this work, we propose a novel dual-view drug representation learning network for DDI prediction ('DSN-DDI'), which employs local and global representation learning modules iteratively and learns drug substructures from the single drug ('intra-view') and the drug pair ('inter-view') simultaneously. Comprehensive evaluations demonstrate that DSN-DDI significantly improves DDI prediction for existing drugs, achieving a relative accuracy improvement of 13.01% and an over 99% accuracy under the transductive setting. More importantly, DSN-DDI achieves a relative accuracy improvement of 7.07% on unseen drugs, demonstrating its usefulness for real-world DDI applications. Finally, DSN-DDI exhibits good transferability on synergistic drug combination prediction and thus can serve as a generalized framework in the drug discovery field.
Stochastic variance reduced methods have recently gained much interest for empirical risk minimization due to their appealing run-time complexity. When the data are large and disjointly stored on different machines, it becomes imperative to distribute the implementation of such variance reduced methods. In this paper, we consider a general framework that directly distributes popular stochastic variance reduced methods in the master/slave model by assigning outer loops to the parameter server and inner loops to worker machines. This framework is natural and easy to implement, but its theoretical convergence is not well understood. We obtain a comprehensive understanding of algorithmic convergence with respect to data homogeneity by measuring the smoothness of the discrepancy between the local and global loss functions. We establish the linear convergence of distributed versions of a family of stochastic variance reduced algorithms, including those using accelerated and recursive gradient updates, for minimizing strongly convex losses. Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary. Furthermore, we show that when the data are less balanced, regularization can be used to ensure convergence at a slower rate. We also demonstrate that our analysis extends to nonconvex loss functions.
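A single-machine sketch of one such variance reduced method (SVRG, for a least-squares loss) illustrates the outer/inner loop split that the framework distributes. The code below is illustrative, with assumed step size and loop counts; in the distributed variant described above, the full-gradient outer loop would sit on the parameter server and the inner loop on workers:

```python
import numpy as np

def svrg(X, y, lr=0.01, outer=50, inner=None, seed=0):
    """SVRG for least squares: minimize (1/2n) * ||Xw - y||^2.
    Outer loop: compute a full gradient at a snapshot of w.
    Inner loop: cheap stochastic steps corrected by the snapshot gradient."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    inner = 2 * n if inner is None else inner
    w = np.zeros(d)
    for _ in range(outer):
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n     # anchor gradient at snapshot
        for _ in range(inner):
            i = rng.integers(n)
            gi = X[i] * (X[i] @ w - y[i])          # stochastic gradient at w
            gi_snap = X[i] * (X[i] @ w_snap - y[i])
            w -= lr * (gi - gi_snap + full_grad)   # variance-reduced step
    return w
```

The correction term `gi - gi_snap + full_grad` is an unbiased gradient estimate whose variance shrinks as `w` approaches the snapshot, which is what yields linear convergence on strongly convex losses.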