In this paper, we present the Lipschitz regularization theory and algorithms for a novel Loss-Sensitive Generative Adversarial Network (LS-GAN). Specifically, it trains a loss function to distinguish between real and fake samples by designated margins, while alternately learning a generator to produce realistic samples by minimizing their losses. The LS-GAN further regularizes its loss function with a Lipschitz regularity condition on the density of real data, yielding a regularized model that can better generalize to produce new data from a reasonable number of training examples than the classic GAN. We will further present a Generalized LS-GAN (GLS-GAN) and show it contains a large family of regularized GAN models, including both LS-GAN and Wasserstein GAN, as its special cases. We will conduct experiments to show that both LS-GAN and GLS-GAN exhibit ability competitive with other GAN models in generating new images, in terms of the Minimum Reconstruction Error (MRE) assessed on a separate test set. We further extend the LS-GAN to a conditional form for supervised and semi-supervised learning problems, and demonstrate its outstanding performance on image classification tasks.
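To make the margin-based objective above concrete, here is a minimal PyTorch sketch of the two alternating LS-GAN updates. It assumes an L1 distance as the margin Δ(x, G(z)) and treats the loss network and generator as placeholders; it illustrates the idea rather than reproducing the paper's exact implementation.

```python
# A minimal sketch of the LS-GAN objectives, assuming:
#  - loss_fn: the learned loss function L_theta, mapping a batch to (B, 1)
#  - generator: maps noise z to samples
#  - margin Delta(x, G(z)) taken as a per-sample L1 distance (one common choice)
import torch
import torch.nn.functional as F

def lsgan_critic_loss(loss_fn, generator, real, z, lam=1.0):
    """Train L_theta so real samples get losses smaller than fakes by a margin."""
    fake = generator(z).detach()
    margin = (real - fake).abs().flatten(1).mean(dim=1)  # Delta(x, G(z))
    l_real = loss_fn(real).squeeze()
    l_fake = loss_fn(fake).squeeze()
    # hinge term: penalize violations of L(real) + margin <= L(fake)
    return l_real.mean() + lam * F.relu(margin + l_real - l_fake).mean()

def lsgan_generator_loss(loss_fn, generator, z):
    """Train the generator to produce samples with small learned losses."""
    return loss_fn(generator(z)).mean()
```

Replacing the hinge (the ReLU above) with another cost function yields members of the GLS-GAN family; for instance, a leaky-ReLU cost recovers LS-GAN at slope 0 and WGAN at slope 1.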
Representation learning with small labeled data has emerged as a central problem in many applications, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address it, many efforts have been made to train sophisticated models with few labeled data in an unsupervised or semi-supervised fashion. In this paper, we will review the recent progress on these two major categories of methods. A wide spectrum of models will be organized into a big picture, where we will show how they interplay with each other to motivate the exploration of new ideas. We will review the principles of learning transformation-equivariant, disentangled, self-supervised and semi-supervised representations, all of which underpin the foundation of recent progress. Many implementations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs) and other deep networks by exploring the distribution of unlabeled data for more powerful representations. We will discuss emerging topics by revealing the intrinsic connections between unsupervised and semi-supervised learning, and propose future directions to bridge the algorithmic and theoretical gap between transformation equivariance for unsupervised learning and supervised invariance for supervised learning, and to unify unsupervised pretraining and supervised finetuning. We will also provide a broader outlook on future directions to unify transformation and instance equivariances for representation learning, connect unsupervised and semi-supervised augmentations, and explore the role of self-supervised regularization in many learning problems.
Peptide‐based materials are among the most important biomaterials, with diverse structures and functionalities. Over the past few decades, self‐assembly strategies have been introduced to construct peptide‐based nanomaterials, which can form well‐controlled superstructures with high stability and multivalent effects. More recently, peptide‐based functional biomaterials have been widely utilized in clinical applications. However, there is no comprehensive review article that summarizes this growing area, from fundamental research to clinical translation. In this review, the recent progress of peptide‐based materials, from molecular building block peptides and self‐assembly driving forces to biomedical and clinical applications, is systematically summarized. Ex situ and in situ constructed nanomaterials based on functional peptides are presented. The advantages of intelligent in situ construction of peptide‐based nanomaterials in vivo are emphasized, including construction strategies, nanostructure modulation, and biomedical effects. This review highlights the importance of self‐assembled peptide nanostructures for nanomedicine and can facilitate further knowledge and understanding of these nanosystems toward clinical translation.
The recent progress in peptide‐based nanomaterials from building block peptides and self‐assembly driving forces to application‐directed ex situ and in situ construction of nanomaterials is systematically summarized. The advantages of intelligent in situ construction of peptide‐based nanomaterials in vivo are emphasized. The importance of self‐assembled peptide nanostructures for nanomedicine is highlighted.
Representation learning has advanced significantly with contrastive learning methods. Most of those methods benefit from various data augmentations that are carefully designed to maintain image identities, so that images transformed from the same instance can still be retrieved. However, those carefully designed transformations limit our ability to explore the novel patterns exposed by other transformations. Meanwhile, as shown in our experiments, direct contrastive learning on strongly augmented images cannot learn representations effectively. Thus, we propose a general framework called Contrastive Learning with Stronger Augmentations (CLSA) to complement current contrastive learning approaches. Here, the distribution divergence between the weakly and strongly augmented images over the representation bank is adopted to supervise the retrieval of strongly augmented queries from a pool of instances. Experiments on the ImageNet dataset and downstream datasets show that the information from the strongly augmented images can significantly boost performance. For example, CLSA achieves a top-1 accuracy of 76.2% on ImageNet with a standard ResNet-50 architecture and a fine-tuned single-layer classifier, almost the same level as the 76.5% of the supervised result.
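The distributional supervision described above can be rendered as a short PyTorch sketch. It assumes a MoCo-style memory bank of key representations and an illustrative temperature; these names and values are placeholders, not the paper's exact implementation.

```python
# A minimal sketch of the weak-to-strong distributional supervision in CLSA,
# assuming L2-normalized query embeddings and a memory bank of keys.
import torch
import torch.nn.functional as F

def ddm_loss(q_weak, q_strong, bank, tau=0.2):
    """Match the strong query's similarity distribution over the bank to
    the weak query's distribution, which is treated as a fixed target."""
    p_weak = F.softmax(q_weak @ bank.t() / tau, dim=1).detach()   # target
    log_p_strong = F.log_softmax(q_strong @ bank.t() / tau, dim=1)
    return -(p_weak * log_p_strong).sum(dim=1).mean()             # cross-entropy
```

The key design choice is that gradients flow only through the strongly augmented branch, so the hard-to-match strong view is pulled toward the stable distribution produced by the weak view.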
Deep neural networks have been successfully applied to many real-world applications. However, such successes rely heavily on large amounts of labeled data that are expensive to obtain. Recently, many methods for semi-supervised learning have been proposed and have achieved excellent performance. In this study, we propose a new EnAET framework to further improve existing semi-supervised methods with self-supervised information. To the best of our knowledge, all current semi-supervised methods improve performance with ideas based on prediction consistency and confidence. We are the first to explore the role of self-supervised representations in semi-supervised learning under a rich family of transformations. Consequently, our framework can integrate the self-supervised information as a regularization term to further improve all current semi-supervised methods. In the experiments, we use MixMatch, the current state-of-the-art semi-supervised method, as a baseline to test the proposed EnAET framework. Across different datasets, we adopt the same hyper-parameters, which greatly improves the generalization ability of the EnAET framework. Experimental results on different datasets demonstrate that the proposed EnAET framework greatly improves the performance of current semi-supervised algorithms. Moreover, this framework can also improve supervised learning by a large margin, including in the extremely challenging scenario with only 10 images per class. The code and experiment records are available at https://github.com/maple-research-lab/EnAET .
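As a sketch of how such a self-supervised regularizer can be attached to a semi-supervised objective, consider the following PyTorch fragment. It follows the AET idea of regressing the applied transformation from the representations of the original and transformed inputs; the encoder, decoder, transformation applier, and weight λ are illustrative assumptions rather than EnAET's exact design.

```python
# A minimal sketch of adding an AET-style self-supervised regularizer to a
# semi-supervised loss (e.g., MixMatch). All module names are placeholders.
import torch
import torch.nn.functional as F

def enaet_step(encoder, decoder, semi_loss, x, t_params, apply_t, lam=1.0):
    """Total loss = semi-supervised loss + lambda * transformation regression."""
    x_t = apply_t(x, t_params)                 # transformed view of the batch
    feat, feat_t = encoder(x), encoder(x_t)    # representations of both views
    t_pred = decoder(torch.cat([feat, feat_t], dim=1))
    aet_reg = F.mse_loss(t_pred, t_params)     # predict the transformation params
    return semi_loss(x) + lam * aet_reg        # regularized training objective
```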
Human motion prediction aims to generate future motions based on observed human motions. Witnessing the success of Recurrent Neural Networks (RNNs) in modeling sequential data, recent works utilize RNNs to model human-skeleton motions on the observed motion sequence and predict future human motions. However, these methods disregard the spatial coherence among joints and the temporal evolution among skeletons, which reflect crucial characteristics of human motion in spatiotemporal space. To this end, we propose a novel Skeleton-Joint Co-Attention Recurrent Neural Network (SC-RNN) to capture the spatial coherence among joints and the temporal evolution among skeletons simultaneously on a skeleton-joint co-attention feature map in spatiotemporal space. First, a skeleton-joint feature map is constructed as the representation of the observed motion sequence. Second, we design a new Skeleton-Joint Co-Attention (SCA) mechanism to dynamically learn a co-attention feature map from this skeleton-joint feature map, which refines the useful observed motion information for predicting a future motion. Third, a variant of GRU embedded with SCA collaboratively models the human-skeleton motion and human-joint motion in spatiotemporal space by regarding the skeleton-joint co-attention feature map as the motion context. Experimental results on human motion prediction demonstrate that the proposed method outperforms competing methods.
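To illustrate the co-attention idea, here is a deliberately simplified sketch that reweights a skeleton-joint feature map along both its skeleton (time) and joint axes using a recurrent context vector; the exact gating and parameterization of SCA in the paper may differ.

```python
# A simplified sketch of co-attention over a skeleton-joint feature map.
# feat_map: (T skeletons, J joints, D features); h: (D,) recurrent context.
import torch
import torch.nn.functional as F

def co_attention(feat_map, h):
    """Reweight the feature map jointly along skeleton and joint axes."""
    scores = torch.einsum('tjd,d->tj', feat_map, h)  # relevance to context
    a_skel = F.softmax(scores.mean(dim=1), dim=0)    # attention over skeletons
    a_joint = F.softmax(scores.mean(dim=0), dim=0)   # attention over joints
    return feat_map * a_skel[:, None, None] * a_joint[None, :, None]
```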
Recently, face super-resolution methods steered by deep convolutional neural networks (CNNs) have achieved great progress in restoring degraded facial details through joint training with facial priors. However, these methods have some obvious limitations. On the one hand, multi-task joint learning requires additional annotation of the dataset, and the introduced prior network significantly increases the computational cost of the model. On the other hand, the limited receptive field of CNNs reduces the fidelity and naturalness of reconstructed facial images, resulting in suboptimal results. In this work, we propose an efficient CNN-Transformer Cooperation Network (CTCNet) for face super-resolution, which uses a multi-scale connected encoder-decoder architecture as its backbone. Specifically, we first devise a novel Local-Global Feature Cooperation Module (LGCM), composed of a Facial Structure Attention Unit (FSAU) and a Transformer block, to promote the consistency of local facial detail and global facial structure restoration simultaneously. Then, we design an efficient Feature Refinement Module (FRM) to enhance the encoded features. Finally, to further improve the restoration of fine facial details, we present a Multi-scale Feature Fusion Unit (MFFU) to adaptively fuse the features from different stages of the encoder. Extensive evaluations on various datasets show that the proposed CTCNet significantly outperforms other state-of-the-art methods. Source code will be available at https://github.com/IVIPLab/CTCNet.
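The local-global cooperation in LGCM can be pictured roughly as a convolutional attention branch (standing in for FSAU) fused with a self-attention branch. The PyTorch module below is our illustrative reduction of the description above, not CTCNet's actual implementation; the additive fusion and the branch internals are assumptions.

```python
# A rough sketch of a local-global cooperation module: a CNN attention branch
# for local detail plus a self-attention branch for global structure.
import torch
import torch.nn as nn

class LGCMSketch(nn.Module):
    def __init__(self, dim, heads=4):          # dim must be divisible by heads
        super().__init__()
        self.local = nn.Sequential(             # stand-in for the FSAU branch
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.Sigmoid())
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, H, W)
        local = x * self.local(x)               # local facial-detail attention
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        glob, _ = self.attn(seq, seq, seq)      # global self-attention
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return local + glob                     # fuse local and global branches
```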
An assembly‐induced retention effect for enhanced tumor photoacoustic (PA) imaging and therapeutics is described. A responsive small‐molecule precursor is prepared that self‐assembles in situ into nanofibers at tumor sites, exhibiting an assembly‐induced retention effect that results in an improved PA imaging signal and enhanced therapeutic efficacy. This successful proof‐of‐concept study paves the way for developing novel supramolecular biomaterials for cancer diagnostics and therapeutics.
In this paper, we present a label transfer model from texts to images for image classification tasks. The problem of image classification is often much more challenging than text classification. On one hand, labeled text data is more widely available than labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability and is often more directly related to class labels. In contrast, image features are not directly related to the concepts inherent in class labels. One of our goals in this paper is to develop a model that reveals the functional relationships between text and image features so as to directly transfer intermodal and intramodal labels to annotate images. This is implemented by learning a transfer function as a bridge to propagate labels between the two multimodal spaces. However, intermodal label transfer can be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring image labels instead when relevant text is absent from the source corpus. In addition, we generalize the intermodal label transfer to the zero-shot learning scenario, where only text examples are available to label unseen classes of images, without any positive image examples. We evaluate our algorithm on an image classification task and show its effectiveness with respect to the other compared algorithms.
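As a concrete, deliberately simplified picture of a learned bridge between modalities, the following NumPy sketch fits a linear map from image features to the text feature space and transfers labels by voting over nearest text neighbors; the ridge regression and k-NN voting here are our illustrative choices, not the paper's formulation.

```python
# A minimal sketch of intermodal label transfer through a linear bridge
# from the image feature space to the text feature space.
import numpy as np

def fit_bridge(img_feats, txt_feats, reg=1e-2):
    """Ridge-regression map W from image space to text space (closed form)."""
    d = img_feats.shape[1]
    a = img_feats.T @ img_feats + reg * np.eye(d)
    return np.linalg.solve(a, img_feats.T @ txt_feats)

def transfer_labels(img_query, W, txt_feats, txt_labels, k=5):
    """Map an image into text space and vote over its k nearest texts."""
    sims = (img_query @ W) @ txt_feats.T
    nn = np.argsort(-sims)[:k]
    return np.bincount(txt_labels[nn]).argmax()
```

Under this picture, zero-shot transfer to an unseen class only requires that class's text examples to be present in txt_feats, since images are labeled entirely through the text side of the bridge.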
In this work, we aim to address the problem of human interaction recognition in videos by exploring the long-term inter-related dynamics among multiple persons. Recently, Long Short-Term Memory (LSTM) has become a popular choice for modeling individual dynamics in single-person action recognition, due to its ability to capture temporal motion information over a range. However, most existing LSTM-based methods focus only on capturing the dynamics of human interaction by simply combining all individuals' dynamics or modeling them as a whole. Such methods neglect the inter-related dynamics of how human interactions change over time. To this end, we propose a novel Hierarchical Long Short-Term Concurrent Memory (H-LSTCM) to model the long-term inter-related dynamics among a group of persons for recognizing human interactions. Specifically, we first feed each person's static features into a Single-Person LSTM to model the single-person dynamics. Subsequently, at each time step, the outputs of all Single-Person LSTM units are fed into a novel Concurrent LSTM (Co-LSTM) unit, which mainly consists of multiple sub-memory units, a new cell gate, and a new co-memory cell. In the Co-LSTM unit, each sub-memory unit stores individual motion information, while the unit selectively integrates and stores inter-related motion information between multiple interacting persons from the sub-memory units via the cell gate and co-memory cell, respectively. Extensive experiments on several public datasets validate the effectiveness of the proposed H-LSTCM by comparing it against baseline and state-of-the-art methods.
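The co-memory update can be pictured with the following simplified PyTorch sketch, where per-person sub-memories are pooled and blended into a shared co-memory through a learned gate; this is our illustrative reduction of the Co-LSTM unit, not its exact equations.

```python
# A simplified sketch of blending per-person sub-memories into a shared
# co-memory cell through a learned gate.
import torch
import torch.nn as nn

class CoMemorySketch(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)     # stand-in for the cell gate

    def forward(self, sub_cells, co_cell):
        """sub_cells: (P persons, B, D); co_cell: (B, D)."""
        pooled = sub_cells.mean(dim=0)          # aggregate individual memories
        g = torch.sigmoid(self.gate(torch.cat([pooled, co_cell], dim=1)))
        return g * pooled + (1 - g) * co_cell   # gated co-memory update
```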