Popular methods usually use a degradation model in a supervised way to learn a watermark removal model. However, reference images are difficult to obtain in the real world, and images collected by cameras suffer from noise. To overcome these drawbacks, we propose a perceptive self-supervised learning network for noisy image watermark removal (PSLNet) in this paper. PSLNet relies on a parallel network to remove noise and watermarks. The upper network uses task decomposition to remove noise and watermarks in sequence, while the lower network uses a degradation model to remove them simultaneously. Specifically, paired watermark images are obtained in a self-supervised way, and paired noisy images (i.e., noisy and reference images) are obtained in a supervised way. To enhance the clarity of the obtained images, the two sub-networks interact and their restored images are fused to improve watermark removal in terms of structural information and pixel enhancement. Taking texture information into account, a mixed loss applied to both obtained images and features yields a robust model for noisy image watermark removal. Comprehensive experiments show that our proposed method is very effective in comparison with popular convolutional neural networks (CNNs) for noisy image watermark removal. Codes can be obtained at https://github.com/hellloxiaotian/PSLNet.
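The abstract above does not specify the form of the mixed loss; a minimal sketch of the general idea (a pixel-level term on restored images plus a feature-level term for texture) could look as follows, with all names, the L1/L2 combination, and the `alpha` weight being illustrative assumptions rather than the authors' definition:

```python
import numpy as np

def mixed_loss(restored, target, feat_restored, feat_target, alpha=0.1):
    """Sketch of a mixed loss: pixel (L1) term plus a feature-space term.

    `restored`/`target` are image arrays; `feat_*` are feature maps from
    any fixed feature extractor (hypothetical here). `alpha` balances the
    two terms; the paper's actual weighting may differ.
    """
    pixel_term = np.abs(restored - target).mean()
    feature_term = ((feat_restored - feat_target) ** 2).mean()
    return pixel_term + alpha * feature_term
```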
Recently, Point-MAE has extended Masked Autoencoders (MAE) to point clouds for 3D self-supervised learning. However, it faces two problems: (1) the shape similarity between the masked point cloud and the original point cloud is high, and (2) the pretext task of reconstructing the original point cloud is too straightforward to compel the network to learn deeply representative features. In this paper, we tackle these problems by proposing a PatchMixing strategy and a teacher-student training framework. First, with PatchMixing, we mix selected point patches of multiple point clouds and attempt to infer the object information from the resulting mixed point cloud. Due to the interference of other objects, the task is challenging but facilitates representation learning. Second, rather than directly restoring the original point cloud, we propose a novel pretext task that involves a two-branch teacher model and a student model. These models process the input point clouds in different ways (no mixing; mixing + unmixing; mixing + masking) but are expected to output similar features, thereby compelling the network to extract essential features from the input. Extensive experiments show that our PatchMixing strategy and teacher-student learning architecture yield impressive results. Specifically, our model achieves a remarkable 92.9% classification accuracy in the linear SVM task on the ModelNet40 dataset. Through pre-training and fine-tuning on downstream tasks, our method achieves 89.8% classification accuracy on the most challenging split of ScanObjectNN and an outstanding 94.0% on ModelNet40.
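The core PatchMixing step can be sketched in a few lines: given the point patches of two objects, replace a random subset of one object's patches with the other's and record which patches came from which object. This is a minimal illustration under the assumption that patches have already been extracted (e.g., by farthest-point sampling and kNN grouping); the function name and the two-object simplification are hypothetical:

```python
import numpy as np

def patch_mix(patches_a, patches_b, mix_ratio=0.5, rng=None):
    """Mix point patches from two objects into one point cloud (sketch).

    patches_a, patches_b: arrays of shape (P, K, 3) -- P patches of K points.
    A fraction `mix_ratio` of the output patches is drawn from object B.
    Returns the mixed patches and a 0/1 label per patch (its source object).
    """
    rng = rng or np.random.default_rng(0)
    P = patches_a.shape[0]
    take_b = rng.choice(P, size=int(P * mix_ratio), replace=False)
    mixed = patches_a.copy()
    mixed[take_b] = patches_b[take_b]          # swap in patches from object B
    labels = np.zeros(P, dtype=int)
    labels[take_b] = 1                         # 1 marks patches from object B
    return mixed, labels
```

Inferring object information from `mixed` despite the interfering patches is what makes the pretext task harder than plain reconstruction.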
Few-shot font generation (FFG) aims to preserve the underlying global structure of the original character while generating target fonts by referring to only a few samples. It has been applied to font library creation, personalized signatures, and other scenarios. Existing FFG methods explicitly disentangle the content and style of reference glyphs either universally or component-wise. However, they ignore the difference between glyphs in different styles and the similarity of glyphs in the same style, which results in artifacts such as local distortions and style inconsistency. To address this issue, we propose a novel font generation approach that learns the Difference between different styles and the Similarity of the same style (DS-Font). We introduce contrastive learning to model the positive and negative relationships between styles. Specifically, we propose a multi-layer style projector (MSP) for style encoding and obtain a distinctive style representation via our proposed Cluster-level Contrastive Style (CCS) loss. The MSP module assists the generator during training to enhance the style consistency between the generated glyph and the reference glyphs. In addition, we design a glyph-independent patch discriminator, which comprehensively considers different areas of the image and ensures that each style can be distinguished independently. Comprehensive qualitative and quantitative evaluations demonstrate that our approach achieves significantly better results than state-of-the-art methods.
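A cluster-level contrastive style loss of the kind the abstract describes typically takes an InfoNCE form over style-cluster centers: the glyph's own style cluster is the positive and the other clusters are negatives. The following is a minimal sketch of that pattern, assuming precomputed cluster centers; the signature and the temperature value are illustrative, not the paper's CCS definition:

```python
import numpy as np

def cluster_contrastive_loss(style_feat, centers, pos_idx, tau=0.1):
    """Cluster-level contrastive (InfoNCE-style) loss sketch.

    `style_feat`: (D,) style embedding of a glyph.
    `centers`: (C, D) style-cluster centers; `pos_idx` is the index of the
    glyph's own style cluster (the positive); all others act as negatives.
    """
    f = style_feat / np.linalg.norm(style_feat)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    logits = c @ f / tau                       # cosine similarity / temperature
    logits = logits - logits.max()             # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[pos_idx])
```

Minimizing this pulls the glyph's style embedding toward its own style cluster while pushing it away from the centers of other styles.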
Graph Contrastive Learning (GCL) based on graph neural networks (GNNs) has achieved great success in self-supervised representation learning through positive and negative pairs. One critical issue lies in how to handle false hard negatives, i.e., negatives that actually belong to the same class as the anchor yet share large similarity with it, which is critical to the message passing of GNNs in exploiting the graph structure. Existing methods either misidentify or miss these false hard negatives, resulting in poor node representations. This raises several crucial questions: Where do false hard negatives lie relative to the anchor? How can they be identified reliably? Are more false hard negatives always better? To answer these questions, we propose a novel Locally Weighted Graph Contrastive Learning method, named LocWGCL, and reveal that false hard negatives are primarily distributed in the first-order and second-order neighborhoods of the anchor. Benefiting from the tightness between first-order nodes and the anchor, representation similarity is calculated to select false hard negatives. For the second-order case, false hard negatives are identified as nodes that share a similar passed message with the anchor over common first-order nodes and exhibit large similarity. Building on this identification process, we devise a weighting strategy for false hard negatives to obtain better node representations. Empirical studies verify the advantages of LocWGCL over state-of-the-art methods on six benchmarks.
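The first-order selection step described above can be sketched directly: restrict candidates to the anchor's first-order neighborhood and flag those whose representation similarity exceeds a threshold as likely false negatives (to be down-weighted rather than pushed away). The function, threshold, and cosine-similarity choice are assumptions for illustration, not the paper's exact criterion:

```python
import numpy as np

def first_order_false_negatives(feats, adj, anchor_id, thresh=0.9):
    """Flag likely false hard negatives among first-order neighbours (sketch).

    feats: (N, D) node representations; adj: (N, N) adjacency matrix.
    Returns indices of neighbours whose cosine similarity to the anchor
    exceeds `thresh` -- candidates to re-weight instead of repel.
    """
    neigh = np.where(adj[anchor_id] > 0)[0]
    a = feats[anchor_id] / np.linalg.norm(feats[anchor_id])
    n = feats[neigh] / np.linalg.norm(feats[neigh], axis=1, keepdims=True)
    sims = n @ a                                # cosine similarity per neighbour
    return neigh[sims > thresh]
```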
Although interactive image segmentation techniques have made significant progress, supervised learning-based methods rely heavily on large-scale labeled data, which is difficult to obtain in domains such as medicine and biology. Models trained on natural images also struggle to achieve satisfactory results when directly applied to these domains. To solve this dilemma, we propose a Self-supervised Interactive Segmentation (SIS) method that achieves superior generalization performance. By clustering features from unlabeled data, we obtain classifiers that assign pseudo-labels to pixels in images. After refinement by super-pixel voting, these pseudo-labels are used to train our segmentation network. To enable our network to better adapt to cross-domain images, we introduce correction learning and anti-forgetting regularization to conduct test-time adaptation. Experimental results on five datasets show that our approach significantly outperforms other interactive segmentation methods on natural image datasets under the same conditions, and even surpasses some supervised methods when transferred to the medical image domain. The code and models are available at https://github.com/leal0110/SIS.
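The pseudo-labeling pipeline described above (cluster-based pixel classification followed by super-pixel voting) can be sketched minimally as follows, assuming the cluster centers and super-pixel map are already computed; function names are hypothetical:

```python
import numpy as np

def pseudo_labels(pixel_feats, centers):
    """Assign each pixel the label of its nearest cluster center (sketch).

    pixel_feats: (H*W, D) per-pixel features; centers: (K, D) centers
    obtained by clustering features of unlabeled data.
    """
    d = ((pixel_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def superpixel_vote(labels, superpixels):
    """Refine pseudo-labels by majority vote within each super-pixel."""
    refined = labels.copy()
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        refined[mask] = np.bincount(labels[mask]).argmax()
    return refined
```

The voting step smooths noisy per-pixel assignments so that each super-pixel carries a single consistent pseudo-label before training.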
Single image-based 3D shape retrieval (IBSR), which aims to find the corresponding 3D shape in a shape repository for a given single 2D image, has recently attracted considerable academic interest. However, state-of-the-art methods neglect the discrepancy in the image domain caused by unavoidable occlusion. Occluded image representations, acting as noise, may perturb the alignment of normal 2D representations with 3D representations, resulting in occlusion-sensitive image-shape retrieval. To tackle this challenge, we propose a novel Occlusion-invariant PErception Network (OPEN) to learn occlusion-invariant image representations and image-shape correspondence. Specifically, we propose a hard occlusion example mining strategy to sample hard image pairs. To enforce consistency between normal and occluded 2D images, we then propose an Occlusion-invariant Image Consistency (OIC) loss based on hard image pairs, which gathers 2D representations of the same instance while pushing away those of other instances. In addition, to prevent the 3D representations from being perturbed by occluded 2D representations, we design an Occlusion-invariant Correspondence Consistency (OCC) loss based on hard image pairs, which pulls the image-specific 3D shape embedding, derived by an attention mechanism, close to the other 2D image representation of the same instance. The combination of OIC and OCC leads to accurate 2D-3D shape matching in challenging occluded scenarios. OPEN outperforms state-of-the-art methods by 6%~11% in Top-1 retrieval accuracy on several representative benchmark datasets.
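A consistency loss of the OIC kind (same-instance views attracted, other instances repelled) is commonly written in InfoNCE form. The sketch below assumes that form and single-vector features; the signature and temperature are illustrative, not the paper's exact OIC formulation:

```python
import numpy as np

def oic_loss(feat_normal, feat_occluded, negatives, tau=0.07):
    """Occlusion-invariant image consistency loss sketch (InfoNCE form).

    Pulls the occluded view's representation toward the normal view of the
    same instance and pushes it away from other instances' representations.
    feat_normal/feat_occluded: (D,); negatives: (M, D).
    """
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    q, k, neg = norm(feat_occluded), norm(feat_normal), norm(negatives)
    pos = np.exp(q @ k / tau)                       # same-instance similarity
    den = pos + np.exp(neg @ q / tau).sum()         # plus all negatives
    return -np.log(pos / den)
```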
Popular convolutional neural networks mainly use paired images in a supervised way for image watermark removal. However, watermarked images rarely have reference images in the real world, which results in poor robustness of image watermark removal techniques. In this paper, we propose a self-supervised convolutional neural network (CNN) for image watermark removal (SWCNN). SWCNN constructs reference watermarked images in a self-supervised way, according to the watermark distribution, rather than relying on given paired training samples. A heterogeneous U-Net architecture is used to extract more complementary structural information via simple components for image watermark removal. Taking texture information into account, a mixed loss is exploited to improve the visual effects of image watermark removal. In addition, a watermark dataset is constructed. Experimental results show that the proposed SWCNN is superior to popular CNNs in image watermark removal.
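One common way to realize "constructing reference watermarked images from the watermark distribution" is Noise2Noise-style pairing: stamp the watermark onto the same image at two random positions and opacities, and train the network to map one version toward the other. The sketch below illustrates that idea only; the blending rule, parameter ranges, and function names are assumptions, not SWCNN's actual procedure:

```python
import numpy as np

def make_training_pair(image, watermark, rng=None):
    """Build a self-supervised (input, target) pair without clean references.

    Alpha-blends `watermark` at a random position and opacity twice onto
    the same image, producing two differently-watermarked versions.
    """
    rng = rng or np.random.default_rng(0)

    def stamp(img):
        out = img.copy()
        h, w = watermark.shape[:2]
        y = rng.integers(0, img.shape[0] - h + 1)
        x = rng.integers(0, img.shape[1] - w + 1)
        alpha = rng.uniform(0.3, 0.7)           # random opacity
        out[y:y+h, x:x+w] = (1 - alpha) * out[y:y+h, x:x+w] + alpha * watermark
        return out

    return stamp(image), stamp(image)           # (input, target) pair
```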
To accurately estimate 3D human pose from monocular camera images, a large amount of 3D annotated data is required. However, obtaining 3D annotated data outside the laboratory is not easy. In the absence of such data, weakly-supervised methods that rely on multi-view cameras during training and single-view cameras during inference have been proposed. These methods either use multi-view networks or classical triangulation to train the 3D human pose estimator. This study shows that these two paradigms can collaborate to further improve performance. The available unlabeled, uncalibrated multi-view inputs are used to obtain pseudo-3D labels via classical triangulation. A pose estimator is then trained with these pseudo-3D labels and a multi-view re-projection loss, which enforces the 3D poses estimated from different views to be consistent and improves performance. Our method thus relaxes the usual constraints (calibrated cameras, 2D/3D annotations), requiring only multi-view videos for training, and is therefore convenient for in-the-wild settings. The proposed method outperforms previous works on two challenging datasets, Human3.6M and MPI-INF-3DHP. Code and pretrained models will be publicly available.
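The two ingredients above, classical (DLT) triangulation for pseudo-3D labels and a re-projection check, can be sketched for a single joint and two views as follows. This is textbook linear triangulation, not the paper's full pipeline (which recovers cameras from uncalibrated videos); names are illustrative:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint from two camera views.

    P1, P2: (3, 4) projection matrices; x1, x2: (2,) pixel coordinates.
    Returns the 3D point used as a pseudo-3D label.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)                 # null space of A
    X = Vt[-1]
    return X[:3] / X[3]                         # dehomogenize

def reprojection_loss(X, P, x):
    """Distance between the projected 3D point and the observed 2D joint."""
    p = P @ np.append(X, 1.0)
    return np.linalg.norm(p[:2] / p[2] - x)
```

Summing `reprojection_loss` over all available views gives the multi-view consistency term that supervises the single-view estimator.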
Intermittent faults (IFs), which are short-term, repeatable, and cumulative, are common in electronic systems. IF samples are difficult to collect, so detection is usually performed with one-class learning approaches that require only fault-free samples for training. A teacher–student model typically exploits the cognitive biases of the teacher and student on fault signals to detect faults. Introducing prior knowledge of IFs into the teacher model may produce a greater fault cognitive bias and thus improve detection. Inspired by this, this paper proposes a prior knowledge-guided teacher–student (PKGTS) model based on self-supervised learning. In analog circuits, IFs cause transient changes in the circuit signal in terms of amplitude, frequency, and waveform. Based on this prior knowledge, corresponding signal transformations are designed to simulate possible fault variations, and prior knowledge is introduced to the teacher through a pretext task. Finally, only the teacher's knowledge of the fault-free state is imparted to the student. During the testing phase, IF detection is achieved through the cognitive biases on faults, since the student model has no prior knowledge of faults. Experiments on two typical analog filtering circuits verify the effectiveness of the proposed method under different noise levels and fault intensities.
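Transformations mimicking transient amplitude, frequency, and waveform changes, as the pretext task requires, might be sketched as below. The segment lengths, parameter ranges, and the three specific operations are illustrative assumptions; the paper's actual transformations may differ:

```python
import numpy as np

def simulate_if(signal, kind, rng=None):
    """Apply one prior-knowledge transformation mimicking an intermittent fault.

    A short random segment of the signal is perturbed; the teacher's pretext
    task would be to recognise which transformation was applied.
    """
    rng = rng or np.random.default_rng(0)
    out = signal.copy()
    n = len(out)
    start = rng.integers(0, n // 2)
    seg = slice(start, start + n // 4)          # transient fault window
    if kind == "amplitude":
        out[seg] *= rng.uniform(1.5, 3.0)       # transient gain change
    elif kind == "frequency":
        t = np.arange(seg.stop - seg.start)
        out[seg] += 0.5 * np.sin(2 * np.pi * 0.2 * t)  # injected oscillation
    elif kind == "waveform":
        out[seg] = np.clip(out[seg], -0.1, 0.1)  # clipping distortion
    return out
```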
• A new self-supervised framework is proposed for IF detection in analog circuits.
• The knowledge-guided method contributes to the interpretability of networks.
• Prior knowledge is introduced flexibly.
Given that vibration fault signals collected in industrial settings are usually insufficient and unlabeled, supervised learning networks cannot be directly applied to recognize fault types. Automatic feature extraction from unlabeled data is therefore urgently needed. In this study, we propose an automatic fault feature extractor (AFFE) based on the Bootstrap Your Own Latent (BYOL) contrastive learning algorithm, which extracts fault features automatically without labeled information. A data augmentation method for vibration signals is studied because data augmentation is critical to contrastive learning. This study determines a data augmentation combination that helps AFFE achieve excellent performance in extracting features from unlabeled bearing fault data. To verify the validity of the proposed method, we use a small amount of labeled data (5% of the samples in a dataset) to fit linear classifiers on the features extracted by AFFE, and evaluate classification accuracy on the remaining data. The case study demonstrates that the fault features extracted by AFFE achieve a high accuracy of 94.81%. Therefore, the proposed AFFE-BYOL is a promising fault feature extraction scheme for processing unlabeled vibration data.
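BYOL needs two augmented views of each signal for its online and target branches. A plausible augmentation combination for 1D vibration signals (amplitude scaling, additive noise, circular time shift) can be sketched as follows; the specific operations and parameter ranges are assumptions for illustration, not the combination the study found effective:

```python
import numpy as np

def augment(signal, rng):
    """One random augmented view of a vibration signal (sketch)."""
    out = signal * rng.uniform(0.8, 1.2)                  # amplitude scaling
    out = out + rng.normal(0.0, 0.01, size=out.shape)     # additive noise
    out = np.roll(out, rng.integers(0, len(out)))         # circular time shift
    return out

def byol_views(signal, seed=0):
    """Two augmented views of the same signal for the online/target branches."""
    rng = np.random.default_rng(seed)
    return augment(signal, rng), augment(signal, rng)
```

The two views are fed to the online and target networks, and the online branch is trained to predict the target branch's projection of the other view.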