The process of aligning a pair of shapes is a fundamental operation in computer graphics. Traditional approaches rely heavily on matching corresponding points or features to guide the alignment, a paradigm that falters when significant shape portions are missing. These techniques generally do not incorporate prior knowledge about expected shape characteristics, which can help compensate for misleading cues caused by inaccuracies in the input shapes. We present an approach based on a deep neural network, leveraging shape datasets to learn a shape-aware prior for source-to-target alignment that is robust to shape incompleteness. In the absence of ground-truth alignments for supervision, we train the network on the task of shape alignment using incomplete shapes generated from full shapes for self-supervision. Our network, called ALIGNet, is trained to warp complete source shapes to incomplete targets as if the target shapes were complete, thus rendering the alignment partial-shape agnostic. We aim for the network to develop specialized expertise over the common characteristics of the shapes in each dataset, thereby achieving a higher-level understanding of the expected shape space to which a local approach would be oblivious. We constrain ALIGNet through an anisotropic total variation identity regularization to promote piecewise smooth deformation fields, facilitating both partial-shape agnosticism and post-deformation applications. We demonstrate that ALIGNet learns to align geometrically distinct shapes and infers plausible mappings even when the target shape is significantly incomplete. We show that the network learns the common expected characteristics of shape collections without over-fitting or memorization, enabling it to produce plausible deformations on unseen data at test time.
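The piecewise-smoothness penalty described above can be illustrated with a minimal anisotropic total variation on a dense 2D deformation field, assuming per-pixel offset vectors; the exact regularizer used by ALIGNet (including its identity term) may differ from this sketch.

```python
import numpy as np

def anisotropic_tv(field):
    """Anisotropic total variation of a dense 2D deformation field.

    field: array of shape (H, W, 2) holding per-pixel (dx, dy) offsets.
    Sums absolute finite differences along each axis independently,
    which favors piecewise-constant (hence piecewise-smooth) fields.
    """
    dv = np.abs(np.diff(field, axis=0)).sum()  # vertical neighbors
    dh = np.abs(np.diff(field, axis=1)).sum()  # horizontal neighbors
    return dv + dh

# A constant (pure-translation) field pays no TV penalty.
flat = np.ones((8, 8, 2))
print(anisotropic_tv(flat))  # -> 0.0

# A field with one sharp seam pays only for the seam.
seam = np.zeros((8, 8, 2))
seam[:, 4:, 0] = 1.0
print(anisotropic_tv(seam))  # -> 8.0 (one unit jump per row)
```

Because the penalty grows only where neighboring offsets differ, large coherent regions can deform together while discontinuities stay localized.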
Hyperspectral image (HSI) classification is an active research topic in remote sensing. Supervised learning-based methods have been widely used in HSI classification tasks due to their powerful feature extraction capabilities when labeled samples are sufficient. However, practical applications often have limited samples with accurate labels due to the high cost of labeling or unreliable visual interpretation. We introduce a contrastive self-supervised learning (SSL) algorithm to achieve HSI classification for problems with few labeled samples. First, a new HSI-specific augmentation module is developed to generate sample pairs. Then, a contrastive SSL model based on Siamese networks is used to extract features from these easily accessible sample pairs. Finally, the labeled samples are used to fine-tune the parameters of the classification model to boost classification performance. The contrastive self-supervised algorithm was tested on two widely used HSI datasets. The experimental results reveal that the proposed algorithm requires only a few labeled samples to achieve superior performance.
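The Siamese contrastive objective over augmented sample pairs can be illustrated with a generic NT-Xent loss; the paper's exact loss and HSI-specific augmentations are not reproduced here, so this is a minimal sketch under assumed conventions.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """Generic NT-Xent contrastive loss for N positive pairs.

    z1, z2: (N, D) embeddings of two augmented views of each sample.
    Each embedding's positive is its counterpart in the other view;
    all remaining 2N-2 embeddings act as negatives.
    """
    z = np.concatenate([z1, z2])                      # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logp[np.arange(2 * n), pos].mean()

# Usage: embeddings of two augmented views of the same batch.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))
z2 = z1 + 0.05 * rng.normal(size=(4, 8))  # views of the same samples
loss = nt_xent(z1, z2)
```

Minimizing this loss pulls the two views of each sample together while pushing apart embeddings of different samples, which is the mechanism the augmentation module feeds.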
Purpose
To develop a strategy for training a physics‐guided MRI reconstruction neural network without a database of fully sampled data sets.
Methods
Self‐supervised learning via data undersampling (SSDU) for physics‐guided deep learning reconstruction partitions the available measurements into two disjoint sets: one is used in the data consistency (DC) units in the unrolled network, and the other is used to define the loss for training. The proposed training without fully sampled data is compared with fully supervised training with ground‐truth data, as well as with conventional compressed‐sensing and parallel imaging methods, using the publicly available fastMRI knee database. The same physics‐guided neural network is used for both the proposed SSDU and supervised training. SSDU training is also applied to prospectively two‐fold accelerated high‐resolution brain data sets at different acceleration rates and compared with parallel imaging.
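The measurement partition at the heart of SSDU can be sketched as follows; the split ratio (`rho=0.4`) and the uniform-random selection are illustrative assumptions, not the paper's exact selection scheme.

```python
import numpy as np

def ssdu_split(mask, rho=0.4, seed=0):
    """Partition acquired k-space locations into two disjoint sets.

    mask: boolean undersampling mask (True where a sample was acquired).
    rho:  fraction of acquired points assigned to the loss set (Lambda);
          the rest (Theta) feeds the data-consistency units.
    """
    rng = np.random.default_rng(seed)
    acquired = np.flatnonzero(mask)
    loss_idx = rng.choice(acquired, size=int(rho * acquired.size),
                          replace=False)
    loss_mask = np.zeros(mask.size, dtype=bool)
    loss_mask[loss_idx] = True
    loss_mask = loss_mask.reshape(mask.shape)   # Lambda: defines the loss
    dc_mask = mask & ~loss_mask                 # Theta: data consistency
    return dc_mask, loss_mask

# Toy roughly-4x undersampling mask.
mask = np.random.default_rng(1).random((64, 64)) < 0.25
theta, lam = ssdu_split(mask)
```

Because the network never sees the loss set in its DC units, predicting those held-out measurements stands in for a fully sampled reference.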
Results
Results on five different knee sequences at an acceleration rate of 4 show that the proposed self‐supervised approach performs comparably to supervised learning, while significantly outperforming conventional compressed‐sensing and parallel imaging, as characterized by quantitative metrics and a clinical reader study. The results on prospectively subsampled brain data sets, for which supervised learning cannot be used due to the lack of a ground‐truth reference, show that the proposed self‐supervised approach successfully performs reconstruction at high acceleration rates (4, 6, and 8). Image readings indicate improved visual reconstruction quality with the proposed approach compared with parallel imaging at the acquisition acceleration rate.
Conclusion
The proposed SSDU approach allows training of physics‐guided deep learning MRI reconstruction without fully sampled data, while achieving results comparable to supervised deep learning MRI reconstruction trained on fully sampled data.
Contrastive learning has achieved remarkable success in computer vision; however, it is built on instance-level discrimination, which leaves the valuable intra-class correlation in datasets unexploited. Current semantic clustering methods have proven helpful, but they suffer from error accumulated over the iterative process without ground-truth guidance. To remedy this clustering-error accumulation when utilizing intra-class correlation for contrastive learning, we propose an online Contrastive Visual Clustering (CVC) method with two actions: gathering instances with highly similar feature embeddings, and penalizing instances clustered with low confidence. CVC can integrate not only with contrastive learning but also with arbitrary self-supervised learning frameworks, simply as a plugin. Under various experimental settings, we show that CVC improves linear classification performance by a large margin for models pre-trained with self-supervised representation learning, in both image and video scenarios. The code is available at https://github.com/yliu1229/CVC.
•A contrastive clustering method, CVC, is proposed to improve contrastive learning.
•CVC is shown to be a generic method.
•Experiments show CVC improves linear classification performance by a large margin.
Recently, supervised deep learning has achieved great success in remote sensing image (RSI) semantic segmentation. However, supervised learning for semantic segmentation requires a large number of labeled samples, which are difficult to obtain in the field of remote sensing. A newer learning paradigm, self-supervised learning (SSL), can address this problem by pretraining a general model on a large number of unlabeled images and then fine-tuning it on a downstream task with very few labeled samples. Contrastive learning is a typical SSL method that can learn general invariant features. However, most existing contrastive learning methods are designed for classification tasks and produce an image-level representation, which may be suboptimal for semantic segmentation tasks that require pixel-level discrimination. Therefore, we propose a global style and local matching contrastive learning network (GLCNet) for RSI semantic segmentation. Specifically, the global style contrastive learning module is first used to better learn an image-level representation, since style features can better represent the overall image characteristics. Next, the local matching contrastive learning module is designed to learn representations of local regions, which benefits semantic segmentation. We evaluate our method on four RSI semantic segmentation datasets, and the experimental results show that it mostly outperforms state-of-the-art self-supervised methods and the ImageNet pretraining method. Specifically, with 1% annotation of the original dataset, our approach improves Kappa by 6% on the International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam dataset relative to the existing baseline. Moreover, our method outperforms supervised learning methods when there are differences between the datasets of the upstream and downstream tasks. Our study promotes the development of SSL in the field of RSI semantic segmentation.
Since SSL can learn the essential characteristics of data directly from unlabeled images, which are easy to obtain in the remote sensing field, it may be of great significance for tasks such as global mapping. The source code is available at https://github.com/GeoX-Lab/G-RSIM .
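One common choice of "style" statistics, the channel-wise mean and standard deviation of a feature map, illustrates why style features summarize overall image appearance: they are invariant to where content sits spatially. Whether GLCNet uses exactly these statistics is an assumption of this sketch.

```python
import numpy as np

def style_vector(feat):
    """Channel-wise mean and standard deviation of a feature map.

    feat: (C, H, W) feature map -> (2C,) style vector.
    The result depends only on per-channel distributions, not on
    the spatial arrangement of activations.
    """
    mu = feat.mean(axis=(1, 2))
    sigma = feat.std(axis=(1, 2))
    return np.concatenate([mu, sigma])

# Spatially shuffling a feature map leaves its style vector unchanged.
rng = np.random.default_rng(0)
f = rng.random((4, 5, 6))
perm = rng.permutation(5 * 6)
f_shuffled = f.reshape(4, -1)[:, perm].reshape(4, 5, 6)
```

This spatial invariance is what makes such statistics a natural image-level ("global") signal, complementary to the pixel-level local matching branch.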
Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we propose the Hidden-Unit BERT (HuBERT) approach for self-supervised speech representation learning, which utilizes an offline clustering step to provide aligned target labels for a BERT-like prediction loss. A key ingredient of our approach is applying the prediction loss over the masked regions only, which forces the model to learn a combined acoustic and language model over the continuous inputs. HuBERT relies primarily on the consistency of the unsupervised clustering step rather than the intrinsic quality of the assigned cluster labels. Starting with a simple k-means teacher of 100 clusters, and using two iterations of clustering, the HuBERT model either matches or improves upon the state-of-the-art wav2vec 2.0 performance on the Librispeech (960 h) and Libri-light (60,000 h) benchmarks with 10 min, 1 h, 10 h, 100 h, and 960 h fine-tuning subsets. Using a 1B-parameter model, HuBERT shows up to 19% and 13% relative WER reduction on the more challenging dev-other and test-other evaluation subsets.
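The masked-regions-only prediction loss can be sketched as a cross-entropy restricted to masked frames, with targets given by offline k-means cluster labels; this is an illustrative sketch, not the HuBERT training code.

```python
import numpy as np

def masked_prediction_loss(logits, labels, mask):
    """Cross-entropy averaged over masked frames only.

    logits: (T, K) frame-wise scores over K cluster codewords.
    labels: (T,) pseudo-labels from an offline k-means step.
    mask:   (T,) boolean, True where the input frame was masked.
    """
    # Numerically stable log-softmax over the codeword dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(labels)), labels]
    return nll[mask].mean()   # unmasked frames contribute nothing

# Toy example: 6 frames, 4 cluster codewords, 3 frames masked.
labels = np.array([0, 1, 2, 3, 0, 1])
logits = np.zeros((6, 4))
logits[np.arange(6), labels] = 10.0   # confident, correct predictions
mask = np.array([True, True, False, False, True, False])
loss = masked_prediction_loss(logits, labels, mask)
```

Restricting the loss to masked frames forces the model to predict the hidden units from surrounding context, rather than copying local acoustics.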
•We propose a pretext task, namely Rubik's cube+, consisting of three sub-tasks: cube ordering, cube orientation, and masking identification.
•Experiments on two target tasks, cerebral hemorrhage classification and brain tumor segmentation, are conducted to demonstrate the effectiveness of our Rubik's cube+.
•Comprehensive discussions of the limitations and potential applications of our study are included.
Due to the development of deep learning, an increasing number of research works have been proposed to establish automated analysis systems for 3D volumetric medical data to improve the quality of patient care. However, it is challenging to obtain the large amount of annotated 3D medical data needed to train a neural network well, as manual annotation by physicians is time-consuming and laborious. Self-supervised learning is one of the potential solutions for mitigating the strong requirement of data annotation by deeply exploiting raw data information. In this paper, we propose a novel self-supervised learning framework for volumetric medical data. Specifically, we propose a pretext task, i.e., Rubik's cube+, to pre-train 3D neural networks. The pretext task involves three operations, namely cube ordering, cube rotating, and cube masking, forcing networks to learn translation- and rotation-invariant features from the original 3D medical data while tolerating the noise of the data at the same time. Compared to the strategy of training from scratch, fine-tuning from the Rubik's cube+ pre-trained weights can remarkably boost the accuracy of 3D neural networks on various tasks, such as cerebral hemorrhage classification and brain tumor segmentation, without the use of extra data.
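A toy version of the three pretext operations (ordering, rotating, masking) on eight sub-cubes might look as follows; the 180-degree rotations and the single masked cube are simplifying assumptions of this sketch, not the paper's exact configuration.

```python
import numpy as np

def rubiks_cube_plus(volume, rng):
    """Toy Rubik's cube+-style pretext transform (illustrative only).

    Splits a cubic volume into eight sub-cubes, permutes their order,
    rotates each by 180 degrees about a random axis, and zeroes out one
    sub-cube as the masking target. Returns the transformed cubes and
    the labels (order, rotation axes, masked index) a pretext network
    would have to recover.
    """
    s = volume.shape[0] // 2
    cubes = [volume[i*s:(i+1)*s, j*s:(j+1)*s, k*s:(k+1)*s].copy()
             for i in (0, 1) for j in (0, 1) for k in (0, 1)]
    order = rng.permutation(8)              # cube-ordering label
    axes = rng.integers(0, 3, size=8)       # cube-orientation labels
    masked = int(rng.integers(0, 8))        # masking-identification label
    planes = [(0, 1), (0, 2), (1, 2)]
    out = []
    for slot, src in enumerate(order):
        c = np.rot90(cubes[src], k=2, axes=planes[axes[slot]])
        if slot == masked:
            c = np.zeros_like(c)            # masked cube
        out.append(c)
    return out, (order, axes, masked)

rng = np.random.default_rng(0)
vol = rng.random((16, 16, 16))
cubes, (order, axes, masked) = rubiks_cube_plus(vol, rng)
```

Predicting the permutation, the rotations, and which cube was masked from the transformed input is what pushes the network toward translation- and rotation-invariant, noise-tolerant features.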