Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning, where supervision is restricted to binary labels indicating the absence or presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple-instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations, such as Fisher vectors and convolutional neural network features. We also propose a window refinement method, which improves localization accuracy by incorporating an objectness prior. We present a detailed experimental evaluation on the PASCAL VOC 2007 dataset, which verifies the effectiveness of our approach.
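The multi-fold procedure above can be illustrated with a toy sketch: positive images are split into folds, and the candidate window of each image is re-selected by a detector trained on the *other* folds only, so training cannot lock onto a fold's own mistakes. The mean-difference "detector" below is a deliberate simplification of the paper's high-dimensional detector, and all names are hypothetical.

```python
import numpy as np

def multi_fold_mil(pos_bags, neg_feats, n_folds=3, n_iters=5, seed=0):
    """Toy multi-fold MIL relocalization (simplified sketch).

    pos_bags: list of (n_windows, d) arrays, one bag of candidate windows
    per positive image; neg_feats: (n_neg, d) negative window features.
    Returns the index of the selected window in every positive bag.
    """
    rng = np.random.default_rng(seed)
    folds = np.arange(len(pos_bags)) % n_folds
    # start from a random candidate window in every positive image
    selected = np.array([rng.integers(len(bag)) for bag in pos_bags])
    neg_mean = neg_feats.mean(axis=0)
    for _ in range(n_iters):
        for k in range(n_folds):
            train = np.where(folds != k)[0]
            # crude stand-in for a learned detector: difference of class means
            pos_mean = np.mean([pos_bags[i][selected[i]] for i in train], axis=0)
            w = pos_mean - neg_mean
            # relocalize windows only in the held-out fold
            for i in np.where(folds == k)[0]:
                selected[i] = int(np.argmax(pos_bags[i] @ w))
    return selected
```

On a synthetic setup where each positive bag contains one discriminative window, the held-out relocalization converges to the correct windows within a few iterations.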
Learning time-series representations when only unlabeled data or few labeled samples are available can be a challenging task. Recently, contrastive self-supervised learning has shown great improvement in extracting useful representations from unlabeled data by contrasting different augmented views of the data. In this work, we propose a novel Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC) that learns representations from unlabeled data with contrastive learning. Specifically, we propose time-series-specific weak and strong augmentations and use their views to learn robust temporal relations in the proposed temporal contrasting module, in addition to learning discriminative representations with our proposed contextual contrasting module. We also conduct a systematic study of time-series data augmentation selection, a key component of contrastive learning. Furthermore, we extend TS-TCC to the semi-supervised setting and propose a Class-Aware TS-TCC (CA-TCC) that exploits the few available labeled samples to further improve the representations learned by TS-TCC. Specifically, we leverage the robust pseudo-labels produced by TS-TCC to realize a class-aware contrastive loss. Extensive experiments show that linear evaluation of the features learned by our framework performs comparably with fully supervised training. Our framework also shows high efficiency in few-labeled-data and transfer learning scenarios. The code is publicly available at https://github.com/emadeldeen24/CA-TCC.
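The weak and strong views referred to above can be sketched as follows; jitter-and-scale for the weak view and segment permutation plus jitter for the strong view are assumed forms, and the parameters here are illustrative rather than the paper's exact settings.

```python
import numpy as np

def weak_augment(x, scale_sigma=0.1, jitter_sigma=0.05, seed=None):
    """Weak view: multiply the series by a random scale and add noise."""
    rng = np.random.default_rng(seed)
    return x * rng.normal(1.0, scale_sigma) + rng.normal(0.0, jitter_sigma, x.shape)

def strong_augment(x, n_segments=4, jitter_sigma=0.05, seed=None):
    """Strong view: split the series into segments, shuffle them, then jitter."""
    rng = np.random.default_rng(seed)
    segments = np.array_split(x, n_segments)
    order = rng.permutation(n_segments)
    permuted = np.concatenate([segments[i] for i in order])
    return permuted + rng.normal(0.0, jitter_sigma, x.shape)
```

Both views preserve the length of the input series; the strong view additionally preserves the multiset of values when the jitter is switched off, which makes it easy to sanity-check.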
The rapid development of self-supervised learning (SSL) lowers the bar for learning feature representations from massive unlabeled data and has triggered a series of studies on change detection in remote sensing images. Challenges in adapting SSL from natural image classification to remote sensing change detection arise from the differences between the two tasks: the learned patch-level feature representations are not sufficient for pixel-level precise change detection. In this article, we propose a novel pixel-level self-supervised hyperspectral spatial-spectral feature understanding network (HyperNet) to accomplish pixelwise feature representation for effective hyperspectral change detection. Concretely, whole images rather than patches are fed into the network, and the multitemporal spatial-spectral features are compared pixel by pixel. Instead of processing the 2-D imaging space and the spectral response dimension jointly, a powerful spatial-spectral attention module is put forward to explore the spatial correlation and the discriminative spectral features of multitemporal hyperspectral images (HSIs) separately. Only positive samples at the same location of bitemporal HSIs are created and forced to be aligned, aiming at learning spectral difference-invariant features. Moreover, a new similarity loss function named focal cosine is proposed to address the imbalance between easy and hard positive samples, enlarging the weights of hard samples to promote network training. Six hyperspectral datasets have been adopted to test the validity and generalization of the proposed HyperNet. Extensive experiments demonstrate the superiority of HyperNet over state-of-the-art algorithms on downstream hyperspectral change detection tasks.
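One plausible form of a focal cosine loss, following the idea above of up-weighting hard positive pairs, is sketched below. The mapping of cosine similarity to a probability and the focal exponent are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def focal_cosine_loss(f1, f2, gamma=2.0, eps=1e-8):
    """Hypothetical focal cosine similarity loss over positive pixel pairs.

    f1, f2: (n_pixels, d) features of the same locations at two times.
    Hard positives (low cosine similarity) receive a larger focal
    factor (1 - p)**gamma, so they dominate the training signal.
    """
    cos = np.sum(f1 * f2, axis=1) / (
        np.linalg.norm(f1, axis=1) * np.linalg.norm(f2, axis=1) + eps)
    p = (cos + 1.0) / 2.0          # map similarity from [-1, 1] to (0, 1)
    return float(np.mean(-((1.0 - p) ** gamma) * np.log(p + eps)))
```

Perfectly aligned pairs incur (nearly) zero loss, while anti-aligned pairs incur a large one, which is the behaviour the focal weighting is meant to produce.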
There are different methods for predicting streamflow, and machine learning has recently been widely used for this purpose. This technique uses a wide set of covariables in the prediction process that must undergo selection to increase the precision and stability of the models. Thus, this work aimed to analyze the effect of covariable selection with Recursive Feature Elimination (RFE) and Forward Feature Selection (FFS) on the performance of machine learning models for predicting daily streamflow. The study was carried out in the Piranga river basin, located in the State of Minas Gerais, Brazil. The database consisted of an 18-year historical series (2000-2017) of streamflow data at the outlet of the basin and covariables derived from the streamflow of affluent rivers, precipitation, land use and land cover, products from the MODIS sensors, and time. Highly correlated covariables were eliminated, and covariable selection by level of importance was carried out with the RFE and FFS methods for the Multivariate Adaptive Regression Splines (EARTH), Multiple Linear Regression (MLR), and Random Forest (RF) models. The data were partitioned into two groups: 75% for training and 25% for validation. The models were run 50 times, and their performance was evaluated by the Nash-Sutcliffe efficiency coefficient (NSE), the coefficient of determination (R²), and the root mean square error (RMSE). The three models tested showed satisfactory performance with both covariable selection methods; however, all of them proved inaccurate for predicting values associated with maximum streamflow events. The use of FFS, in most cases, improved the performance of the models and reduced the number of selected covariables. The use of machine learning to predict daily streamflow proved efficient, and the use of FFS in covariable selection enhanced this efficiency.
• The Forward Feature Selection (FFS) was more advantageous for selecting covariables.
• In most cases, the FFS selected fewer covariables to predict daily streamflow.
• The machine learning models with covariable selection were efficient in predicting streamflow.
• The most important covariable was the streamflow of affluent rivers lagged in time.
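The greedy FFS procedure with an NSE stopping criterion can be sketched in a few lines; a plain least-squares linear model stands in for the paper's regression models, and all function names are illustrative.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def fit_predict(Xtr, ytr, Xval):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return np.column_stack([np.ones(len(Xval)), Xval]) @ coef

def forward_selection(Xtr, ytr, Xval, yval, max_feats=None):
    """Greedy FFS: repeatedly add the covariable that most improves
    validation NSE; stop when no candidate improves the score."""
    n_feats = Xtr.shape[1]
    selected, best_score = [], -np.inf
    while len(selected) < (max_feats or n_feats):
        cand = [(nse(yval, fit_predict(Xtr[:, selected + [j]], ytr,
                                       Xval[:, selected + [j]])), j)
                for j in range(n_feats) if j not in selected]
        score, j = max(cand)
        if score <= best_score:
            break
        selected.append(j)
        best_score = score
    return selected, best_score
```

On synthetic data whose target depends on only two of several candidate covariables, the loop recovers exactly those two and then stops, mirroring FFS's tendency to select fewer covariables.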
The similarity among samples and the discrepancy among clusters are two crucial aspects of image clustering. However, current deep clustering methods suffer from inaccurate estimation of either feature similarity or semantic discrepancy. In this paper, we present a Semantic Pseudo-labeling-based Image ClustEring (SPICE) framework, which divides the clustering network into a feature model for measuring instance-level similarity and a clustering head for identifying cluster-level discrepancy. We design two semantics-aware pseudo-labeling algorithms, prototype pseudo-labeling and reliable pseudo-labeling, which enable accurate and reliable self-supervision over clustering. Without using any ground-truth label, we optimize the clustering network in three stages: 1) train the feature model through contrastive learning to measure instance similarity; 2) train the clustering head with the prototype pseudo-labeling algorithm to identify cluster semantics; and 3) jointly train the feature model and clustering head with the reliable pseudo-labeling algorithm to improve clustering performance. Extensive experimental results demonstrate that SPICE achieves significant improvements (~10%) over existing methods and establishes new state-of-the-art clustering results on six balanced benchmark datasets in terms of three popular metrics. Importantly, SPICE significantly reduces the gap between unsupervised and fully supervised classification; e.g., there is only a 2% (91.8% vs. 93.8%) accuracy difference on CIFAR-10. Our code is publicly available at https://github.com/niuchuangnn/SPICE.
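The prototype pseudo-labeling idea above can be sketched as follows: each sample takes the label of its nearest cluster prototype under cosine similarity, and only the most confident fraction per cluster is kept for self-supervision. This is a simplified illustration, not SPICE's exact algorithm, and the fraction kept is an assumed parameter.

```python
import numpy as np

def prototype_pseudo_labels(feats, prototypes, top_frac=0.5):
    """Assign pseudo-labels by nearest prototype; keep only the most
    confident fraction of samples per cluster (mask == True)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = f @ p.T                       # (n_samples, n_clusters) cosines
    labels = sim.argmax(axis=1)
    conf = sim.max(axis=1)
    mask = np.zeros(len(feats), dtype=bool)
    for k in range(len(prototypes)):
        idx = np.where(labels == k)[0]
        if len(idx):
            n_keep = max(1, int(top_frac * len(idx)))
            mask[idx[np.argsort(conf[idx])[::-1][:n_keep]]] = True
    return labels, mask
```

Restricting supervision to the reliable subset is what keeps the pseudo-labels from amplifying early clustering mistakes.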
The abundant spatial and contextual information provided by advanced remote sensing technology has facilitated automatic interpretation of optical remote sensing images (RSIs). In this paper, a novel and effective geospatial object detection framework is proposed by combining weakly supervised learning (WSL) and high-level feature learning. First, a deep Boltzmann machine is adopted to infer the spatial and structural information encoded in the low-level and mid-level features to effectively describe objects in optical RSIs. Then, a novel WSL approach to object detection is presented, in which the training sets require only binary labels indicating whether an image contains the target object or not. Based on the learned high-level features, it jointly integrates saliency, intraclass compactness, and interclass separability in a Bayesian framework to initialize a set of training examples from weakly labeled images and start iterative learning of the object detector. A novel evaluation criterion is also developed to detect model drift and terminate the iterative learning. Comprehensive experiments on three optical RSI datasets demonstrate the efficacy of the proposed approach in comparison with several state-of-the-art supervised-learning-based object detection approaches.
Semi-supervised learning (SSL) has tremendous practical value because it exploits both labeled and unlabeled data. An essential class of SSL methods, referred to as graph-based semi-supervised learning (GSSL) in the literature, first represents each sample as a node in an affinity graph and then infers the labels of unlabeled samples from the structure of the constructed graph. GSSL methods have demonstrated their advantages in various domains owing to their unique structure, universal applicability, and scalability to large-scale data. Focusing on GSSL methods only, this work aims to provide both researchers and practitioners with a solid and systematic understanding of relevant advances as well as the underlying connections among them. This concentration on one class of SSL distinguishes the article from recent surveys that cover a broader picture of SSL methods yet often neglect the fundamentals of GSSL. In particular, a significant contribution of this article lies in a newly generalized taxonomy for GSSL under a unified framework, with the most up-to-date references and valuable resources such as code, datasets, and applications. Furthermore, we present several potential research directions as future work, together with our insights into this rapidly growing field.
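A canonical GSSL instance of the scheme described above is closed-form label propagation with symmetric normalization (in the style of Zhou et al.): labels diffuse from labeled nodes over the affinity graph. The sketch below assumes a symmetric affinity matrix with strictly positive node degrees.

```python
import numpy as np

def label_propagation(W, y, alpha=0.9):
    """Closed-form label propagation on an affinity graph.

    W: (n, n) symmetric affinity matrix with positive degrees;
    y: length-n integer labels, -1 for unlabeled nodes.
    Solves F = (I - alpha * S)^{-1} Y with S = D^{-1/2} W D^{-1/2},
    then predicts the argmax class for every node.
    """
    n = len(y)
    classes = sorted({int(c) for c in y if c >= 0})
    Y = np.zeros((n, len(classes)))
    for i, c in enumerate(y):
        if c >= 0:
            Y[i, classes.index(int(c))] = 1.0
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))      # symmetric normalization
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return np.array(classes)[F.argmax(axis=1)]
```

On a graph of two dense communities joined by a weak bridge, a single labeled node per community suffices to label every node correctly, which is the GSSL premise in miniature.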
CCGL: Contrastive Cascade Graph Learning
Xu, Xovee; Zhou, Fan; Zhang, Kunpeng
IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 5, May 2023.
Journal article; peer-reviewed; open access.
Supervised learning, while prevalent for information cascade modeling, often requires abundant labeled data for training, and the trained model does not generalize easily across tasks and datasets. It often learns task-specific representations, which can easily result in overfitting on downstream tasks. Recently, self-supervised learning has been designed to alleviate these two fundamental issues in linguistic and visual tasks. However, its direct applicability to information cascade modeling, especially graph-cascade-related tasks, remains underexplored. In this work, we present Contrastive Cascade Graph Learning (CCGL), a novel framework for information cascade graph learning in a contrastive, self-supervised, and task-agnostic way. In particular, CCGL first designs an effective data augmentation strategy to capture variation and uncertainty by simulating information diffusion in graphs. Second, it learns a generic model for graph cascade tasks via self-supervised contrastive pre-training using both unlabeled and labeled data. Third, CCGL learns a task-specific cascade model via fine-tuning on labeled data. Finally, to make the model transferable across datasets and cascade applications, CCGL further enhances the model via distillation using a teacher-student architecture. We demonstrate that CCGL significantly outperforms its supervised and semi-supervised counterparts on several downstream tasks.
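The contrastive pre-training step above pulls together embeddings of the same cascade graph under two augmentations. A generic NT-Xent loss of the kind commonly used for such pre-training (not necessarily CCGL's exact objective) can be sketched as:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """Generic NT-Xent contrastive loss over two augmented views.

    z1, z2: (n, d) embeddings of the same n graphs under two augmentations;
    matching rows are positives, every other row in the batch is a negative.
    """
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                 # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

The loss is minimized when each embedding is closest to its own augmented counterpart; shuffling the pairing raises it, which is a quick correctness check.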
Learning effective visual representations without human supervision is a critical problem for semantic segmentation of remote sensing images (RSIs), where pixel-level annotations are difficult to obtain. Self-supervised learning (SSL), which learns useful representations by creating artificial supervised learning problems, has recently emerged as an effective way to learn from unlabeled data. Current SSL methods are generally trained on ImageNet through image-level prediction tasks. We argue that this is suboptimal for semantic segmentation of RSIs, since it ignores the spatial position information between objects, which is critical for segmenting RSIs characterized by multiple objects. In this study, we propose a novel self-supervised dense representation learning method, IndexNet, for the semantic segmentation of RSIs. On the one hand, considering the multi-object characteristics of RSIs, IndexNet learns pixel-level representations by tracking object positions while remaining sensitive to object position changes, ensuring that no mismatches are introduced. On the other hand, by combining image-level contrast and pixel-level contrast, IndexNet can learn spatiotemporally invariant features. Experimental results show that our method works better than ImageNet pretraining and outperforms state-of-the-art (SOTA) SSL methods. Code and pretrained models will be available at https://github.com/pUmpKin-Co/offical-IndexNet.