In remote sensing images (RSIs), accurate semantic segmentation is especially challenging because of small targets, unbalanced categories, and complex scenes. Restricted by the local receptive field of convolution layers, traditional semantic segmentation models cannot use the global information of RSIs. Based on the characteristics of RSIs, we propose RSANet, a network built on a regional self-attention mechanism. Our model is no longer limited by the locality of convolution; it propagates information across the whole image. It can mine the relationships between pixels in surrounding areas, which is a more logical basis for understanding image content. Moreover, compared with the traditional self-attention mechanism, RSANet effectively reduces the noise in feature maps and the interference of redundant features. Our model obtains better semantic segmentation results than other current models on the DroneDeploy data set and the Chreos semantic segmentation data set. The experiments show that RSANet achieves a 2% higher mean intersection over union (mIoU) than the baseline model, with improvements especially in fineness, edge integrity, and classification accuracy.
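The mIoU metric quoted above can be illustrated with a minimal sketch. It assumes per-class IoU is computed from a pixel-level confusion matrix and then averaged over the classes that occur; the function name `mean_iou` and the example counts are hypothetical and are not taken from the paper.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and predicted class is j.
    Classes that appear in neither the ground truth nor the prediction are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true class differs
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Made-up 3-class confusion matrix, for illustration only.
example = [
    [50, 5, 5],
    [10, 30, 0],
    [0, 0, 20],
]
score = mean_iou(example)
```

Averaging per-class IoU rather than pooling all pixels keeps rare classes from being swamped by frequent ones, which matters for the unbalanced categories mentioned in the abstract.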
A local image descriptor robust to the common photometric transformations (blur, illumination, noise, and JPEG compression) and geometric transformations (rotation, scaling, translation, and viewpoint) is crucial to many image understanding and computer vision applications. In this paper, the representation and matching power of region descriptors are evaluated. A common set of elliptical interest regions is used to evaluate the performance. The elliptical regions are further normalized to be circular with a fixed size. The normalized circular regions are then affine invariant up to a rotational ambiguity. Here, a new distinctive image descriptor to represent the normalized region is proposed, which primarily comprises the Zernike moment (ZM) phase information. An accurate and robust estimation of the rotation angle between a pair of normalized regions is then described and used to measure the similarity between two matching regions. The discriminative power of the new ZM phase descriptor is compared with five major existing region descriptors (SIFT, GLOH, PCA-SIFT, complex moments, and steerable filters) based on the precision-recall criterion. The experimental results, involving more than 15 million region pairs, indicate that the proposed ZM phase descriptor generally has the best performance under the common photometric and geometric transformations. Both quantitative and qualitative analyses of the descriptor performances are given to account for the performance discrepancy. First, the key factor in its striking performance is that the ZM phase yields an accurate estimate of the rotation angle between two matching regions. Second, the feature dimensionality and feature orthogonality also affect the descriptor performance. Third, the ZM phase is more robust under nonuniform image intensity fluctuation. Finally, a time complexity analysis is provided.
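The role of the ZM phase in rotation estimation can be sketched as follows. This relies on the standard property that rotating a region by an angle alpha multiplies the Zernike moment of order n and repetition m by exp(-j*m*alpha); the sign depends on the rotation convention. The functions `estimate_rotation` and `rotate_moments` and the moment values are hypothetical illustrations, not the estimator described in the paper.

```python
import cmath


def rotate_moments(moments, alpha):
    """Simulate the effect of rotating a region by alpha on its Zernike moments.

    moments maps (n, m) to a complex moment; each is multiplied by exp(-1j * m * alpha).
    """
    return {(n, m): z * cmath.exp(-1j * m * alpha) for (n, m), z in moments.items()}


def estimate_rotation(moments_a, moments_b):
    """Estimate alpha such that moments_b is approximately rotate_moments(moments_a, alpha).

    Only repetition m == 1 is used, so the phase difference equals alpha directly
    and there is no m-fold ambiguity. The result lies in (-pi, pi].
    """
    acc = 0j
    for (n, m), za in moments_a.items():
        if m == 1 and (n, m) in moments_b:
            # za * conj(zb) has phase alpha; its magnitude down-weights weak moments.
            acc += za * moments_b[(n, m)].conjugate()
    return cmath.phase(acc)


# Made-up moments for illustration only.
region_a = {(1, 1): 0.3 + 0.4j, (3, 1): -0.2 + 0.1j, (2, 2): 0.5 - 0.1j}
region_b = rotate_moments(region_a, 0.7)
alpha_hat = estimate_rotation(region_a, region_b)
```

Once an angle estimate is available, the moments of one region can be phase-compensated before comparison, which is one way a rotational ambiguity between normalized regions could be resolved.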
We propose a robust object tracking algorithm based on a local region sparse appearance model. In this algorithm, the object is divided into several sub-regions, and the sparse dictionaries are obtained by clustering in each sub-region. The spatial structure information of the object is therefore captured well, and changes in object appearance are handled effectively. First, the object is divided into many small patches. Then the object is divided again into several sub-regions according to the patch distribution. The object dictionary base is built by combining the dictionaries from all the sub-regions, which achieves spatial alignment between different parts of the object. Meanwhile, noise removal and other operations are performed on the existing sparse reconstruction error maps to retain valuable information. In the updating framework, a novel flexible template set update mechanism is introduced. In this update mechanism, valuable object samples are put into the template set. Samples that are not valuable are not put into the template set, even when the template set is not full. We then use the patch sparse coefficient histograms of the updated templates to extract time-domain information about the object in the form of a weighted sum. This provides a reliable template basis for obtaining good candidate objects. In addition, when the tracking result deviates from the actual position of the object, we use a dynamic sub-region resampling method based on the cosine angle to correct the position deviation in a timely manner. This method can therefore effectively prevent the object from being completely lost. Both qualitative and quantitative evaluations on challenging video sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.
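The gating behaviour of the template update described above can be sketched in a few lines. The abstract does not define how a sample's value is measured or what happens when the set is full, so the `value` score, the `min_value` threshold, and the replace-the-weakest policy below are all assumptions made for illustration.

```python
class TemplateSet:
    """Template set that only accepts samples judged valuable (hypothetical sketch)."""

    def __init__(self, capacity, min_value):
        self.capacity = capacity      # maximum number of templates kept
        self.min_value = min_value    # assumed threshold below which a sample is not valuable
        self.templates = []           # list of (value, sample) pairs

    def update(self, sample, value):
        """Try to add a sample; return True if the template set changed."""
        if value < self.min_value:
            # Not valuable: rejected even if the set still has free slots.
            return False
        if len(self.templates) < self.capacity:
            self.templates.append((value, sample))
            return True
        # Assumption: when full, replace the least valuable template if the new one is better.
        weakest = min(range(len(self.templates)), key=lambda i: self.templates[i][0])
        if value > self.templates[weakest][0]:
            self.templates[weakest] = (value, sample)
            return True
        return False


templates = TemplateSet(capacity=2, min_value=0.5)
added_low = templates.update("frame_1", 0.2)    # rejected although the set is empty
added_ok = templates.update("frame_2", 0.8)     # accepted
```

The point of the gate is that a free slot alone is never a reason to admit a sample, so a poor tracking result early in the sequence does not contaminate the template basis.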
This paper presents an interactive technique for remote sensing image classification. In our proposal, users are able to interact with the classification system, indicating regions of interest (and those which are not). This feedback information is employed by a genetic programming approach to learn user preferences and combine image region descriptors that encode spectral and texture properties. Experiments demonstrate that the proposed method is effective for image classification tasks and outperforms the traditional MaxVer method.
Most multi-camera systems assume a well-structured environment to detect and track objects across cameras. Cameras need to be fixed and calibrated, or only objects represented in the training data can be detected (e.g., pedestrians only). In this work, a master–slave system is presented to detect and track any object in a network of uncalibrated fixed and mobile cameras. Cameras can have non-overlapping fields of view. Objects are detected with the mobile cameras (the slaves) given only observations from the fixed cameras (the masters). No training stage or training data is used. Detected objects are correctly tracked across cameras, leading to a better understanding of the scene.
A cascade of grids of region descriptors is proposed to describe any object of interest. To lend insight on the addressed problem, most state-of-the-art region descriptors are evaluated given various schemes. The covariance matrix of various features, the histogram of colors, the histogram of oriented gradients, the scale invariant feature transform (SIFT), the speeded-up robust features (SURF) descriptors, and the color interest points are evaluated. A sparse scan of the cameras' image plane is also presented to reduce the search space of the localization process, approaching nearly real-time performance. The proposed approach outperforms existing works such as scale invariant feature transform (SIFT), or the speeded-up robust features (SURF). The approach is robust to some changes in illumination, viewpoint, color distribution, image quality, and object deformation. Objects with partial occlusion are also detected and tracked.
•A new version of Local Binary Patterns (LBP) and its extension to color.
•An embedding of LBP in covariance-based region descriptors.
•Application to texture analysis, object retrieval and person re-identification.
•Comparison with 12 other covariance descriptors.
This paper proposes a new version of LBP and its inclusion into covariance region descriptors for image matching and recognition. Starting from the non-rotation invariant uniform LBP (called nriLBP), the pattern is described by the cosine and sine values of the angular portion defined by the ‘1’s. The use of this four-value vector makes the feature more resilient to noise and small neighborhood rotations. Several color versions of this feature are proposed. For region description, these local features are included in covariance matrices, denoted ELBCM for Enhanced-LBP Covariance Matrix. Experimental evaluations confirm the relevance of the proposed models on three databases designed for texture analysis, object retrieval and person re-identification. A study is also made of the impact of the colorspace included in the covariance descriptor and used for the LBP definition. The experiments show that ELBCM has better recognition performance than the 12 other descriptors tested.
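A covariance region descriptor in its generic form can be sketched as the sample covariance of per-pixel feature vectors collected over a region. The sketch below shows only that generic construction; the function name `region_covariance` and the toy feature values are hypothetical, and the specific LBP-derived features that make up ELBCM are not reproduced here.

```python
def region_covariance(features):
    """Sample covariance matrix of the per-pixel feature vectors of one region.

    features is a list of equal-length lists, one feature vector per pixel.
    Returns a d x d matrix (list of lists), where d is the feature dimension.
    """
    n = len(features)
    if n < 2:
        raise ValueError("at least two feature vectors are needed")
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        diff = [f[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    # Unbiased estimate: divide by n - 1.
    return [[c / (n - 1) for c in row] for row in cov]


# Toy region with three pixels and two features per pixel (made-up values).
toy_features = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
toy_cov = region_covariance(toy_features)
```

The descriptor size depends only on the number of features, not on the region size, which is why adding local features such as LBP-derived values to the per-pixel vector enlarges the matrix without changing how regions of different sizes are compared.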
A Multistage Approach for Image Registration
Bowen, Francis; Hu, Jianghai; Du, Eliza Yingzi
IEEE Transactions on Cybernetics, September 2016, Volume 46, Issue 9
Journal Article, Peer reviewed
Successful image registration is an important step for object recognition, target detection, remote sensing, multimodal content fusion, scene blending, and disaster assessment and management. The geometric and photometric variations between images adversely affect the ability of an algorithm to estimate the transformation parameters that relate the two images. Local deformations, lighting conditions, object obstructions, and perspective differences all contribute to the challenges faced by traditional registration techniques. In this paper, a novel multistage registration approach is proposed that is resilient to viewpoint differences, image content variations, and lighting conditions. Robust registration is realized through the utilization of a novel region descriptor which couples with the spatial and texture characteristics of invariant feature points. The proposed region descriptor is exploited in a multistage approach. A multistage process allows the utilization of the graph-based descriptor in many scenarios, thus allowing the algorithm to be applied to a broader set of images. Each successive stage of the registration technique is evaluated through an effective similarity metric which determines subsequent action. The registration of pre- and post-disaster aerial and street-view images provides strong evidence that the proposed method estimates more accurate global transformation parameters than traditional feature-based methods. Experimental results show the robustness and accuracy of the proposed multistage image registration methodology.
We validate the usage of augmented 2D shape-size pattern spectra, calculated on arbitrary connected regions. The evaluation is performed on MSER regions, and competitive performance with SIFT descriptors is achieved in a simple retrieval system by combining the local pattern spectra with normalized central moments. An additional advantage of the proposed descriptors is their size: being half the size of SIFT, they can handle larger databases in a time-efficient manner. We focus in this paper on presenting the challenges faced when transitioning from global pattern spectra to local ones. An exhaustive study of the parameters and the properties of the newly constructed descriptor is offered, as well as performance results from preliminary experiments, validating the usage of the descriptor. We also consider possible improvements to the quality and computational efficiency of the proposed local descriptors.
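The normalized central moments mentioned above follow the standard definition eta_pq = mu_pq / mu_00^(1 + (p + q) / 2), where mu_pq is the central moment of the region. A minimal sketch for a binary region given as a list of pixel coordinates is shown below; the function name and the toy region are hypothetical, and how the paper combines these moments with the pattern spectra is not shown.

```python
def normalized_central_moment(pixels, p, q):
    """Normalized central moment eta_pq of a binary region.

    pixels is a list of (x, y) coordinates, each pixel carrying unit mass.
    eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2)
    """
    m00 = len(pixels)
    if m00 == 0:
        raise ValueError("region must contain at least one pixel")
    cx = sum(x for x, _ in pixels) / m00   # centroid x
    cy = sum(y for _, y in pixels) / m00   # centroid y
    mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
    return mu / m00 ** (1 + (p + q) / 2)


# Toy 2 x 2 square region and a translated copy (made-up coordinates).
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
shifted = [(x + 10, y - 3) for x, y in square]
eta20 = normalized_central_moment(square, 2, 0)
eta20_shifted = normalized_central_moment(shifted, 2, 0)
```

Subtracting the centroid makes the moments translation invariant, and the division by a power of mu_00 compensates for region scale, which is what makes them usable on regions of varying size such as MSER detections.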
Image registration is an ill-posed problem, but because it finds applications in many fields, it remains an open research area. There are many ways to register images, of which interpolation-based image registration is widely used. The accuracy of interpolation-based image registration depends largely on two factors: (1) accurate localization of interest points (IPs) with respect to various distortions and (2) the robustness of region descriptors to geometry and illumination changes. In this context, we make the following two contributions: (1) we show that region-based IPs outperform corner-based IPs when global distortions are dominant, and (2) we propose new efficient region descriptors for the case when the images to be registered differ in illumination. The proposed descriptors are computationally fast and hence enable fast matching.
An image hash should be (1) robust to allowable operations and (2) sensitive to illegal manipulations and distinct queries. Some applications also require the hash to be able to localize image tampering. This requires the hash to contain both robust content and alignment information to meet the above criteria. Fulfilling this is difficult because of two contradictory requirements. First, the hash should be small; second, to verify authenticity and then localize tampering, the amount of information about the original image required in the hash would be large. Hence a tradeoff between these requirements needs to be found. This paper presents an image hashing method that addresses this concern, to not only detect but also localize tampering using a small signature (< 1 kB). Illustrative experiments bring out the efficacy of the proposed method compared to existing methods.