In this paper, we prove the existence of nontrivial solutions and ground state solutions for the following planar Schrödinger–Poisson system with zero mass: $-\Delta u + \phi u = (I_\alpha * F(u))\, f(u)$, $x \in \mathbb{R}^2$; $\Delta \phi = u^2$, $x \in \mathbb{R}^2$, where $\alpha \in (0,2)$, $I_\alpha : \mathbb{R}^2 \to \mathbb{R}$ is the Riesz potential, and $f \in C(\mathbb{R}, \mathbb{R})$ is of subcritical exponential growth in the sense of Trudinger–Moser. In particular, some new ideas and analytic techniques are used to overcome the twin difficulties caused by the zero-mass case and the logarithmic convolution potential.
Generating non-existing frames from a consecutive video sequence has been an interesting and challenging problem in the video processing field. Typical kernel-based interpolation methods predict pixels with a single convolution process that convolves the source frames with spatially adaptive local kernels, which circumvents the time-consuming, explicit motion estimation in the form of optical flow. However, when scene motion is larger than the pre-defined kernel size, these methods are prone to yield less plausible results. In addition, they cannot directly generate a frame at an arbitrary temporal position because the learned kernels are tied to the midpoint in time between the input frames. In this paper, we address these problems and propose a novel non-flow kernel-based approach, which we refer to as enhanced deformable separable convolution (EDSC), that estimates not only adaptive kernels but also offsets, masks, and biases, allowing the network to gather information from a non-local neighborhood. During the learning process, a different intermediate time step can be involved as a control variable by means of an extension of the coord-conv trick, allowing the estimated components to vary with the input temporal information. This makes our method capable of producing multiple in-between frames. Furthermore, we investigate the relationships between our method and other typical kernel- and flow-based methods. Experimental results show that our method performs favorably against state-of-the-art methods across a broad range of datasets. Code is publicly available at https://github.com/Xianhang/EDSC-pytorch .
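The coord-conv-style time conditioning described above can be sketched as follows; the tensor layout and function name are illustrative assumptions, not EDSC's actual implementation:

```python
import numpy as np

def add_time_channel(feat, t):
    """Append a constant channel holding the normalized time step t,
    a coord-conv-style trick that lets the downstream kernel/offset
    estimators vary with the requested temporal position.
    (Illustrative sketch; shapes and names are assumptions.)"""
    n, c, h, w = feat.shape
    t_chan = np.full((n, 1, h, w), t, dtype=feat.dtype)
    return np.concatenate([feat, t_chan], axis=1)

feat = np.random.rand(1, 8, 16, 16).astype(np.float32)
out = add_time_channel(feat, t=0.25)   # request the in-between frame at t = 0.25
print(out.shape)  # (1, 9, 16, 16)
```

Because the appended channel is part of the input, the same trained network can be queried at any `t` in (0, 1), which is what makes multi-frame generation possible without retraining.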
As an active microwave imaging sensor for high-resolution earth observation, synthetic aperture radar (SAR) has been extensively applied in military, agriculture, geology, ecology, oceanography, and other fields, owing to its prominent advantages of all-weather, all-time operation. In the marine field in particular, SAR provides numerous high-quality services for fishery management, traffic control, sea-ice monitoring, marine environmental protection, and so on. Among these, ship detection in SAR images has attracted increasing attention on account of the urgent requirements of maritime rescue and military strategy formulation. Nowadays, most research focuses on improving ship detection accuracy, while detection speed is frequently neglected, whether for traditional feature-extraction methods or modern deep learning (DL) methods. However, high-speed SAR ship detection is of great practical value, because it enables real-time maritime disaster rescue and emergency military planning. Therefore, to address this problem, we propose a novel high-speed SAR ship detection approach based mainly on a depthwise separable convolution neural network (DS-CNN). In this approach, we integrate a multi-scale detection mechanism, a concatenation mechanism, and an anchor-box mechanism to establish a brand-new lightweight network architecture for high-speed SAR ship detection. We use DS-CNN, which consists of a depthwise convolution (D-Conv2D) and a pointwise convolution (P-Conv2D), as a substitute for the conventional convolution neural network (C-CNN). In this way, the number of network parameters is markedly decreased, and the ship detection speed is dramatically improved. We experiment on an open SAR ship detection dataset (SSDD) to validate the correctness and feasibility of the proposed method. To verify the strong migration capacity of our method, we also carry out ship detection on a wide-region, large-size Sentinel-1 SAR image.
Finally, on the same hardware platform with an NVIDIA RTX 2080Ti GPU, the experimental results indicate that the ship detection speed of our proposed method is faster than that of other methods, while the detection accuracy is only slightly sacrificed compared with state-of-the-art object detectors. Our method has great application value in real-time maritime disaster rescue and emergency military planning.
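The parameter savings behind the D-Conv2D + P-Conv2D decomposition can be illustrated with a quick count; the layer sizes below are arbitrary examples, not SSDD-specific settings:

```python
# Parameter counts for a standard conv layer vs. its depthwise separable
# replacement (depthwise k x k per channel, then 1 x 1 pointwise), biases ignored.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in          # one k x k filter per input channel
    pointwise = c_in * c_out          # 1 x 1 conv mixing the channels
    return depthwise + pointwise

k, c_in, c_out = 3, 128, 128          # hypothetical layer shape
std = conv_params(k, c_in, c_out)     # 147456
ds = ds_conv_params(k, c_in, c_out)   # 17536
print(std, ds, round(std / ds, 1))    # roughly an 8.4x reduction
```

The reduction factor grows with the channel count, which is why swapping C-CNN layers for DS-CNN layers shrinks the model and speeds up inference.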
Segmentation of roads in remote sensing images is a challenging task due to the inhomogeneous intensity, inconsistent contrast, and very cluttered backgrounds in such images. Recent approaches, mostly relying on convolutions or self-attention, find it difficult to extract weak and continuous road objects. Fourier neural operators provide another novel mechanism, beyond self-attention, for capturing long-range and fine-grained features. Building on this, we propose an adaptive Fourier convolution network (AFCNet) on the spatial-spectral domain for road segmentation. AFCNet is built on the pipeline of the classical U-Net model, and its core is the proposed Fourier neural encoder (FNE), which consists of a feed-forward layer and a flexible Fourier convolutional structure composed of Fourier-domain pooling layers, asymmetric convolutions, squeeze-excitation-inspired self-attention, and adaptive multiscale fusion layers. Furthermore, we combine the FNE with the ResNet bottleneck to form a hybrid global-local feature representation scheme that captures the long and weak road objects in remote sensing images. Experiments on two public datasets, the Massachusetts Roads and DeepGlobe Road datasets, show that AFCNet uses fewer parameters and outperforms most previous methods in terms of accuracy, precision, recall, and mean intersection over union (mIoU).
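The global receptive field that Fourier-domain operators provide rests on the convolution theorem: a pointwise product in the frequency domain equals a circular convolution in which every output position sees every input position. A minimal 1-D NumPy illustration (not AFCNet's actual encoder):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # a 1-D signal (e.g. one feature row)
w = rng.standard_normal(64)          # a filter spanning the whole domain

# Direct circular convolution: every output sees every input -> global context.
direct = np.array([sum(x[(n - m) % 64] * w[m] for m in range(64))
                   for n in range(64)])

# Same result via a pointwise product in the Fourier domain, in O(N log N).
spectral = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(w), n=64)

print(np.allclose(direct, spectral))  # True
```

This is why a Fourier-domain layer can capture long, thin structures such as roads without stacking many local convolutions or paying the quadratic cost of self-attention.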
Hyperspectral images are composed of a variety of textures across the different bands, which increases spectral similarity and makes it difficult to predict pixel-wise labels without inducing additional complexity at the feature level. To extract robust and discriminative features from the different regions of land cover, the hyperspectral research community is still seeking convolutions that can efficiently handle fine-grained texture information during the feature extraction phase, an aspect often overlooked by vanilla convolution. To overcome this shortcoming, this article proposes a generalized gradient-centralized 3D convolution (G2C-Conv3D) operation, a weighted combination of the vanilla and gradient-centralized 3D convolutions (GC-Conv3D) that extracts both intensity-level semantic information and gradient-level information. It can easily be plugged into existing HSI feature extraction networks to boost the accuracy of land-cover type prediction. To validate the feasibility of the proposed G2C-Conv3D, we consider the existing CNN3D, MS3DNet, ContextNet, and SSRN feature extraction models, as well as the CAE3D, VAE3D, and SAE3D autoencoder (AE) networks. All these networks are embedded with the G2C-Conv3D convolution to implement both generalized gradient-centralized feature extraction networks (G2C-FE) and generalized gradient-centralized AE networks (G2C-AE) for fine-grained spectral-spatial feature learning. In addition, G2C-Conv2D is also considered with a few networks. Extensive experiments are conducted on four of the most widely used hyperspectral datasets, i.e., IP, KSC, UH, and UP, and compared with nine methods.
The results demonstrate that the proposed G2C-Conv3D can effectively enhance the feature learning ability of existing networks, and both the qualitative and quantitative results show its superiority and effectiveness. The source code will be publicly available at https://github.com/danfenghong/G2C-Conv3D-HSI .
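One plausible reading of the weighted combination behind G2C-Conv3D is a blend of vanilla and mean-centralized kernel weights; the mixing weight `theta` and the kernel shapes below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def g2c_weights(w, theta):
    """Blend vanilla and gradient-centralized kernels: w' = w - theta * mean(w),
    with the mean taken per output filter. theta = 0 recovers the vanilla
    kernel, theta = 1 the fully centralized one.
    (Illustrative sketch of the weighted combination; theta is a hyperparameter.)"""
    mean = w.mean(axis=(1, 2, 3, 4), keepdims=True)
    return w - theta * mean

rng = np.random.default_rng(0)
w = rng.random((8, 4, 3, 3, 3))         # (out, in, d, h, w) 3-D conv kernels
w_g2c = g2c_weights(w, theta=0.5)
# Each filter's mean is scaled by (1 - theta):
print(np.allclose(w_g2c.mean(axis=(1, 2, 3, 4)),
                  0.5 * w.mean(axis=(1, 2, 3, 4))))  # True
```

Because only the kernel weights are transformed, such an operation is a drop-in replacement for an ordinary 3-D convolution layer, which matches the "easily plugged into existing networks" claim.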
Convolution neural networks (CNNs) are one of the workhorses of artificial intelligence applications. As a typical such application, high-resolution range profile (HRRP) target recognition based on CNNs has attracted considerable research interest. Most CNNs use a relatively small, single-scale convolution kernel size to control the number of parameters and the computational complexity, but recent studies indicate that CNNs with a small kernel size cannot extract enough spatial information, which hurts recognition performance. To address this problem, this paper proposes a multi-scale group-fusion one-dimensional convolution neural network (MSGF-1D-CNN) for HRRP target recognition. MSGF-1D-CNN utilises multi-scale group one-dimensional convolution (MSG 1D-Conv) and point-wise convolution (PW-Conv) in place of standard convolution. MSG 1D-Conv significantly reduces complexity and captures target information within the HRRP at different levels of detail to enhance feature extraction, while PW-Conv fuses the multi-scale features to help boost recognition performance. Experiments on five mid-course ballistic targets in the HRRP dataset show that MSGF-1D-CNN has superior recognition performance, with more than 2.4 times fewer parameters than a standard 1D-CNN. Furthermore, MSGF-1D-CNN shows better fine-grained HRRP target recognition performance and anti-noise robustness in most cases.
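The MSG 1D-Conv plus PW-Conv pattern can be sketched in NumPy as follows; the group assignment, kernel sizes, and random weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def conv1d_same(x, k):
    """'Same'-padded 1-D convolution of a single channel (cross-correlation)."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(k)] @ k for i in range(len(x))])

def msg_conv(x, kernel_sizes, rng):
    """Multi-scale group conv sketch: each channel group gets its own kernel
    size, outputs are concatenated, then fused by a 1x1 (pointwise) matrix."""
    c, n = x.shape
    groups = np.split(x, len(kernel_sizes), axis=0)   # c must divide evenly
    feats = [np.stack([conv1d_same(ch, rng.standard_normal(ks))
                       for ch in grp])
             for grp, ks in zip(groups, kernel_sizes)]
    multi = np.concatenate(feats, axis=0)             # (c, n) multi-scale stack
    pw = rng.standard_normal((c, c))                  # pointwise fusion weights
    return pw @ multi

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 32))          # 6 channels, 32 range cells
y = msg_conv(x, kernel_sizes=[3, 5, 7], rng=rng)
print(y.shape)  # (6, 32)
```

Each group only convolves its own slice of channels, which is where the parameter reduction over a dense multi-scale convolution comes from; the pointwise stage then re-mixes information across the scales.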
We propose a new end-to-end neural acoustic model for automatic speech recognition. The model is composed of multiple blocks with residual connections between them. Each block consists of one or more modules with 1D time-channel separable convolutional layers, batch normalization, and ReLU layers. It is trained with CTC loss. The proposed network achieves near state-of-the-art accuracy on LibriSpeech and Wall Street Journal, while having fewer parameters than all competing models. We also demonstrate that this model can be effectively fine-tuned on new datasets.
Motor imagery (MI) electroencephalography (EEG) decoding plays an important role in brain-computer interfaces (BCIs), enabling motor-disabled patients to communicate with the outside world via external devices. Recent deep learning methods generally ignore the temporal or spectral dependencies in MI-EEG, failing to fully explore both the deep-temporal characterizations of the EEG itself and the multi-spectral information in different rhythms. Moreover, the lack of effective feature fusion can lead to redundant or irrelevant information, preventing the extraction of the most discriminative features and thus limiting MI-EEG decoding performance. To address these issues, this paper proposes an MI-EEG decoding framework that uses a novel temporal-spectral-based squeeze-and-excitation feature fusion network (TS-SEFFNet). First, the deep-temporal convolution block (DT-Conv block) implements convolutions in a cascade architecture, extracting high-dimensional temporal representations from raw EEG signals. Second, the multi-spectral convolution block (MS-Conv block) is conducted in parallel, using multi-level wavelet convolutions to capture discriminative spectral features from the corresponding clinical subbands. Finally, the proposed squeeze-and-excitation feature fusion block (SE-Feature-Fusion block) maps the deep-temporal and multi-spectral features into comprehensive fused feature maps, highlighting channel-wise feature responses by constructing interdependencies among the different domain features. Competitive experimental results on two public datasets demonstrate that our method achieves promising decoding performance compared with state-of-the-art methods.
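The squeeze-and-excitation mechanism underlying the SE-Feature-Fusion block can be sketched as follows; the shapes and reduction ratio are illustrative assumptions, not the TS-SEFFNet configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_reweight(feat, w1, w2):
    """Squeeze-and-excitation sketch: squeeze each channel to a scalar,
    excite through two small FC layers, and rescale the channels so that
    informative channels are emphasized.
    (Shapes here are illustrative, not the paper's exact configuration.)"""
    s = feat.mean(axis=(1, 2))            # squeeze: (C,) global average pool
    z = np.maximum(w1 @ s, 0.0)           # reduction FC + ReLU
    a = sigmoid(w2 @ z)                   # per-channel attention in (0, 1)
    return feat * a[:, None, None]        # reweight channel responses

rng = np.random.default_rng(2)
feat = rng.standard_normal((16, 8, 8))    # (channels, h, w) fused feature maps
w1 = rng.standard_normal((4, 16))         # reduction ratio 4 (assumed)
w2 = rng.standard_normal((16, 4))
out = se_reweight(feat, w1, w2)
print(out.shape)  # (16, 8, 8)
```

Applied to concatenated deep-temporal and multi-spectral channels, such a gate learns cross-domain interdependencies: channels carrying redundant information receive attention values near zero and are suppressed.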
To enable efficient deployment of convolutional neural networks (CNNs) on embedded platforms for different computer vision applications, several convolution variants have been introduced, such as depthwise convolution (DWCV), transposed convolution (TPCV), and dilated convolution (DLCV). To address the utilization degradation that occurs in a general convolution engine for these emerging operators, a highly flexible and reconfigurable hardware accelerator is proposed to efficiently support various CNN-based vision tasks. First, to avoid the workload imbalance of TPCV, a zero transfer and skipping (ZTS) method is proposed to reorganize the computation process, and to eliminate the redundant zero calculations of TPCV and DLCV, a sparsity-alike processing (SAP) method is proposed based on a weight-oriented dataflow. Second, the DWCV or pooling layers are configured to be executed directly after standard convolutions without external memory accesses. Furthermore, a programmable execution schedule is introduced for better flexibility. Finally, the proposed accelerator is evaluated on an Intel Arria 10 SoC FPGA. Experimental results show state-of-the-art performance on both large-scale and lightweight CNNs for image segmentation and classification. Specifically, the accelerator achieves a processing speed of up to 339.9 FPS and a computational efficiency of up to 0.58 GOPS/DSP, which is 3.3× better than the prior art evaluated on the same network.
Road information from high-resolution remote-sensing images is widely used in various fields, and deep-learning-based methods have shown high road-extraction performance. However, for the detection of roads sealed with tarmac or covered by trees in high-resolution remote-sensing images, several challenges still limit extraction accuracy: 1) large intraclass differences among roads and unclear interclass differences between urban objects, especially roads and buildings; 2) roads occluded by trees, shadows, and buildings are difficult to extract; and 3) a lack of high-precision remote-sensing road datasets. To increase the accuracy of road extraction from high-resolution remote-sensing images, we propose a split depthwise (DW) separable graph convolutional network (SGCN). First, we split the DW-separable convolution to obtain channel and spatial features, enhancing the expressive ability of road features. Thereafter, we present a graph convolutional network to capture global contextual road information in the channel and spatial features, using the Sobel gradient operator to construct the adjacency matrix of the feature graph. A total of 13 deep-learning networks were compared with our proposed SGCN on the Massachusetts roads dataset, and nine on our self-constructed mountain road dataset. Our model achieved a mean intersection over union (mIOU) of 81.65% and an F1-score of 78.99% on the Massachusetts roads dataset, and an mIOU of 62.45% with an F1-score of 45.06% on our proposed dataset. The visualization results show that the SGCN performs better in extracting covered and tiny roads and can effectively extract roads from high-resolution remote-sensing images.
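One plausible sketch of a Sobel-based adjacency feeding a graph-convolution step is given below; the similarity kernel, row normalization, and tiny grid are illustrative assumptions, not SGCN's exact construction:

```python
import numpy as np

def sobel_mag(img):
    """Gradient magnitude via 3x3 Sobel kernels (zero-padded, explicit loops
    for clarity)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    p = np.pad(img, 1)
    gx, gy = np.zeros_like(img), np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            win = p[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)

rng = np.random.default_rng(3)
img = rng.random((4, 4))                        # tiny feature map; pixels = nodes
g = sobel_mag(img).ravel()                      # one Sobel statistic per node
A = np.exp(-np.abs(g[:, None] - g[None, :]))    # gradient-similarity adjacency
A_hat = A / A.sum(axis=1, keepdims=True)        # row-normalized propagation
x = img.ravel()[:, None]                        # node features (16, 1)
w_gc = rng.standard_normal((1, 1))              # graph-conv weight
out = np.maximum(A_hat @ x @ w_gc, 0.0)         # one graph-conv step + ReLU
print(out.shape)  # (16, 1)
```

The point of the construction is that pixels with similar edge responses, such as those along one road, are strongly connected in the graph, so a single propagation step can share context along the entire road even when parts of it are occluded.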