Sketch-to-image synthesis aims to generate realistic images that match the input sketches or edge maps exactly. Most known sketch-to-image synthesis methods use various generative adversarial networks (GANs) trained on numerous pairs of sketches and real images. Because convolution is local, the low-level layers of the generators in these GANs lack global perception, so the feature maps derived from them easily overlook global cues. Since a global receptive field is crucial for capturing the non-local structures and features of sketches, the absence of global context degrades the quality of the generated images. Some recent models turn to self-attention to construct global dependencies. However, they are not viable for large feature maps because their computational complexity is quadratic in the size of the feature map. To address these problems, we propose Sketch2Photo, a new image synthesis approach that captures global contexts as well as local features to generate photo-realistic images from weak or partial sketches or edge maps. We employ fast Fourier convolution (FFC) residual blocks to create global receptive fields in the bottom layers of the network and incorporate Swin Transformer block (STB) units to obtain long-range global contexts for large feature maps efficiently. We also present an improved spatial attention pooling (ISAP) module to relax the strict alignment requirements between incomplete sketches and generated images. Quantitative and qualitative experiments on multiple public datasets demonstrate the superiority of the proposed approach over many other sketch-to-image synthesis methods. The project code is available at https://github.com/hengliusky/Skecth2Photo.
• We propose a novel generator for sketch-to-image synthesis, which can capture global context information in the early layers of the network and maintain the global dependencies during reconstruction.
• We present an improved spatial attention pooling (ISAP) module and integrate it into the proposed generator to construct a GAN-like framework, which is able to handle incomplete or missing sketches effectively.
• We provide a large number of comparative experiments and ablation studies on different benchmark datasets to demonstrate the potential and superiority of the proposed methods thoroughly.
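The abstract's point about quadratic self-attention cost can be made concrete with a little arithmetic. The sketch below counts the token pairs scored by full self-attention and by attention restricted to non-overlapping windows, the idea behind Swin Transformer blocks. The function names and the window size of 8 are illustrative assumptions, not values taken from the paper.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Token pairs scored by full self-attention over an H x W feature map."""
    tokens = height * width
    return tokens * tokens


def windowed_attention_pairs(height: int, width: int, window: int) -> int:
    """Token pairs scored when attention stays inside window x window tiles.

    Assumes the feature map tiles evenly into windows.
    """
    if height % window or width % window:
        raise ValueError("feature map must tile evenly into windows")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    # Window size 8 is a hypothetical choice for illustration only.
    for side in (32, 64, 128):
        full = global_attention_pairs(side, side)
        local = windowed_attention_pairs(side, side, window=8)
        print(f"{side}x{side}: global={full:,} windowed={local:,}")
```

Doubling the side length multiplies the global count by 16 but the windowed count by only 4, which is why windowed attention remains practical on large feature maps.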
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
4.
Fourier-Convolutional PaDiM for Anomaly Detection HAYASHI, Yoshikazu; AIZAWA, Hiroaki; NAKATSUKA, Shunsuke ...
Journal of the Japan Society for Precision Engineering, 2023/12/05, Volume 89, Issue 12
Journal Article
Open access
Anomaly detection aims to detect unusual patterns and samples in a training distribution. In this domain, many researchers have paid attention to anomaly detection models that use ImageNet-pretrained weights. Among them, PaDiM is a promising approach that detects anomalies based on the feature distribution. While such approaches have achieved significant results, they tend to overlook global information because of the texture bias of ImageNet-pretrained convolutional models. Therefore, in this paper, we propose incorporating Fast Fourier Convolution, which can extract global information in the frequency domain, into PaDiM. The proposed model is named Fourier-Convolutional PaDiM (FC-PaDiM). FC-PaDiM extracts global features from frequency space and local features from feature space for more accurate anomaly detection. In our experiments, we demonstrated that FC-PaDiM extracts both local and global features, in contrast to PaDiM. Moreover, our additional analysis revealed its robustness to perturbations in frequency bands on the MVTecAD dataset.
Burst denoising aims to generate a clean image from a sequence of noisy frames of the same scene captured in quick succession. However, relative motion inevitably occurs between frames because of scene or camera movement, which leads to blur and ghosting in the generated images. To address this issue, in this paper we propose a novel Efficient Burst Denoising Network (EBDNet) that integrates optical flow estimation with a kernel prediction network in an end-to-end manner. First, a lightweight Denoising Optical Flow Estimation (DOFE) module is presented for both burst feature and image alignment, which reduces the effect of noise on optical flow estimation. Building upon the aligned burst features and frames, a new fast Fourier convolution-enhanced kernel prediction module is introduced to merge the complementary information. It employs an encoder-decoder architecture with a well-designed feature enrichment block, which exploits the multi-level information from the encoder to boost the decoder features from both spatial and frequency domain views. Extensive experiments demonstrate that the proposed network achieves the best performance compared with state-of-the-art methods while maintaining reasonably low computational complexity.
Deep learning methods for fast MRI have shown promise in reconstructing high-quality images from undersampled multi-coil k-space data, leading to reduced scan duration. However, existing methods encounter challenges related to limited receptive fields in dual-domain (k-space and image domains) reconstruction networks, rigid data consistency operations, and suboptimal refinement structures, which collectively restrict overall reconstruction performance. This study introduces a comprehensive framework that addresses these challenges and enhances MR image reconstruction quality. First, we propose Faster Inverse Fourier Convolution (FasterIFC), a frequency domain convolutional operator that significantly expands the receptive field of k-space domain reconstruction networks. Expanding the information extraction range to the entire frequency spectrum, according to the spectral convolution theorem in Fourier theory, enables the network to easily utilize richer redundant long-range information from adjacent, symmetrical, and diagonal locations of multi-coil k-space data. Second, we introduce a novel softer Data Consistency (softerDC) layer, which achieves an enhanced balance between data consistency and smoothness. This layer facilitates the implementation of diverse data consistency strategies across distinct frequency positions, addressing the inflexibility observed in current methods. Finally, we present the Dual-Domain Faster Fourier Convolution Based Network (D2F2), which features a centrosymmetric dual-domain parallel structure based on FasterIFC. This architecture optimally leverages dual-domain data characteristics while substantially expanding the receptive field in both domains. Coupled with the softerDC layer, D2F2 demonstrates superior performance on the NYU fastMRI dataset at multiple acceleration factors, surpassing state-of-the-art methods in both quantitative and qualitative evaluations.
• A novel frequency domain convolutional operator with global receptive field.
• A softer data consistency layer is first proposed.
• A novel centrosymmetric dual-domain parallel structure is introduced.
• D2F2 outperforms current SOTA fast MRI methods.
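The convolution theorem that the abstract invokes can be checked numerically on a small 1-D signal: multiplying two spectra pointwise is equivalent to a circular convolution whose kernel spans the whole signal, which is where the global receptive field comes from. This is a minimal sketch of that general identity using a naive DFT; it is not the FasterIFC operator, and all function names are illustrative.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform, adequate for a small demo."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_conv_direct(signal, kernel):
    """Circular convolution computed in the signal domain."""
    n = len(signal)
    return [sum(signal[(t - s) % n] * kernel[s] for s in range(n)) for t in range(n)]


def circular_conv_spectral(signal, kernel):
    """Same result via pointwise multiplication of the two spectra."""
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [value.real for value in idft(product)]


if __name__ == "__main__":
    x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.0]
    h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
    print(circular_conv_direct(x, h))
    print(circular_conv_spectral(x, h))
```

Because the kernel has as many taps as the signal has samples, a single pointwise operation in the frequency domain lets every output position depend on every input position.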
With the recent improvement of deep learning (DL) techniques and computer hardware capabilities, neural networks are widely used to monitor massive sensor data and detect earthquakes in them. This makes the design of fast, accurate, and generalizable DL models for automatic seismic phase picking an active field of research. A seismic phase picking network called MFFnet is proposed to fuse power spectral density (PSD), expert knowledge, spectrograms, recurrence plots (RPs), and Gramian angular fields. The network uses fast Fourier convolution (FFC) on 2-D representations to extract more interpretable features. Considering the high proportion of noisy signals in field applications, MFFnet uses focal loss (FL) as the loss function to improve network accuracy. Experimental results show that MFFnet achieves precision, recall, and accuracy of 0.96, 0.98, and 0.98, respectively, in seismic phase detection tasks. Shapley values are used to evaluate the relationship between features and network predictions. Compared with other DL networks, the feature extraction approach used in this letter is more explainable and provides greater confidence in the results.
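For readers unfamiliar with focal loss, the sketch below shows the standard binary form, which scales cross-entropy by a factor that shrinks for well-classified samples. The values gamma = 2 and alpha = 0.25 are common illustrative defaults and are assumptions here; the abstract does not state the settings used in MFFnet.

```python
import math


def binary_cross_entropy(prob_positive, label):
    """Cross-entropy for one binary prediction (label is 0 or 1)."""
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    return -math.log(p_t)


def binary_focal_loss(prob_positive, label, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    prob_positive: predicted probability of the positive class, in (0, 1).
    gamma: focusing parameter; larger values down-weight easy samples more.
    alpha: weight for the positive class (1 - alpha for the negative class).
    Both defaults are illustrative assumptions, not values from the letter.
    """
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        ce = binary_cross_entropy(p, 1)
        fl = binary_focal_loss(p, 1)
        print(f"p={p}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, so gamma alone controls how strongly easy samples are suppressed.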
Medical image segmentation is crucial for accurately locating lesion regions and assisting doctors in diagnosis. However, most existing methods fail to effectively utilize both local details and global semantic information in medical image segmentation, and therefore cannot effectively capture fine-grained content such as small targets and irregular boundaries. To address this issue, we propose a novel Pyramid Fourier Deformable Network (PFD-Net) for medical image segmentation, which leverages the strengths of CNNs and Transformers. PFD-Net first uses a PVTv2-based Transformer as the primary encoder to capture global information and further enhances both local and global feature representations with the Fast Fourier Convolution Residual (FFCR) module. Moreover, PFD-Net introduces the Dilated Deformable Refinement (DDR) module to enhance the model’s capacity to comprehend the global semantic structures of shape-diverse targets and their irregular boundaries. Lastly, a Cross-Level Fusion Block with deformable convolution (CLFB) is proposed to combine the decoded feature maps from the final Residual Decoder Block (DDR) with local features from the CNN auxiliary encoder branch, improving the network’s ability to perceive targets that resemble the surrounding structures. Extensive experiments were conducted on nine public medical image datasets covering five types of segmentation tasks: polyp, abdominal, cardiac, gland cells, and nuclei. The qualitative and quantitative results demonstrate that PFD-Net outperforms existing state-of-the-art methods on various evaluation metrics, and achieves the highest mDice, 0.826, on the most challenging dataset (ETIS), which is a 1.8% improvement over the previous best-performing HSNet and a 3.6% improvement over the next-best PVT-CASCADE. Code is available at https://github.com/ChaorongYang/PFD-Net.
• PFD-Net with PVTv2-based primary encoder and CNN-based auxiliary encoder is proposed for medical image segmentation.
• FFCR module is proposed to enhance local and global features from the PVTv2 encoder by combining the spatial and frequency domains.
• DDR module is proposed to enrich objects’ semantic information in the feature maps from FFCR.
• CLFB module is constructed to refine targets’ boundaries.
• Achieves competitive performance on nine public datasets for five segmentation tasks.
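The mDice figure quoted in the abstract is an average of Dice coefficients. As a minimal sketch, assuming binary masks flattened to lists of 0 and 1 and a plain mean over image pairs, it can be computed as follows. The empty-mask convention and the function names are assumptions for illustration and may differ from the paper's evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks empty: treated as perfect agreement (one common convention).
        return 1.0
    return 2.0 * intersection / total


def mean_dice(mask_pairs):
    """Average Dice over an iterable of (prediction, ground truth) pairs."""
    scores = [dice_coefficient(pred, target) for pred, target in mask_pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pairs = [
        ([1, 1, 0, 0], [1, 0, 0, 0]),
        ([0, 1, 1, 0], [0, 1, 1, 0]),
    ]
    print(mean_dice(pairs))
```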
With the development of modern medical technology, medical image classification has come to play an important role in medical diagnosis and clinical practice. Medical image classification algorithms based on deep learning continue to emerge and have achieved impressive results. However, most of these methods ignore feature representations in the frequency domain and focus only on spatial features. To solve this problem, we propose a hybrid domain feature learning (HDFL) module based on a windowed fast Fourier convolution pyramid, which combines global features with wide receptive fields in the frequency domain and multi-scale local features in the spatial domain. To prevent frequency leakage, we construct a Windowed Fast Fourier Convolution (WFFC) structure based on Fast Fourier Convolution (FFC). To learn hybrid domain features, we combine ResNet, FPN, and an attention mechanism to construct a hybrid domain feature learning module. In addition, a hyperparameter optimization algorithm based on a genetic algorithm is constructed for our classification model, so as to automate hyperparameter optimization. We evaluated our method on the newly published medical image classification dataset MedMNIST, and the experimental results show that it can effectively learn hybrid domain feature information from the frequency and spatial domains.
• A hybrid domain feature learning method based on windowed FFT convolution is proposed, which combines the global features in the frequency domain with the multi-scale local features in the spatial domain and achieves good results.
• An interactive attention block (MIAM) is proposed to further focus on channel information, spatial information, and their interactive information. This method can improve the attention of the backbone network to the lesion area, thus improving the identification accuracy of the backbone network.
• A hyperparameter search method based on an improved genetic algorithm is proposed to automate hyperparameter optimization, which can greatly reduce the time cost of manual hyperparameter tuning.
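Frequency leakage, which the windowed construction is meant to prevent, can be illustrated on a 1-D tone that does not complete a whole number of cycles in the analysis frame. The sketch below compares the share of spectral energy that lands far from the peak with and without a Hann window. The abstract does not say which window WFFC uses, so the Hann window, the signal, and all function names here are assumptions for illustration.

```python
import cmath
import math


def tone(n, cycles):
    """Sine wave with `cycles` periods over n samples (cycles may be fractional)."""
    return [math.sin(2.0 * math.pi * cycles * t / n) for t in range(n)]


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n) for t in range(n)]


def dft_magnitudes(signal):
    """Magnitude spectrum from a naive O(n^2) DFT, adequate for a small demo."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def far_leakage_ratio(magnitudes, peak_bin, guard):
    """Share of energy more than `guard` bins from the peak (positive frequencies)."""
    half = magnitudes[: len(magnitudes) // 2]
    total = sum(m * m for m in half)
    far = sum(m * m for k, m in enumerate(half) if abs(k - peak_bin) > guard)
    return far / total


if __name__ == "__main__":
    n = 64
    x = tone(n, 10.5)  # 10.5 cycles: the frame edges do not line up
    windowed = [a * w for a, w in zip(x, hann_window(n))]
    print("no window:", far_leakage_ratio(dft_magnitudes(x), 10, 4))
    print("Hann window:", far_leakage_ratio(dft_magnitudes(windowed), 10, 4))
```

Tapering the frame edges trades a slightly wider main peak for much less energy smeared across distant frequency bins.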