Large-scale text-to-video diffusion models have shown outstanding capabilities. However, their direct application to video stylization is hindered by the limited availability of text-to-video datasets and computational resources. Moreover, meeting content preservation standards for style transfer tasks is challenging due to the stochastic and destructive nature of the noise addition process. This letter introduces a succinct video stylization approach, named Style-A-Video, which leverages a generative pre-trained transformer and an image latent diffusion model for text-controlled video stylization. We improve the guidance conditions in the denoising process to maintain a balance between artistic expression and structural preservation. Additionally, by integrating sampling optimization and temporal consistency modules, we address inter-frame flickering and prevent additional artifacts. Comprehensive experimental results demonstrate superior content preservation and stylistic performance while minimizing resource consumption.
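The abstract does not spell out how the improved guidance conditions are combined. As a rough sketch of how multiple guidance terms are commonly balanced in a diffusion denoiser, the snippet below mixes per-condition noise predictions in the classifier-free-guidance style; the branch names and weights are assumptions for illustration, not the paper's formulation.

```python
import torch

def combined_guidance(eps_uncond: torch.Tensor,
                      eps_style: torch.Tensor,
                      eps_content: torch.Tensor,
                      w_style: float = 7.5,
                      w_content: float = 1.5) -> torch.Tensor:
    # Classifier-free-guidance-style mix of noise predictions: the text
    # (style) branch pushes the sample toward the prompt, while the
    # content branch pulls it back toward the source frame's structure.
    # The weights are illustrative, not values reported by the paper.
    return (eps_uncond
            + w_style * (eps_style - eps_uncond)
            + w_content * (eps_content - eps_uncond))
```

Raising w_content relative to w_style trades artistic expression for structural preservation, which is the balance the letter describes.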
Traditional image-based style transfer requires additional reference style images, making it less user-friendly. Text-based methods are more convenient but suffer from issues such as slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (Semantic-Aware and Salient Attention CLIPStyler), which is based on the Contrastive Language-Image Pretraining (CLIP) model and a salient object detection network. Masks obtained from the salient object detection network are utilized to guide the style transfer process, and various strategies are employed to optimize according to different masks. Extensive experiments with diverse content images and style text descriptions were conducted, demonstrating our method's advantages: the network is easily trainable and converges rapidly, and it achieves stable, superior generation results compared to other methods. Our approach addresses over-stylization issues in the foreground, enhances foreground-background contrast, and enables precise control over style transfer in various semantic regions.
•A Semantic-Aware and Salient Attention CLIPStyler is proposed for the task of text-based style transfer.
•The method introduces U2-Net as the salient object detection network to realize different degrees of stylization (a mask-guided blending sketch follows below).
•A semantic-aware PatchCLIP loss is proposed to address poor output image quality.
•A global background loss function and a mask are proposed to ensure that neither the background nor the foreground is distorted.
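As a rough illustration of the mask-guided idea, the sketch below modulates stylization strength per region with a soft saliency mask such as one produced by U2-Net. SA2-CS actually realizes this control through CLIP-space losses per mask region rather than pixel blending, so the function and its strengths are purely illustrative.

```python
import torch

def mask_guided_blend(content: torch.Tensor,
                      stylized: torch.Tensor,
                      saliency: torch.Tensor,
                      fg_strength: float = 0.6,
                      bg_strength: float = 1.0) -> torch.Tensor:
    # content, stylized: (B, 3, H, W); saliency: (B, 1, H, W) in [0, 1],
    # e.g. from a salient object detector such as U2-Net. A gentler
    # strength on the salient foreground counters over-stylization
    # there; both strengths are illustrative.
    alpha = bg_strength + (fg_strength - bg_strength) * saliency
    return alpha * stylized + (1.0 - alpha) * content
```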
Underexposure regions are vital in constructing a complete perception of the surrounding environment for safe autonomous driving. The availability of thermal cameras has provided an essential alternative for exploring regions where other optical sensors fail to capture interpretable signals. A thermal camera captures an image using the heat differences emitted by objects in the infrared spectrum, making object detection in thermal images effective for autonomous driving in challenging conditions. Although object detection in the visible spectrum domain has matured, thermal object detection lacks effectiveness. A significant challenge is the scarcity of labeled data for the thermal domain, which is essential for state-of-the-art (SOTA) artificial intelligence techniques. This work proposes a domain adaptation framework that employs a style transfer technique for transfer learning from visible spectrum images to thermal images. The framework uses a generative adversarial network (GAN) to transfer the low-level features from the visible spectrum domain to the thermal domain through style consistency. The efficacy of the proposed object detection method in thermal images is evident from the improved results when using styled images from publicly available thermal image datasets (FLIR ADAS and KAIST Multi-Spectral).
•Fusion of the thermal and RGB domains at the data level for object detection.
•A cross-domain model transfer approach acts as a pseudo-labeler for the unlabeled dataset (see the sketch below).
•The source domain's low-level features are transferred to the target domain using a GAN.
•The proposed thermal object detection improves an autonomous vehicle's perception at night.
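The pseudo-labeling step can be pictured as follows: a detector trained on styled (visible-to-thermal) images is run over unlabeled thermal frames, and its confident predictions are kept as labels. The sketch assumes a torchvision-style detection interface and an illustrative confidence threshold; the paper's exact pipeline may differ.

```python
import torch

@torch.no_grad()
def pseudo_label_thermal(detector, thermal_images, score_thresh=0.7):
    # `detector` is any torch.nn.Module following the torchvision
    # detection convention: given a list of (3, H, W) image tensors it
    # returns one dict per image with "boxes", "scores" and "labels".
    # The 0.7 confidence threshold is illustrative.
    detector.eval()
    outputs = detector(thermal_images)
    pseudo_labels = []
    for out in outputs:
        keep = out["scores"] >= score_thresh
        pseudo_labels.append({"boxes": out["boxes"][keep],
                              "labels": out["labels"][keep]})
    return pseudo_labels
```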
Although arbitrary style transfer has been a hot topic in computer vision, most existing methods that directly align style and content features frequently produce unnatural effects in the generated images, such as over-stylization and style leakage. In this paper, we introduce a novel progressive Intrinsic-style Distribution Matching (ISDM) approach, which first aligns the intrinsic style distributions of the style and content images and then integrates the result with the image's content component. This approach can effectively alleviate over-stylization while preserving the content structure, particularly for style images with high contrast and vivid colors. To further enhance the performance of our method, we propose a learnable multi-level style modulation module to assist the network in aligning the intrinsic style distributions. Moreover, two contrastive objectives are proposed to improve the encoder's ability to extract more distinct and representative intrinsic content and intrinsic style features. Extensive experimental results show that our approach preserves more content details than state-of-the-art methods and generates more natural images, especially when the style image has high contrast and vivid colors.
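For intuition, the classical way to align feature statistics between a content and a style image is channel-wise moment matching (the AdaIN operation). ISDM's intrinsic-style distribution matching is learned, progressive, and multi-level, so the single-level analogue below is only a reference point, not the paper's module.

```python
import torch

def match_style_statistics(content_feat: torch.Tensor,
                           style_feat: torch.Tensor,
                           eps: float = 1e-5) -> torch.Tensor:
    # Channel-wise moment matching on (B, C, H, W) activations:
    # normalize the content features, then re-scale and re-shift them
    # with the style features' mean and standard deviation.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean
```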
The artistic style transfer of images aims to synthesise novel images by combining the content of one image with the style of another; it is a long-standing research topic that has already been widely applied in the real world. However, defining aesthetic perception from the human visual system is a challenging problem. In this study, the authors propose a novel method for automatic visual perception style transfer. First, they present a novel saliency detection algorithm to automatically perceive the visual attention of an image. Then, unlike conventional style transfer algorithms, in which stylization is applied uniformly across all image regions, the authors use the saliency map to guide the style transfer process, enabling different types of stylization to occur in different regions. Extensive experiments show that the proposed saliency detection algorithm and style transfer algorithm are superior in both performance and efficiency.
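One simple way to let a saliency map steer where stylization is applied, in the spirit of this method, is to weight a per-pixel style loss by the map, so salient regions receive a different degree of transfer than the rest. The weighting scheme below is an illustrative assumption, not the authors' exact loss.

```python
import torch

def region_weighted_style_loss(per_pixel_style_loss: torch.Tensor,
                               saliency: torch.Tensor,
                               w_salient: float = 0.5,
                               w_background: float = 1.5) -> torch.Tensor:
    # per_pixel_style_loss and saliency are both (B, 1, H, W), with the
    # saliency map in [0, 1]. Salient pixels are down-weighted here so
    # the subject is stylized more gently; the weights are illustrative.
    weights = w_salient * saliency + w_background * (1.0 - saliency)
    return (weights * per_pixel_style_loss).mean()
```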
Despite the recent rapid development of neural style transfer, existing style transfer methods are still somewhat inefficient or have a large model size, which limits their application on devices with limited computational resources. The major problem is that they usually adopt a pre-trained VGG-19 backbone, which is relatively large, or a feature transformation module that is computationally heavy. To address these problems, we propose a DIstillation based Style Transfer framework (DIST) in conjunction with an efficient feature transformation module for arbitrary image and video style transfer. The distillation module leads to a highly compressed backbone network that is 15.95× smaller than the VGG-19 based backbone. The proposed feature transformation is capable of transforming the content features in an extremely efficient feed-forward pass. For video style transfer, the framework is further combined with a temporal consistency regularization loss. Extensive experiments show that the proposed method is superior to state-of-the-art image and video style transfer methods, even with a much smaller model size.
•A knowledge distillation method to compress the VGG-19 based backbone is proposed (see the sketch below).
•A lightweight feature transformation module for flexible style transfer is proposed.
•A temporal consistency loss to maintain video style transfer stability is proposed.
•The smallest style transfer model to date is derived, at only 2.67 MB.
•The final model can perform style transfer at 167 FPS on a 2080Ti GPU.
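The distillation idea can be sketched as feature mimicry: the compressed student encoder is trained so its activations match those of the frozen VGG-19 teacher. The 1×1 adapter convolutions that reconcile channel widths are an assumption for illustration; the paper's matching scheme may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats, adapters):
    # student_feats / teacher_feats: lists of (B, C, H, W) activations
    # from the compressed encoder and the frozen VGG-19 teacher.
    # `adapters` is an nn.ModuleList of 1x1 convolutions mapping student
    # channel widths to the teacher's (an illustrative assumption).
    return sum(F.mse_loss(adapt(s), t.detach())
               for adapt, s, t in zip(adapters, student_feats, teacher_feats))
```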
Neural Style Transfer: A Review. Jing, Yongcheng; Yang, Yezhou; Feng, Zunlei; et al. IEEE Transactions on Visualization and Computer Graphics, vol. 26, no. 11, Nov. 2020.
The seminal work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNNs) in creating artistic imagery by separating and recombining image content and style. This process of using CNNs to render a content image in different styles is referred to as Neural Style Transfer (NST). Since then, NST has become a trending topic in both academic literature and industrial applications. It is receiving increasing attention, and a variety of approaches have been proposed to either improve or extend the original NST algorithm. In this paper, we aim to provide a comprehensive overview of the current progress towards NST. We first propose a taxonomy of current algorithms in the field of NST. Then, we present several evaluation methods and compare different NST algorithms both qualitatively and quantitatively. The review concludes with a discussion of various applications of NST and open problems for future research. A list of papers discussed in this review, corresponding codes, pre-trained models and more comparison results are publicly available at: https://osf.io/f8tu4/
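For reference, the original objective of Gatys et al. that the review builds on combines a feature-reconstruction content term with a Gram-matrix style term. The sketch below is simplified: Gatys et al. use distinct layer sets and per-layer weights for the two terms, whereas a single shared list of layers is assumed here.

```python
import torch
import torch.nn.functional as F

def gram(feat: torch.Tensor) -> torch.Tensor:
    # Gram matrix of (B, C, H, W) activations: the channel co-occurrence
    # statistics Gatys et al. use to represent style.
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def gatys_loss(gen_feats, content_feats, style_feats, alpha=1.0, beta=1e3):
    # Simplified NST objective: feature reconstruction for content plus
    # Gram-matrix matching for style, summed over a shared layer list.
    l_content = sum(F.mse_loss(g, c) for g, c in zip(gen_feats, content_feats))
    l_style = sum(F.mse_loss(gram(g), gram(s)) for g, s in zip(gen_feats, style_feats))
    return alpha * l_content + beta * l_style
```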
Recent studies have made tremendous progress in neural style transfer (NST), and various methods have been proposed. However, evaluating and improving stylization quality remain two important open challenges. Addressing these two aspects, in this paper we first decompose the quality of style transfer into three quantifiable factors: content fidelity (CF), global effects (GE), and local patterns (LP). Then, two novel approaches are presented for exploiting these factors to improve stylization quality. The first, named cascade style transfer (CST), utilizes the factors to guide the cascade combination of existing NST methods so as to absorb their merits and avoid their shortcomings. The second, dubbed multi-objective network (MO-Net), directly optimizes these factors to balance their performance and achieve more harmonious stylized results. Extensive experiments demonstrate the effectiveness and superiority of the proposed factors and methods.
•Three quantifiable factors to evaluate style transfer quality (a toy instantiation of CF is sketched below).
•Cascade style transfer under the guidance of our factors to improve quality.
•A multi-objective network that directly optimizes our factors to improve quality.
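The abstract does not give formulas for the three factors; one plausible toy instantiation of content fidelity (CF) is the mean cosine similarity between deep features of the stylized result and the content image, as sketched below. This is an assumption for illustration; the paper's definitions may differ.

```python
import torch
import torch.nn.functional as F

def content_fidelity(stylized_feats, content_feats):
    # A toy CF score: cosine similarity between flattened (B, C, H, W)
    # activations of the stylized output and the content image, averaged
    # over layers. Higher values indicate better content preservation.
    sims = [F.cosine_similarity(s.flatten(1), c.flatten(1), dim=1).mean()
            for s, c in zip(stylized_feats, content_feats)]
    return torch.stack(sims).mean()
```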