The high-profile paper contains manipulated images, senior author acknowledges
High-quality, diverse, and photorealistic images can now be generated by unconditional GANs (e.g., StyleGAN). However, limited options exist to control the generation process using (semantic) attributes while still preserving the quality of the output. Further, due to the entangled nature of the GAN latent space, performing edits along one attribute can easily result in unwanted changes along other attributes. In this article, in the context of conditional exploration of entangled latent spaces, we investigate the two sub-problems of attribute-conditioned sampling and attribute-controlled editing. We present StyleFlow as a simple, effective, and robust solution to both sub-problems by formulating conditional exploration as an instance of conditional continuous normalizing flows in the GAN latent space conditioned on attribute features. We evaluate our method using the face and car latent spaces of StyleGAN, and demonstrate fine-grained disentangled edits along various attributes on both real photographs and StyleGAN-generated images. For example, for faces, we vary camera pose, illumination, expression, facial hair, gender, and age. Finally, via extensive qualitative and quantitative comparisons, we demonstrate the superiority of StyleFlow over prior and several concurrent works. Project page and video: https://rameenabdal.github.io/StyleFlow.
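The core mechanism behind StyleFlow, an invertible attribute-conditioned mapping of GAN latent codes, can be illustrated with a toy example. The sketch below uses a single affine coupling layer in NumPy; the actual method uses conditional continuous normalizing flows over 512-d StyleGAN latents, so the dimensions, conditioner network, and attribute vectors here are illustrative assumptions only:

```python
import numpy as np

# Toy attribute-conditioned invertible flow (one affine coupling layer).
# Idea: map latent w -> base code z under attributes a, change a, then
# invert to obtain an edited latent. Shapes and the tiny conditioner are
# hypothetical, not the paper's architecture.

rng = np.random.default_rng(0)
D = 8   # latent dimensionality (StyleGAN's w is 512-d)
A = 3   # number of attribute features (e.g. pose, age, light)

W1 = rng.normal(0, 0.1, (D // 2 + A, D))  # hypothetical conditioner weights
b1 = np.zeros(D)

def coupling(w_half, attrs):
    """Predict per-dimension log-scale and shift from half the latent
    plus the attribute vector."""
    h = np.tanh(np.concatenate([w_half, attrs]) @ W1 + b1)
    return h[: D // 2], h[D // 2 :]   # log_s, t

def forward(w, attrs):
    w1, w2 = w[: D // 2], w[D // 2 :]
    log_s, t = coupling(w1, attrs)
    return np.concatenate([w1, w2 * np.exp(log_s) + t])

def inverse(z, attrs):
    z1, z2 = z[: D // 2], z[D // 2 :]
    log_s, t = coupling(z1, attrs)
    return np.concatenate([z1, (z2 - t) * np.exp(-log_s)])

w = rng.normal(size=D)
a_src = np.array([0.0, 0.2, 1.0])   # illustrative attribute values
z = forward(w, a_src)               # encode under source attributes
a_edit = np.array([0.0, 0.8, 1.0])  # edited attribute values
w_edited = inverse(z, a_edit)       # decode under edited attributes
assert np.allclose(inverse(forward(w, a_src), a_src), w)  # exact inverse
```

Because the flow is exactly invertible, editing only the conditioning attributes leaves the base code untouched, which is what makes the edits disentangled in spirit.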
To defend against manipulation of image content, such as splicing, copy-move, and removal, we develop a Progressive Spatio-Channel Correlation Network (PSCC-Net) to detect and localize image manipulations. PSCC-Net processes the image in a two-path procedure: a top-down path that extracts local and global features, and a bottom-up path that detects whether the input image is manipulated and estimates its manipulation masks at multiple scales, where each mask is conditioned on the previous one. Different from conventional encoder-decoder and no-pooling structures, PSCC-Net leverages features at different scales with dense cross-connections to produce manipulation masks in a coarse-to-fine fashion. Moreover, a Spatio-Channel Correlation Module (SCCM) captures both spatial and channel-wise correlations in the bottom-up path, which endows features with holistic cues, enabling the network to cope with a wide range of manipulation attacks. Thanks to the lightweight backbone and progressive mechanism, PSCC-Net can process 1080p images at 50+ FPS. Extensive experiments demonstrate the superiority of PSCC-Net over state-of-the-art methods on both detection and localization. Code and models are available at https://github.com/proteus1991/PSCC-Net.
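The progressive, coarse-to-fine mask estimation described above can be sketched as follows. The per-scale "predictor" here is a stand-in (a sigmoid over a feature mean plus the upsampled coarser mask), not the paper's SCCM; only the conditioning pattern between scales is the point:

```python
import numpy as np

def upsample2x(m):
    """Nearest-neighbour 2x spatial upsampling of a (H, W) map."""
    return np.repeat(np.repeat(m, 2, axis=0), 2, axis=1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def progressive_masks(features):
    """features: list of (H, W, C) arrays ordered coarse -> fine, each
    scale doubling the resolution of the previous one. Each mask logit
    is conditioned on the upsampled logit from the coarser scale."""
    masks, prior = [], None
    for f in features:
        logit = f.mean(axis=-1)                # stand-in per-pixel score
        if prior is not None:
            logit = logit + upsample2x(prior)  # condition on coarser mask
        prior = logit
        masks.append(sigmoid(logit))
    return masks

rng = np.random.default_rng(1)
feats = [rng.normal(size=(4 * 2**i, 4 * 2**i, 16)) for i in range(3)]
masks = progressive_masks(feats)
assert [m.shape for m in masks] == [(4, 4), (8, 8), (16, 16)]
```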
Deep models have been widely and successfully used in image manipulation detection, which aims to classify tampered images and localize tampered regions. Most existing methods mainly focus on extracting global features from tampered images, while neglecting the relationships of local features between tampered and authentic regions within a single tampered image. To exploit such spatial relationships, we propose Proposal Contrastive Learning (PCL) for effective image manipulation detection. Our PCL consists of a two-stream architecture that extracts two types of global features from RGB and noise views, respectively. To further improve the discriminative power, we exploit the relationships of local features through a proxy proposal contrastive learning task by attracting/repelling proposal-based positive/negative sample pairs. Moreover, we show that our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features. Extensive experiments on several standard datasets demonstrate that our PCL can serve as a general module that yields consistent improvement. The code is available at https://github.com/Sandy-Zeng/PCL.
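The attract/repel objective over proposal pairs is, in its common form, an InfoNCE-style contrastive loss. The sketch below shows that form over one anchor proposal embedding; the paper's exact loss, temperature, and proposal sampling may differ, so treat this as a generic illustration:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss for one anchor proposal embedding, its positive,
    and a stack of negatives (all unit-normalised row vectors)."""
    logits = np.concatenate([[anchor @ positive],
                             negatives @ anchor]) / tau
    logits = logits - logits.max()   # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(4)
unit = lambda v: v / np.linalg.norm(v)
anchor = unit(rng.normal(size=16))
positive = unit(anchor + 0.05 * rng.normal(size=16))      # similar proposal
negatives = np.stack([unit(rng.normal(size=16)) for _ in range(8)])

loss_aligned = info_nce(anchor, positive, negatives)
loss_random = info_nce(anchor, unit(rng.normal(size=16)), negatives)
assert loss_aligned < loss_random   # aligned pairs are cheaper to attract
```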
In this study, we revisit the fundamental setting of face-swapping models and reveal that using only implicit supervision for training makes it difficult for advanced methods to preserve the source identity. We propose a novel reverse pseudo-input generation approach that offers supplemental data for training face-swapping models, addressing the aforementioned issue. Unlike the traditional pseudo-label-based training strategy, we assume that arbitrary real facial images could serve as the ground-truth outputs for the face-swapping network and try to generate corresponding <source, target> input pairs. Specifically, we involve a source-creating surrogate that alters the attributes of the real image while keeping the identity, and a target-creating surrogate that synthesizes attribute-preserved target images with different identities. Our framework, which utilizes proxy-paired data as explicit supervision to direct the face-swapping training process, provides a credible and effective optimization direction that boosts the identity-preserving capability. We design explicit and implicit adaptation strategies to better approximate the explicit supervision for face swapping. Quantitative and qualitative experiments on FF++, FFHQ, and wild images show that our framework improves the performance of various face-swapping pipelines in terms of visual fidelity and identity preservation. Furthermore, we demonstrate applications of our method to re-aging, swappable attribute customization, cross-domain face swapping, and video face swapping. Code is available at https://github.com/ICTMCG/CSCS.
Modern image inpainting systems, despite significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for this is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions (FFCs), which have an image-wide receptive field; ii) a high receptive field perceptual loss; and iii) large training masks, which unlock the potential of the first two components. Our inpainting network improves the state of the art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions higher than those seen at training time, and achieves this at lower parameter and time costs than the competitive baselines. The code is available at https://github.com/saic-mdal/lama.
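The reason an FFC has an image-wide receptive field is its spectral branch: a Fourier transform over the spatial dimensions, a pointwise convolution in the frequency domain, and an inverse transform. A minimal NumPy sketch of that spectral transform follows; the 1x1 "convolution" is a random matrix here, purely illustrative, and LaMa's real FFC also carries a parallel local-convolution branch omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(2)
H, W, C = 16, 16, 4
x = rng.normal(size=(H, W, C))          # toy feature map

# Spectral transform: FFT -> pointwise mix of real/imag channels -> inverse FFT.
spec = np.fft.rfft2(x, axes=(0, 1))                 # (H, W//2+1, C) complex
feat = np.concatenate([spec.real, spec.imag], -1)   # (H, W//2+1, 2C) real
mix = rng.normal(0, 0.1, (2 * C, 2 * C))            # hypothetical 1x1 conv
feat = np.maximum(feat @ mix, 0)                    # pointwise conv + ReLU
spec2 = feat[..., :C] + 1j * feat[..., C:]
y = np.fft.irfft2(spec2, s=(H, W), axes=(0, 1))     # back to (H, W, C)

# Because the mixing happens per frequency, every output pixel depends on
# every input pixel: a single layer already spans the whole image.
assert y.shape == x.shape
```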
PIE — Tewari, Ayush; Elgharib, Mohamed; R, Mallikarjun B; ...
ACM Transactions on Graphics, 11/2020, Volume 39, Issue 6
Journal article · Peer reviewed · Open access
Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, however only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.
Existing 3D-aware facial generation methods face a dilemma in quality versus editability: they either generate editable results in low resolution, or high-quality ones with no editing flexibility. In this work, we propose a new approach that brings the best of both worlds together. Our system consists of three major components: (1) a 3D-semantics-aware generative model that produces view-consistent, disentangled face images and semantic masks; (2) a hybrid GAN inversion approach that initializes the latent codes from the semantic and texture encoder, and further optimizes them for faithful reconstruction; and (3) a canonical editor that enables efficient manipulation of semantic masks in the canonical view and produces high-quality editing results. Our approach supports many applications, e.g. free-view face drawing, editing, and style control. Both quantitative and qualitative results show that our method reaches the state of the art in terms of photorealism, faithfulness, and efficiency.
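The hybrid inversion step, encoder initialization followed by per-image optimization, reduces in its simplest form to gradient descent on a reconstruction loss. The sketch below stands in a fixed random linear map for the generator and a scaled random vector for the encoder output; both are illustrative assumptions, not the paper's components:

```python
import numpy as np

rng = np.random.default_rng(3)
G = rng.normal(size=(8, 32))            # toy "generator": latent (8) -> image (32)

def generate(z):
    return z @ G

target = generate(rng.normal(size=8))   # image to invert (lies in G's range)
z = rng.normal(size=8) * 0.1            # stand-in for the encoder's initial code

lr = 0.005
for _ in range(500):
    residual = generate(z) - target
    z -= lr * 2 * residual @ G.T        # gradient of ||G(z) - target||^2

# After refinement the reconstruction should be near-exact for this toy setup.
assert np.mean((generate(z) - target) ** 2) < 1e-3
```

In the real system the optimization target is a perceptual/pixel loss through a nonlinear generator, so a good encoder initialization matters far more than it does in this linear toy.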
Current deep learning-based image manipulation localization methods achieve impressive performance when rich spatial features and information are fully utilized. However, most of them are distracted by semantic content that is irrelevant to identifying manipulations, which leads to false alarms when recognizing forged regions. In this paper, we propose a Progressively-Refined Neural Network (PR-Net) to localize tampered regions progressively under a coarse-to-fine workflow. Specifically, PR-Net is composed of a Feature Extractor (FE) that captures intrinsic feature correlations and a Mask Generation Module (MGM) with three refining generators. The FE uses a CNN to extract image features and introduces an attention mechanism, the Convolutional Block Attention Module (CBAM), to suppress image content and guide the extractor in exploring inconsistencies between manipulated and authentic regions. The MGM comprises three generators: the Coarse Mask RR-Generator produces a rough localization result, the Candidate Mask RR-Generator generates a possible tampered region according to the rough localization, and the Fine Mask RR-Generator produces the final prediction of manipulated regions. We also utilize the Rotated Residual (RR) structure to suppress image content during the generative process. Extensive experimental results on four benchmark datasets (NIST16, COVER, CASIA v1.0, and In-The-Wild) demonstrate the superior performance of PR-Net compared with state-of-the-art methods in localizing manipulated regions.
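CBAM, the attention mechanism mentioned above, applies channel attention (pooling over space) followed by spatial attention (pooling over channels). A minimal NumPy sketch of that two-stage gating is shown below; CBAM proper uses a two-layer shared MLP and a 7x7 convolution for the spatial map, so the single weight matrix and pooling-sum here are simplifying assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_like(x, w_mlp):
    """x: (H, W, C) feature map; w_mlp: (C, C) shared weights (a one-layer
    stand-in for CBAM's two-layer MLP)."""
    # Channel attention: pool over space, score each channel.
    avg, mx = x.mean(axis=(0, 1)), x.max(axis=(0, 1))
    ch = sigmoid(avg @ w_mlp + mx @ w_mlp)            # (C,) gate
    x = x * ch
    # Spatial attention: pool over channels, score each location.
    sp = sigmoid(x.mean(axis=-1) + x.max(axis=-1))    # (H, W) gate
    return x * sp[..., None]

rng = np.random.default_rng(5)
x = rng.normal(size=(6, 6, 8))
w = rng.normal(0, 0.1, (8, 8))
y = cbam_like(x, w)
assert y.shape == x.shape   # attention reweights, it never changes shape
```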
Image manipulation localization is a binary segmentation task that is sensitive to tampering artifacts rather than to object semantics. Thus, both traditional and learning-based methods rely heavily on hand-crafted features. However, these specifically defined features limit the network's ability to generalize to diverse scenes. To tackle this problem, we propose a dual homology-aware generative adversarial network (DH-GAN), a novel GAN-based framework to localize the manipulated region. First, we localize the forgery region by re-calibrating the multi-scale encoded features with a selective pyramid generator. Then, we perform homology identification in the discriminator. The proposed homology-aware discriminators contain a stack of masked convolution (MConv) layers and learn to identify the real/fake status of segmented pixels on the predicted/target masked image in a hard-gating manner. Overall, the networks are optimized under a standard GAN objective. Experiments show that the proposed method outperforms other state-of-the-art algorithms on four popular image manipulation datasets.
• Avoiding the use of hand-crafted features for image manipulation localization.
• Learning generalized features by identifying the homology of the pixels.
• Re-calibrating multi-scale features with cross-scale information interaction.