A deepfake is a video, audio recording, or other content (e.g., an image) that is completely or partially fabricated, or created by manipulating existing, real content. Just as fake news calls into question the authenticity of real news, deepfakes also call into question the authenticity of real content. At the same time, deepfakes have many advantages in addition to their often-mentioned dangers. Following a historical overview of deepfakes, the study describes these benefits and dangers, presents tools for detecting deepfakes, and then discusses possible legal responses.
Advances in computer vision and deep learning have made it difficult to distinguish deepfake videos from real ones. In particular, forged audio tracks are generated to accompany fake videos and make them more realistic, which makes deepfake detection even harder. Existing deepfake detection methods that use multimodal information ignore the representation gap between different modalities, resulting in limited performance. To address this problem, this paper proposes a novel deepfake detection method utilizing multimodal contrastive learning (MCL) to better exploit intra-modal and cross-modal forgery clues. To reduce the cross-modal gap and expose multimodal forgery artifacts, a cross-modal contrastive learning strategy is designed to learn a compositional embedding from multimodal information, which facilitates pulling together unimodal and multimodal representations. Moreover, to supplement the intra-frame forgery-clue mining ability of the video network, frame knowledge is distilled into the video network without adding computation. Specifically, to mine intra-modal clues, three modality features are first extracted from the audio, frame, and video streams, respectively. Secondly, the audio and frame features are separately composed with the video feature to derive two cross-modal representations. These cross-modal features are then contrasted with the intra-modal features to reduce the cross-modal gap. By jointly pulling together the unimodal and multimodal features through MCL, a more effective representation containing intra-modal and cross-modal forgery artifacts can be learned. Finally, a noise-based feature augmentation (NFA) module is proposed to adaptively perturb the audio-visual feature and further improve generalization performance. Extensive experiments demonstrate that the proposed framework outperforms state-of-the-art (SOTA) methods.
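The abstract above does not give the MCL objective in closed form. As a minimal sketch, an InfoNCE-style contrastive loss of the kind commonly used to pull paired representations together can be written as follows (the function names, temperature value, and toy two-dimensional vectors are illustrative assumptions, not the paper's actual formulation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: pull the anchor toward its positive pair,
    push it away from the negatives (log-softmax over similarities)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# A matched pair gives a near-zero loss; a mismatched pair a large one.
loss_matched = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_mismatched = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In the paper's setting, the anchor would presumably be a cross-modal composite (e.g., audio composed with video), the positive its matching intra-modal feature, and the negatives mismatched samples from the batch.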
Deepfake videos are becoming more pervasive. In this preregistered online experiment, participants (N = 454; mean age = 37.19 years, SD = 13.25; 57.5% male) categorized a series of 20 videos as either real or deepfake. All participants saw 10 real and 10 deepfake videos. Participants were randomly assigned either to receive a list of strategies for detecting deepfakes based on visual cues (e.g., looking for common artifacts such as skin smoothness) or to act as a control group. Participants were also asked how confident they were that they had categorized each video correctly (per-video confidence) and to estimate how many of the 20 videos they had categorized correctly (overall confidence). The sample performed above chance on the detection activity, correctly categorizing 60.70% of videos on average (SD = 13.00). The detection-strategies intervention did not affect detection accuracy or confidence: the intervention and control groups performed similarly on the detection activity and showed similar levels of confidence. Inconsistent with previous research, the study did not find that participants were biased toward categorizing videos as real. Participants overestimated their ability to detect deepfakes at the individual-video level, but tended to underestimate their ability on the overall confidence question.
•Participants categorized 20 videos as real or deepfake (half were deepfake).
•Mean categorization accuracy was 60.7%.
•Providing detection strategies did not impact participant accuracy or confidence.
•Participants were overconfident at the individual video level.
•Participants were underconfident when rating their overall performance.
Although recent advances in generative models bring diverse benefits to society, they can also be abused for malicious purposes such as fraud, defamation, and fake news. To prevent such cases, vigorous research has been conducted on distinguishing generated images from real images, but distinguishing generated images outside of the training settings remains challenging. This limitation stems from data dependency: models overfit to the specific Generative Adversarial Networks (GANs) and categories seen in the training data. To overcome this issue, we adopt a self-supervised scheme. Our method is composed of an artificial-artifact generator, which reconstructs high-quality artificial artifacts of GAN images, and a GAN detector, which distinguishes GAN images by learning from the reconstructed artifacts. To improve the generalization of the artificial-artifact generator, we build multiple autoencoders with different numbers of upconvolution layers. Through numerous ablation studies, the robust generalization of our method is validated: it outperforms previous state-of-the-art algorithms even without utilizing the GAN images of the training dataset.
•A novel framework to train a GAN detector in the self-supervision scheme.
•New architecture employing multiple autoencoders to reproduce the fingerprints of GANs.
•Outstanding robustness to unknown GANs compared to the supervised GAN detectors.
•Impressive performance for zero-shot and few-shot transfer learning.
•Detailed analysis of the proposed framework with numerous ablation tests.
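To illustrate the residual idea behind an artificial-artifact generator (a low-capacity autoencoder loses high-frequency detail, so the difference between input and reconstruction resembles upsampling artifacts), here is a toy 1-D sketch. The average-pool/upsample "autoencoder" is a stand-in assumption for the paper's actual upconvolution autoencoders:

```python
def down_up(signal, factor=2):
    """Toy autoencoder bottleneck: average-pool then nearest-neighbour
    upsample. High-frequency content lost here mimics the artifacts that
    upconvolution layers imprint on generated images."""
    pooled = [sum(signal[i:i + factor]) / factor
              for i in range(0, len(signal), factor)]
    return [v for v in pooled for _ in range(factor)]

def artificial_artifact(signal, factor=2):
    """Residual between the input and its low-capacity reconstruction."""
    recon = down_up(signal, factor)
    return [s - r for s, r in zip(signal, recon)]

# A smooth signal leaves no residual; a high-frequency one does.
smooth = artificial_artifact([1.0, 1.0, 1.0, 1.0])   # all zeros
alternating = artificial_artifact([0.0, 1.0, 0.0, 1.0])
```

A detector trained on such residuals can then focus on generator fingerprints rather than image content, which is the intuition behind the self-supervised scheme described above.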
With the rapid progress of deepfake technology, the improper use of manipulated images and videos presenting synthetic faces has become a noteworthy concern, posing threats to both daily life and national security. While numerous CNN-based deepfake face detection methods have been proposed, most existing approaches struggle to effectively capture image content across different scales and positions. In this paper, we present a novel two-branch network, referred to as the Self-Attention Deepfake Face Discrimination Network (SADFFD). Specifically, a branch incorporating cascaded multi-self-attention-mechanism (SAM) modules is integrated in parallel with EfficientNet-B4 (EffB4). The multi-SAM branch supplies additional features that concentrate on image regions essential for discriminating between real and fake. The EffB4 network is adopted for its efficiency, achieved by jointly adjusting the resolution, depth, and width of the network. In comprehensive experiments on FaceForensics++, Celeb-DF, and our self-constructed SAMGAN3 dataset, the proposed SADFFD achieved the highest detection accuracy, averaging 99.01% on FaceForensics++, 98.65% on Celeb-DF, and an impressive 99.99% on SAMGAN3, surpassing other state-of-the-art (SOTA) methods.
•A novel two-branch CNN structure is proposed for deepfake face discrimination.
•The self-attention mechanism is utilized to enhance discrimination accuracy.
•FaceForensics++, Celeb-DF, and our self-built dataset are used to evaluate detection accuracy.
•Forged face images/videos from various generating methods are included in our evaluation datasets.
•Comprehensive experiments demonstrate the superior performance of our proposed method in discriminating deepfake faces.
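The self-attention mechanism (SAM) at the core of the multi-SAM branch is, in essence, scaled dot-product attention. A minimal pure-Python sketch, using toy lists-of-lists in place of tensors and omitting the learned query/key/value projections, looks like this:

```python
import math

def scaled_dot_attention(Q, K, V):
    """Scaled dot-product attention over lists of feature vectors:
    softmax(Q.K^T / sqrt(d)) applied as weights over V."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)  # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# With two identical keys, attention averages the two value vectors.
out = scaled_dot_attention([[1.0, 0.0]],
                           [[1.0, 0.0], [1.0, 0.0]],
                           [[1.0, 0.0], [3.0, 0.0]])
```

Because each output position is a weighted sum over all positions, such a branch can relate image regions at arbitrary distances, which is what lets it complement the local receptive fields of the convolutional EffB4 branch.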
Deep learning has enabled realistic face manipulation for malicious purposes (e.g., deepfakes), which poses significant concerns over the integrity of the media in circulation. Most existing deep ...learning techniques for deepfake detection can achieve promising performance in the intra-dataset evaluation setting, but are unable to perform satisfactorily in the inter-dataset evaluation setting. Most previous methods use a backbone network to extract global features for making predictions and only employ binary supervision to train the network. Classification merely based on the learning of global features often leads to weak generalizability to deepfakes of unseen manipulation methods. In this paper, we design a two-branch Convolutional AutoEncoder (CAE), which considers the reconstruction and classification tasks simultaneously for deepfake detection. This Joint Reconstruction and Classification (JRC) method shares the information learned by one task with the other, each focusing on different aspects, and hence boosts the overall performance. JRC is end-to-end, and experiments demonstrate that it achieves state-of-the-art performance on three commonly-used datasets, particularly in the cross-dataset evaluation setting.
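The joint reconstruction-and-classification training described above can be summarized as a weighted sum of the two task objectives. The sketch below assumes a binary cross-entropy classification term, a mean-squared-error reconstruction term, and an illustrative weight; the paper's exact loss formulation is not given in the abstract:

```python
import math

def bce(pred, label, eps=1e-7):
    """Binary cross-entropy for one prediction in (0, 1)."""
    pred = min(max(pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

def mse(x, recon):
    """Mean squared reconstruction error between input and decoder output."""
    return sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)

def joint_loss(pred, label, x, recon, weight=0.5):
    """Joint objective: classification loss plus a weighted
    reconstruction loss, so both tasks shape the shared encoder."""
    return bce(pred, label) + weight * mse(x, recon)

# Perfect prediction and reconstruction give a near-zero joint loss.
near_zero = joint_loss(1.0, 1, [0.2, 0.4], [0.2, 0.4])
```

Sharing an encoder between the two losses is what lets the reconstruction task regularize the classifier toward features beyond the purely discriminative ones, which the abstract credits for the improved cross-dataset generalization.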
Recent advances in generative models for language have enabled the creation of convincing synthetic text, or deepfake text. Prior work has demonstrated the potential for misuse of deepfake text to mislead content consumers. Therefore, deepfake text detection, the task of discriminating between human- and machine-generated text, is becoming increasingly critical. Several defenses have been proposed for deepfake text detection. However, we lack a thorough understanding of their real-world applicability. In this paper, we collect deepfake text from 4 online services powered by Transformer-based tools to evaluate the generalization ability of the defenses on content in the wild. We develop several low-cost adversarial attacks and investigate the robustness of existing defenses against an adaptive attacker. We find that many defenses show significant performance degradation under our evaluation scenarios compared to their originally claimed performance. Our evaluation shows that tapping into the semantic information in the text content is a promising approach for improving the robustness and generalization performance of deepfake text detection schemes.
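The abstract does not enumerate the low-cost adversarial attacks it develops. One commonly studied perturbation of this kind is homoglyph substitution, sketched below; the character map, rate parameter, and function name are illustrative assumptions rather than the paper's actual attack set:

```python
import random

# Latin -> Cyrillic look-alikes (a small illustrative subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def homoglyph_attack(text, rate=0.1, seed=0):
    """Swap a fraction of characters for visually identical homoglyphs.
    The text looks unchanged to a reader, but its byte/token sequence
    differs, which can evade detectors keyed to surface features."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)

# At rate=1.0 every mapped character is swapped; others pass through.
attacked = homoglyph_attack("aeo", rate=1.0)
```

A defense that relies on semantic content rather than surface character statistics, as the abstract's conclusion suggests, would be largely unaffected by this kind of perturbation.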