DeepFake detection is pivotal to personal privacy and public safety. With the iterative advancement of DeepFake techniques, high-quality forged videos and images are becoming increasingly deceptive. Prior research has seen numerous attempts to incorporate biometric features into DeepFake detection. However, traditional biometric-based approaches tend to segregate biometric features from general ones and freeze the biometric feature extractor. These approaches exclude valuable general features, potentially leading to a performance decline and, consequently, a failure to fully exploit the potential of biometric information in assisting DeepFake detection. Moreover, insufficient attention has been dedicated to scrutinizing gaze authenticity within DeepFake detection in recent years. In this paper, we introduce GazeForensics, an innovative DeepFake detection method that utilizes gaze representations obtained from a 3D gaze estimation model to regularize the corresponding representation within our DeepFake detection model, while concurrently integrating general features to further enhance performance. Experimental results demonstrate that the proposed GazeForensics achieves strong performance and exhibits excellent interpretability.
Advancements in generative artificial intelligence have made it easier to manipulate auditory and visual elements, highlighting the critical need for robust audio–visual deepfake detection methods. In this paper, we propose an articulatory representation-based audio–visual deepfake detection approach, ART-AVDF. First, we devise an audio encoder to extract articulatory features that capture the physical significance of articulation movement, integrating with a lip encoder to explore audio–visual articulatory correspondences in a self-supervised learning manner. Then, we design a multimodal joint fusion module to further explore inherent audio–visual consistency using the articulatory embeddings. Extensive experiments on the DFDC, FakeAVCeleb, and DefakeAVMiT datasets demonstrate that ART-AVDF obtains a significant performance improvement compared to many deepfake detection models.
•Articulatory representation learning, capturing physiological information in videos.
•Audio-visual deepfake detection, using articulatory embedding for forgery detection.
•Extensive experiments on three benchmarks and evaluations under various settings.
Although recent advancements in generative models bring diverse advantages to society, they can also be abused for malicious purposes such as fraud, defamation, and fake news. To prevent such cases, vigorous research has been conducted to distinguish generated images from real images, but challenges remain in distinguishing generated images outside of the training settings. Such limitations occur due to data dependency, arising from the model's overfitting to the specific Generative Adversarial Networks (GANs) and categories of the training data. To overcome this issue, we adopt a self-supervised scheme. Our method is composed of an artificial artifact generator, which reconstructs high-quality artificial artifacts of GAN images, and a GAN detector, which distinguishes GAN images by learning the reconstructed artificial artifacts. To improve the generalization of the artificial artifact generator, we build multiple autoencoders with different numbers of upconvolution layers. Through numerous ablation studies, the robust generalization of our method is validated: it outperforms the generalization of previous state-of-the-art algorithms, even without utilizing the GAN images of the training dataset.
•A novel framework to train a GAN detector in the self-supervision scheme.
•New architecture employing multiple autoencoders to reproduce the fingerprints of GANs.
•Outstanding robustness to unknown GANs compared to the supervised GAN detectors.
•Impressive performance for zero-shot and few-shot transfer learning.
•Detailed analysis of the proposed framework with numerous ablation tests.
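The reconstruct-and-subtract idea behind an artificial artifact generator can be sketched in a few lines. Here a rank-limited linear autoencoder (truncated SVD) stands in for the paper's upconvolutional autoencoders, so this is only an illustration of the residual-as-fingerprint concept under that simplifying assumption, not the actual architecture:

```python
import numpy as np

def toy_autoencoder_residual(x, rank=2):
    """Reconstruct `x` with a rank-limited linear autoencoder (truncated SVD)
    and return (reconstruction, residual). The residual is a stand-in for the
    'artificial artifacts' a GAN detector could learn from; a real system
    would use trained upconvolutional autoencoders instead of SVD."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    s_trunc = s.copy()
    s_trunc[rank:] = 0.0                 # keep only `rank` components
    recon = (u * s_trunc) @ vt           # low-capacity reconstruction
    artifact = x - recon                 # residual: candidate "fingerprint"
    return recon, artifact
```

Varying `rank` plays the role of varying the number of upconvolution layers: each capacity level leaves a different residual, which is why the paper builds multiple autoencoders to diversify the artifacts the detector sees.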
Deep learning has enabled realistic face manipulation for malicious purposes (e.g., deepfakes), which poses significant concerns over the integrity of the media in circulation. Most existing deep learning techniques for deepfake detection can achieve promising performance in the intra-dataset evaluation setting, but are unable to perform satisfactorily in the inter-dataset evaluation setting. Most previous methods use a backbone network to extract global features for making predictions and only employ binary supervision to train the network. Classification merely based on the learning of global features often leads to weak generalizability to deepfakes of unseen manipulation methods. In this paper, we design a two-branch Convolutional AutoEncoder (CAE), which considers the reconstruction and classification tasks simultaneously for deepfake detection. This Joint Reconstruction and Classification (JRC) method shares the information learned by one task with the other, each focusing on different aspects, and hence boosts the overall performance. JRC is end-to-end, and experiments demonstrate that it achieves state-of-the-art performance on three commonly-used datasets, particularly in the cross-dataset evaluation setting.
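A joint reconstruction-and-classification objective of the kind this abstract describes is typically a weighted sum of a reconstruction loss and a binary classification loss. The sketch below shows that combination in plain numpy; the balancing weight `lam` is a hypothetical choice, since the abstract does not state the paper's actual loss weighting:

```python
import numpy as np

def joint_loss(x, x_recon, y_true, y_prob, lam=0.5):
    """JRC-style objective: a reconstruction term (MSE) plus a binary
    classification term (BCE). `lam` trades the two off; its value here
    is illustrative, not taken from the paper."""
    mse = np.mean((x - x_recon) ** 2)    # reconstruction branch
    eps = 1e-12                          # avoid log(0)
    bce = -np.mean(y_true * np.log(y_prob + eps)
                   + (1 - y_true) * np.log(1 - y_prob + eps))
    return lam * mse + (1 - lam) * bce
```

Because both terms share one encoder in a two-branch CAE, gradients from reconstruction regularize the features used for classification, which is the information-sharing effect the abstract credits for the cross-dataset gains.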
Recent advances in Generative Artificial Intelligence (AI) have increased the possibility of generating hyper-realistic DeepFake videos or images that can cause serious harm to vulnerable children, individuals, and society at large through misinformation. To address this serious problem, many researchers have attempted to detect DeepFakes using advanced machine learning and fusion techniques. This paper presents a detailed review of past and present DeepFake detection methods with a particular focus on media-modality fusion and machine learning, and provides detailed information on available benchmark datasets in DeepFake detection research. The review addresses 67 primary papers on DeepFake detection published between 2015 and 2023, including 55 research papers on image and video DeepFake detection methodologies and 15 research papers on identifying and verifying speaker authentication. This paper offers valuable information on DeepFake detection research and a distinctive review analysis of advanced machine learning and modality fusion that sets it apart from other review papers. It further offers informed guidelines for future work in DeepFake detection utilizing advanced state-of-the-art machine learning and information fusion models, which should support further advancement in DeepFake detection for a sustainable and safer digital future.
Despite its widespread legitimate usage in a variety of areas, deepfake technology has been exploited in recent years to create dangerous material such as fake videos, rumors, and false news by changing or substituting the face information of the source, and so poses enormous security concerns to society. Research on active detection and prevention technologies is critical as deepfake continues to evolve. Deepfake is a subdomain of Artificial Intelligence (AI) technology in which one person's face is layered over another person's face, and it is becoming more and more popular on social networking sites. Deepfake pictures and videos can now be created much more quickly and cheaply due to Machine Learning (ML), which is a primary component of deepfakes. Despite the negative connotations attached to the term "deepfakes," the technology is increasingly being used in commercial and individual contexts. New technical advancements have made it more difficult to distinguish deepfakes from images that have been digitally manipulated, and the rise of deepfake technologies has sparked a growing sense of unease. The primary goal of this project is to properly distinguish deepfake pictures from real images using deep learning techniques.
In this study, we implemented a customized CNN algorithm to identify deepfake pictures from a video dataset and conducted a comparative analysis with two other methods to determine which approach was superior. A Kaggle dataset was used to train and test our model. Convolutional neural networks (CNNs) were used in this research to distinguish authentic from deepfake images by training three distinct CNN models. A customized CNN model, which includes several additional layers such as a dense layer, MaxPooling, and a dropout layer, was also developed and implemented. The method follows frame extraction, face feature extraction, data preprocessing, and classification phases to determine whether the images in a video are real or fake. Accuracy, loss, and the area under the receiver operating characteristic (ROC) curve were used to evaluate the models. The customized CNN outperformed all other models, achieving 91.4% accuracy, a reduced loss value of 0.342, and an AUC of 0.92. In addition, we obtained 85.2% testing accuracy from the CNN and 95.5% testing accuracy from the MLP-CNN model.
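The frame extraction → face extraction → preprocessing → classification pipeline described above can be sketched as follows. The fixed crop box and the logistic scorer are placeholders (the actual study uses a face detector and trained CNN models), so this only illustrates how data flows through the stages:

```python
import numpy as np

def extract_frames(video, step=10):
    """Sample every `step`-th frame; `video` is a (T, H, W) grayscale array."""
    return video[::step]

def crop_face(frame, box=(8, 56, 8, 56)):
    """Hypothetical fixed crop standing in for a real face detector."""
    top, bottom, left, right = box
    return frame[top:bottom, left:right]

def preprocess(face):
    """Scale pixels to [0, 1] and flatten, as in a typical CNN input pipeline."""
    return (face.astype(np.float64) / 255.0).ravel()

def classify(features, w, b):
    """Logistic stand-in for the trained model: returns P(fake) for one frame."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

def predict_video(video, w, b, threshold=0.5):
    """Average per-frame fake probabilities into a video-level decision."""
    probs = [classify(preprocess(crop_face(f)), w, b)
             for f in extract_frames(video)]
    mean_prob = float(np.mean(probs))
    return mean_prob, mean_prob >= threshold
```

Averaging per-frame scores into one video-level label is one common way to aggregate frame classifiers; the abstract does not specify the study's aggregation rule.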
Exposing Deep Fakes Using Inconsistent Head Poses Yang, Xin; Li, Yuezun; Lyu, Siwei
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference Proceeding
In this paper, we propose a new method to expose AI-generated fake face images or videos (commonly known as Deep Fakes). Our method is based on the observation that Deep Fakes are created by splicing a synthesized face region into the original image, and in doing so introduce errors that can be revealed when 3D head poses are estimated from the face images. We perform experiments to demonstrate this phenomenon and further develop a classification method based on this cue. Using features derived from this cue, an SVM classifier is evaluated on a set of real face images and Deep Fakes.
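The head-pose cue can be illustrated as a feature built from the disagreement between two pose estimates, fed to a linear SVM decision function. The weights below are placeholders for a trained classifier, and the pose vectors would come from a 3D head-pose estimator (e.g., solving PnP from facial landmarks), so this is only a sketch of the feature construction:

```python
import numpy as np

def pose_difference_feature(pose_central, pose_whole):
    """Feature = difference between the 3D head pose (e.g., a rotation
    vector) estimated from central face landmarks and from the whole face.
    For real faces the two estimates roughly agree; splicing a synthesized
    face region tends to break this consistency."""
    return np.asarray(pose_whole) - np.asarray(pose_central)

def svm_decision(feature, w, b):
    """Linear SVM decision value; positive => classified as Deep Fake.
    `w` and `b` would come from training; here they are placeholders."""
    return float(np.dot(w, feature) + b)
```

A kernel SVM over such features is what the paper evaluates; the linear decision function above just shows where the pose-difference feature enters the classifier.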
Existing face forgery algorithms have achieved remarkable progress in generating plausible facial images and can even successfully deceive human beings. For public security, face forgery detection is of vital importance, making it essential to design algorithms that can detect forged images circulating over the Internet. Despite the great success achieved by existing Deepfake detection algorithms, they usually fail to achieve satisfactory performance when deployed to handle forgery videos in practice. One significant reason is compression: videos on the Internet are inevitably compressed for transmission efficiency, and this compression causes significant performance degradation for existing Deepfake detection algorithms. To address this issue, in this article, we propose a generic, simple yet effective "bleaching" pre-processing module based on a generative model and high-level feature representations to produce a bleached image, which shares a similar appearance with the compressed image. The bleached images, with recovered information, can be identified accurately by optimized Deepfake detection models without retraining. The proposed method utilizes a redesigned feature representation, which serves as a navigator to effectively and sufficiently alter the feature distribution in the high-dimensional space so as to remedy the difference between real facial images and their forged counterparts, allowing the method to avoid misclassification. Comprehensive and extensive experiments carried out on four low-quality FaceForensics++ datasets demonstrate the effectiveness of our method in recovering the information loss caused by compression artifacts across various backbones and compression levels.
Fake face detection faces a dilemma with the rapid development of face manipulation technology. One way to improve the effectiveness of a detector is to make full use of intra- and inter-frame information. In this paper, a novel Xception-LSTM algorithm is proposed, built on our new spatiotemporal attention mechanism and convolutional long short-term memory (ConvLSTM). In the algorithm, the spatiotemporal attention mechanism, comprising spatial and temporal attention, is proposed to capture and enhance spatiotemporal correlations before the dimension reduction of Xception. Thereafter, ConvLSTM is introduced to take frame structure information into account while modeling temporal information. Experimental results on three widely used datasets demonstrate that the proposed algorithm performs better than state-of-the-art algorithms. In addition, the effectiveness of the spatiotemporal attention mechanism and ConvLSTM is demonstrated through ablation experiments.
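A minimal version of the temporal-attention component can be sketched as a softmax over per-frame scores followed by attention-weighted pooling of frame features. The norm-based scoring function is a placeholder for the learned attention sub-network in Xception-LSTM, so this shows only the pooling mechanics:

```python
import numpy as np

def temporal_attention(frame_feats):
    """Toy temporal attention over a (T, D) array of per-frame features:
    score each frame, softmax the scores over time, and return the
    attention weights plus the weighted sum. The feature-norm score is a
    stand-in for a learned scoring sub-network."""
    scores = np.linalg.norm(frame_feats, axis=1)      # (T,) per-frame scores
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over time
    pooled = weights @ frame_feats                    # (D,) weighted pooling
    return weights, pooled
```

In the full algorithm this weighting is applied before Xception's dimension reduction, and the re-weighted frame features then feed the ConvLSTM, which models temporal dynamics while preserving spatial structure.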