A Combined Model of CNN and ViT for Deepfake Image Detection

Vol. 35, No. 3, pp. 513-526, Jun. 2025
DOI: 10.13089/JKIISC.2025.35.3.513
Keywords: Deepfake detection, Generative Adversarial Network, Convolutional Neural Network, Vision Transformer
Abstract

Recently, as deepfake technology has advanced, crimes exploiting it have increased, making research on deepfake detection increasingly important. In this paper, we consider GANs (Generative Adversarial Networks) and autoencoders as the techniques used to generate deepfake images, and we propose an integrated CNN-ViT model that combines a CNN (Convolutional Neural Network) with a ViT (Vision Transformer) to improve deepfake detection performance. The motivation for this design is that CNN-based detectors learn local image features effectively and can therefore catch the subtle distortions commonly found in deepfake images, whereas a ViT divides the image into patches and learns its overall structure, which lets it capture global relationships between patterns. To evaluate the CNN-ViT model, we performed comparative experiments combining ViT with CNN backbones ranging from EfficientNet-B0 to EfficientNet-B4. We also compared performance across model depths using precision, recall, and F1-score, and analyzed how changes in weight decay, batch size, and learning rate affect detection performance. The proposed CNN-ViT model achieved detection rates of 82.75% on the WildDeepfake dataset and 60.84% on the FaceForensics++ dataset, outperforming existing models.
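The abstract contrasts the two components: a CNN learns local features, while a ViT splits the image into patches and treats each patch as a token. The pure-Python sketch below illustrates both tokenizations, assuming channel-first nested lists. The function names `image_to_patches` and `feature_map_to_tokens` are hypothetical, and the hybrid variant (using each spatial position of a CNN feature map as a ViT token, as in the "hybrid" architecture of the original ViT paper) is one common way to combine a CNN with a ViT, not necessarily the fusion used by the authors. A real model would follow either step with a learned linear projection, positional embeddings, and a Transformer encoder, all omitted here.

```python
from typing import List

Matrix = List[List[float]]  # one H x W channel
Image = List[Matrix]        # C x H x W, channel-first (assumed layout)


def image_to_patches(image: Image, patch: int) -> List[List[float]]:
    """Plain ViT tokenization (hypothetical helper): split a C x H x W image
    into non-overlapping patch x patch tiles and flatten each tile into one
    token vector."""
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    # Walk the patch grid row by row (raster order).
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = []
            # Flatten channel by channel, then row by row inside the tile.
            # Any fixed order works because a linear projection follows.
            for c in range(channels):
                for y in range(top, top + patch):
                    token.extend(image[c][y][left:left + patch])
            tokens.append(token)
    return tokens


def feature_map_to_tokens(fmap: Image) -> List[List[float]]:
    """Hybrid CNN-ViT tokenization (hypothetical helper): each spatial
    position of a CNN feature map (C x h x w) becomes one token holding
    its C channel values."""
    channels = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(channels)]
            for y in range(h) for x in range(w)]


if __name__ == "__main__":
    # Toy 3 x 4 x 4 input whose values encode (channel, row, column).
    img = [[[100 * c + 10 * y + x for x in range(4)] for y in range(4)]
           for c in range(3)]
    patches = image_to_patches(img, patch=2)
    print(len(patches), len(patches[0]))  # 4 patches of 3*2*2 values
    tokens = feature_map_to_tokens(img)
    print(len(tokens), len(tokens[0]))    # 16 positions of 3 values
```

As a rough scale check, an EfficientNet-B0 backbone on a 224×224 input (an assumed resolution) ends in a 7×7 feature map with 1280 channels, so the hybrid path would produce 49 tokens of length 1280 before projection; deeper variants up to B4 change the channel count and typical input size.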

Cite this article
[IEEE Style]
김수형, 하재철, 이민종, "A Combined Model of CNN and ViT for Deepfake Image Detection," Journal of The Korea Institute of Information Security and Cryptology, vol. 35, no. 3, pp. 513-526, 2025. DOI: 10.13089/JKIISC.2025.35.3.513.

[ACM Style]
김수형, 하재철, and 이민종. 2025. A Combined Model of CNN and ViT for Deepfake Image Detection. Journal of The Korea Institute of Information Security and Cryptology, 35, 3, (2025), 513-526. DOI: 10.13089/JKIISC.2025.35.3.513.