Deepfake Image Detection via Multimodal Large Language Models

Vol. 35, No. 2, pp. 299-311, Apr. 2025
DOI: 10.13089/JKIISC.2025.35.2.299
Keywords: Deepfake detection, Multimodal Large Language Model, prompt engineering
Abstract

Recent advances in deepfake image generation have led to a surge in malicious applications. Existing deepfake detection methods, however, often require large datasets and long training times, struggle to adapt to new types of deepfake generation, and lack explainability. In this study, we propose a deepfake image detection approach that combines textual and visual cues using two large multimodal language models, GPT-4o and GPT-4o-mini. We apply several prompt engineering techniques (zero-shot, few-shot, multi-turn zero-shot, chain-of-thought, and self-consistency) and compare the detection performance of the two models under each. Experimental results show that GPT-4o-mini reached a maximum detection accuracy of only 55.4% under zero-shot settings, but improved to over 90% with multi-turn zero-shot and self-consistency prompts. GPT-4o achieved an AUC score exceeding 93.06% without any additional training and maintained over 90% accuracy on post-processed images, demonstrating robust performance. Moreover, because the approach transparently presents the model's reasoning process, we confirm that it can enhance the reliability of detection results and help address legal and ethical challenges.
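
Of the prompting techniques named in the abstract, self-consistency is commonly understood as sampling several independent answers to the same prompt and keeping the majority verdict. The following is a minimal sketch of only that aggregation step, not the authors' implementation: the function name `self_consistency_vote`, the "real"/"fake" label set, and the sample answers are all assumptions made for illustration, and the model calls that would produce the answers are omitted.

```python
from collections import Counter


def self_consistency_vote(labels):
    """Return (majority label, vote share) from several sampled answers.

    Hypothetical helper: `labels` holds one verdict per independent run of
    the same prompt, e.g. "real" or "fake". On a tie, the label that was
    seen first wins, because Counter.most_common keeps insertion order
    among equal counts.
    """
    if not labels:
        raise ValueError("need at least one sampled answer")
    # Normalize case and whitespace so "Fake " and "fake" count as one label.
    normalized = [label.strip().lower() for label in labels]
    counts = Counter(normalized)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(normalized)


# Made-up answers standing in for five independent model runs.
samples = ["fake", "real", "fake", "Fake ", "real"]
label, share = self_consistency_vote(samples)
print(label, share)
```

The vote share can serve as a rough confidence signal: a narrow majority suggests the sampled answers disagree and the image may deserve closer review.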


Cite this article
[IEEE Style]
장현준, 최대선, 박성준, "Deepfake Image Detection via Multimodal Large Language Models," Journal of The Korea Institute of Information Security and Cryptology, vol. 35, no. 2, pp. 299-311, 2025. DOI: 10.13089/JKIISC.2025.35.2.299.

[ACM Style]
장현준, 최대선, and 박성준. 2025. Deepfake Image Detection via Multimodal Large Language Models. Journal of The Korea Institute of Information Security and Cryptology, 35, 2, (2025), 299-311. DOI: 10.13089/JKIISC.2025.35.2.299.