Inducing Harmful Speech in Large Language Models through Korean Malicious Prompt Injection Attacks

Vol. 34, No. 3, pp. 451-461, June 2024
DOI: 10.13089/JKIISC.2024.34.3.451
Keywords: Generative AI, Large Language Model (LLM), Prompt Injection Attack, Harmful Speech
Abstract

Recently, various AI chatbots based on large language models (LLMs) have been released. Chatbots provide users with quick and easy access to information through interactive prompts, making them useful in fields such as question answering, writing, and programming. However, a class of attacks against chatbots known as "prompt injection attacks" has been reported. In these attacks, an adversary injects instructions that cause the chatbot to violate its predefined guidelines. Such attacks can be critical, as they may leak confidential information held within large language models or trigger other malicious behavior. The vulnerability of Korean-language prompts, however, has not been adequately validated. In this paper, we therefore generate malicious Korean prompts and attack a popular chatbot to analyze their feasibility. To this end, we propose a system that automatically generates malicious Korean prompts by analyzing existing prompt injection attacks. In particular, we focus on generating malicious prompts that induce harmful expressions from large language models and validate their effectiveness in practice.


Cite this article
[IEEE Style]
서지민 and 김진우, "Inducing Harmful Speech in Large Language Models through Korean Malicious Prompt Injection Attacks," Journal of The Korea Institute of Information Security and Cryptology, vol. 34, no. 3, pp. 451-461, 2024. DOI: 10.13089/JKIISC.2024.34.3.451.

[ACM Style]
서지민 and 김진우. 2024. Inducing Harmful Speech in Large Language Models through Korean Malicious Prompt Injection Attacks. Journal of The Korea Institute of Information Security and Cryptology, 34, 3, (2024), 451-461. DOI: 10.13089/JKIISC.2024.34.3.451.