Subtoxic Questions: Dive Into Attitude Change of LLM's Response in Jailbreak Attempts
arXiv - CS - Cryptography and Security Pub Date : 2024-04-12 , DOI: arxiv-2404.08309
Tianyu Zhang, Zixuan Zhao, Jiaqi Huang, Jingyu Hua, Sheng Zhong

As prompt jailbreaking of Large Language Models (LLMs) attracts growing attention, it is of great significance to establish a generalized research paradigm for evaluating attack strength and a basic model for conducting subtler experiments. In this paper, we propose a novel approach that focuses on a set of target questions which are inherently more sensitive to jailbreak prompts, aiming to circumvent the limitations posed by enhanced LLM security. By designing and analyzing these sensitive questions, this paper reveals a more effective method of identifying vulnerabilities in LLMs, thereby contributing to the advancement of LLM security. This research not only challenges existing jailbreaking methodologies but also fortifies LLMs against potential exploits.
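To make the idea of measuring attack strength via attitude change concrete, here is a minimal sketch. It assumes a three-level attitude scale (refuse, hedge, comply) and scores a jailbreak prompt by the mean attitude shift it induces across a question set; the scale, function names, and toy data are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical illustration: scoring "attitude change" on sensitive questions.
# Attitudes are ranked REFUSE < HEDGE < COMPLY; a jailbreak prompt's strength
# is taken as the mean attitude shift it induces across a question set.
# All names and data here are invented for illustration.

ATTITUDE_RANK = {"REFUSE": 0, "HEDGE": 1, "COMPLY": 2}

def attack_strength(baseline, jailbroken):
    """Mean attitude shift per question (positive = shift toward compliance)."""
    assert len(baseline) == len(jailbroken)
    shifts = [
        ATTITUDE_RANK[after] - ATTITUDE_RANK[before]
        for before, after in zip(baseline, jailbroken)
    ]
    return sum(shifts) / len(shifts)

# Toy data: responses to four questions without / with a jailbreak prompt.
baseline   = ["REFUSE", "REFUSE", "HEDGE", "REFUSE"]
jailbroken = ["HEDGE",  "COMPLY", "COMPLY", "REFUSE"]
print(attack_strength(baseline, jailbroken))  # → 1.0
```

A finer-grained attitude scale, or per-question weighting by sensitivity, would follow the same pattern.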

Updated: 2024-04-15