Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots

Cited by: 3

Authors:

Chen, Bocheng [1]

Wang, Guangjing [1]

Guo, Hanqing [1]

Wang, Yuanda [1]

Yan, Qiben [1]

Affiliations:

[1] Michigan State Univ, E Lansing, MI 48824 USA

Source:

PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON RESEARCH IN ATTACKS, INTRUSIONS AND DEFENSES, RAID 2023 | 2023

Funding:

U.S. National Science Foundation;

Keywords:

Dialogue System; trustworthy machine learning; online toxicity; GENERATION;

DOI:

10.1145/3607199.3607237

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Recent advances in natural language processing and machine learning have led to the development of chatbot models, such as ChatGPT, that can engage in conversational dialogue with human users. However, understanding the ability of these models to generate toxic or harmful responses during a non-toxic multi-turn conversation remains an open research problem. Existing research focuses on single-turn sentence testing, while we find that 82% of the individual non-toxic sentences that elicit toxic behaviors in a conversation are considered safe by existing tools. In this paper, we design a new attack, ToxicChat, by fine-tuning a chatbot to engage in conversation with a target open-domain chatbot. The chatbot is fine-tuned with a collection of crafted conversation sequences. Particularly, each conversation begins with a sentence from a crafted prompt sentences dataset. Our extensive evaluation shows that open-domain chatbot models can be triggered to generate toxic responses in a multi-turn conversation. In the best scenario, ToxicChat achieves a 67% toxicity activation rate. The conversation sequences in the fine-tuning stage help trigger the toxicity in a conversation, which allows the attack to bypass two defense methods. Our findings suggest that further research is needed to address chatbot toxicity in a dynamic interactive environment. The proposed ToxicChat can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue and improve the robustness of chatbots for end users.


Pages: 282-296

Number of pages: 15
