Chinese Generation and Security Index Evaluation Based on Large Language Model

Cited by: 0

Authors:

Zhang, Yu [1]

Gao, Yongbing [1]

Li, Weihao [1]

Su, Zirong [1]

Yang, Lidong [1]

Affiliation:

[1] Inner Mongolia Univ Sci & Technol, Sch Numer Ind, Baotou, Inner Mongolia, Peoples R China

Source:

2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Safety Assessment; Chinese Generation; AI Hallucination; Automatic Scoring; Large Language Model;

DOI:

10.1109/IALP63756.2024.10661189

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study investigates the performance and security indicators of mainstream large language models in Chinese generation tasks, explores the potential security risks of these models, and offers suggestions for improvement. It uses publicly available datasets to assess Chinese language generation, builds datasets and multidimensional security rating standards for the security evaluations, compares three models across 5 Chinese tasks and 6 security tasks, and runs a Pearson correlation analysis using GPT-4 and questionnaire surveys. It also implements automatic scoring based on GPT-3.5-Turbo. The experimental findings indicate that the models excel in Chinese language generation tasks. ERNIE Bot leads in the evaluation of ideology and ethics, ChatGPT in the rumor-and-falsehood and privacy security assessments, and Claude in the factual fallacy and social prejudice assessments. The fine-tuned model achieves high accuracy on security tasks, yet all models exhibit security vulnerabilities. Incorporating prompt engineering proves effective in mitigating security risks. The authors recommend that both domestic and foreign models adhere to the legal frameworks of each country, reduce AI hallucinations, continuously expand their corpora, and iterate on updates accordingly.
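The abstract mentions a Pearson correlation analysis involving GPT-4 and questionnaire surveys, presumably to check how closely GPT-4's scores track human ratings; the abstract does not spell out the exact setup. The following is a minimal sketch of that kind of check, not the authors' code. The function name `pearson` and both score lists are hypothetical and chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one score per model response, on an assumed 1-5 scale.
gpt4_scores = [4.0, 3.5, 5.0, 2.0, 4.5]    # assumed GPT-4 ratings
human_scores = [4.2, 3.0, 4.8, 2.5, 4.4]   # assumed questionnaire averages

r = pearson(gpt4_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 would suggest that the automatic scores rank responses much as the human raters do; a value near 0 would suggest the automatic scorer is not a usable stand-in for the questionnaire.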

Citation

Pages: 151-161

Page count: 11

