Chinese Generation and Security Index Evaluation Based on Large Language Model

Times cited: 0

Authors:

Zhang, Yu [1]

Gao, Yongbing [1]

Li, Weihao [1]

Su, Zirong [1]

Yang, Lidong [1]

Affiliation:

[1] Inner Mongolia Univ Sci & Technol, Sch Numer Ind, Baotou, Inner Mongolia, Peoples R China

Source:

2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Safety Assessment; Chinese Generation; AI Hallucination; Automatic Scoring; Large Language Model;

DOI:

10.1109/IALP63756.2024.10661189

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study investigates the performance and security indicators of mainstream large language models on Chinese generation tasks, explores the potential security risks associated with these models, and offers suggestions for improvement. The study uses publicly available datasets to evaluate Chinese language generation tasks; develops datasets and multidimensional security rating standards for evaluating security tasks; compares the performance of three models across five Chinese tasks and six security tasks; and conducts a Pearson correlation analysis using GPT-4 and questionnaire surveys. Furthermore, the study implements automatic scoring based on GPT-3.5-Turbo. The experimental findings indicate that the models perform well on Chinese language generation tasks. ERNIE Bot performs best in the evaluation of ideology and ethics, ChatGPT excels in the assessments of rumors and falsehoods and of privacy security, and Claude performs well in assessing factual fallacy and social prejudice. The fine-tuned model achieves high accuracy on security tasks, yet all models exhibit security vulnerabilities. Incorporating prompt engineering proves effective in mitigating security risks. It is recommended that both domestic and foreign models adhere to the legal frameworks of each country, reduce AI hallucinations, continuously expand their corpora, and iterate on updates accordingly.
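The abstract mentions a Pearson correlation analysis involving GPT-4 and questionnaire surveys without showing the computation. The sketch below illustrates the standard Pearson coefficient, r = cov(x, y) / sqrt(var(x) · var(y)), under the assumption that the analysis correlates per-response scores from a model judge with human ratings of the same responses. The function name `pearson_r` and every score value are hypothetical and purely illustrative; they are not data or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    # The 1/n factors cancel, so r can be computed from the raw sums.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: scores an LLM judge gave to five responses and the
# mean human questionnaire ratings of the same five responses.
# These numbers are made up for illustration only.
llm_scores = [4, 3, 5, 2, 4]
human_scores = [4.2, 2.8, 4.6, 2.1, 3.9]

r = pearson_r(llm_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the automatic scores rank responses much as human raters do. In practice a library routine such as Python's `statistics.correlation` computes the same coefficient.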

Pages: 151-161

Page count: 11
