Chinese Generation and Security Index Evaluation Based on Large Language Model

Cited by: 0
Authors
Zhang, Yu [1]
Gao, Yongbing [1]
Li, Weihao [1]
Su, Zirong [1]
Yang, Lidong [1]
Affiliations
[1] Inner Mongolia Univ Sci & Technol, Sch Numer Ind, Baotou, Inner Mongolia, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Safety Assessment; Chinese Generation; AI Hallucination; Automatic Scoring; Large Language Model;
DOI
10.1109/IALP63756.2024.10661189
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study investigates the performance and security indicators of mainstream large language models in Chinese generation tasks. It explores the potential security risks associated with these models and offers suggestions for improvement. The study uses publicly available datasets to assess Chinese language generation tasks, develops datasets and multidimensional security rating standards for the security task evaluations, compares the performance of three models across 5 Chinese tasks and 6 security tasks, and conducts Pearson correlation analysis using GPT-4 and questionnaire surveys. The study also implements automatic scoring based on GPT-3.5-Turbo. The experimental findings indicate that the models excel in Chinese language generation tasks. ERNIE Bot performs best in the evaluation of ideology and ethics, ChatGPT excels in the rumor and falsehood and the privacy security assessments, and Claude performs well in the assessments of factual fallacy and social prejudice. The fine-tuned model demonstrates high accuracy in security tasks, yet all models exhibit security vulnerabilities. Incorporating prompt engineering proves effective in mitigating security risks. It is recommended that both domestic and foreign models adhere to the legal frameworks of each country, reduce AI hallucinations, continuously expand their corpora, and keep iterating with updates.
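The abstract names Pearson correlation analysis as the way model-based scores are compared with questionnaire results. The sketch below shows only the general technique, not the authors' implementation. The function name `pearson_r` and the two score lists are hypothetical and were made up for illustration; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings of the same six responses (illustrative only):
model_scores = [4, 3, 5, 2, 4, 1]   # assumed scores from an automatic grader
human_scores = [5, 3, 4, 2, 4, 1]   # assumed scores from questionnaire raters

r = pearson_r(model_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would suggest that the automatic scores track the human ratings closely, while a value near 0 would suggest little linear agreement.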
Pages: 151-161
Page count: 11