Chinese Generation and Security Index Evaluation Based on Large Language Model

Authors
Zhang, Yu [1]
Gao, Yongbing [1]
Li, Weihao [1]
Su, Zirong [1]
Yang, Lidong [1]
Affiliation
[1] Inner Mongolia Univ Sci & Technol, Sch Numer Ind, Baotou, Inner Mongolia, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Safety Assessment; Chinese Generation; AI Hallucination; Automatic Scoring; Large Language Model;
DOI
10.1109/IALP63756.2024.10661189
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study investigates the performance and security indicators of mainstream large language models in Chinese generation tasks. It explores potential security risks associated with these models and offers suggestions for improvement. The study uses publicly available datasets to assess Chinese language generation tasks, develops datasets and multidimensional security rating standards for security task evaluations, compares the performance of three models across 5 Chinese tasks and 6 security tasks, and conducts Pearson correlation analysis using GPT-4 and questionnaire surveys. Furthermore, the study implements automatic scoring based on GPT-3.5-Turbo. The experimental findings indicate that the models excel in Chinese language generation tasks. ERNIE Bot performs best in the evaluation of ideology and ethics, ChatGPT excels in the rumor and falsehood and privacy security assessments, and Claude performs well in assessing factual fallacy and social prejudice. The fine-tuned model demonstrates high accuracy in security tasks, yet all models exhibit security vulnerabilities. Incorporating prompt engineering proves effective in mitigating security risks. It is recommended that both domestic and foreign models adhere to the legal frameworks of each country, reduce AI hallucinations, continuously expand their corpora, and update iteratively.
Pages: 151-161
Page count: 11