50 entries in total
- [21] EVALUATING THE PERFORMANCE OF DIFFERENT LARGE LANGUAGE MODELS ON HEALTH CONSULTATION AND PATIENT EDUCATION IN UROLITHIASIS JOURNAL OF UROLOGY, 2024, 211 (05): E391 - E392
- [23] Evaluating the Performance of Different Large Language Models on Health Consultation and Patient Education in Urolithiasis Journal of Medical Systems, 47
- [24] HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 10691 - 10706
- [25] CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 11817 - 11837
- [26] ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [27] Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models' Understanding of Discourse Relations PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 6277 - 6295
- [29] SafeLLMs: A Benchmark for Secure Bilingual Evaluation of Large Language Models NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360: 437 - 448