50 results in total
- [1] Benchmarking medical large language models. NATURE REVIEWS BIOENGINEERING, 2023, 1 (08): 543.
- [2] Benchmarking AutoGen with different large language models. 2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024: 263-264.
- [4] Benchmarking Large Language Models: Opportunities and Challenges. PERFORMANCE EVALUATION AND BENCHMARKING, TPCTC 2023, 2024, 14247: 77-89.
- [5] FELM: Benchmarking Factuality Evaluation of Large Language Models. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023.
- [6] Benchmarking Biomedical Relation Knowledge in Large Language Models. BIOINFORMATICS RESEARCH AND APPLICATIONS, PT II, ISBRA 2024, 2024, 14955: 482-495.
- [7] Benchmarking Cognitive Biases in Large Language Models as Evaluators. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 517-545.
- [8] TOMBENCH: Benchmarking Theory of Mind in Large Language Models. PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 15959-15983.