50 items in total
- [32] StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 11143 - 11156
- [33] Benchmarking Large Language Models on Communicative Medical Coaching: A Dataset and a Novel System FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1624 - 1637
- [34] EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs) COMPANION OF THE 15TH ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, ICPE COMPANION 2024, 2024, : 158 - 162
- [35] Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study IEEE ACCESS, 2025, 13 : 29698 - 29717
- [36] Large language models and rheumatology: a comparative evaluation LANCET RHEUMATOLOGY, 2023, 5 (10): E574 - E578
- [37] Automatic Evaluation of Attribution by Large Language Models FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 4615 - 4635
- [38] Factuality Enhanced Language Models for Open-Ended Text Generation ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
- [39] UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 5266 - 5293