50 entries in total
- [41] Evaluating Large Language Models on Controlled Generation Tasks. 2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 3155-3168
- [42] Baby steps in evaluating the capacities of large language models. Nature Reviews Psychology, 2023, 2: 451-452
- [43] EconNLI: Evaluating Large Language Models on Economics Reasoning. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 982-994
- [44] Evaluating Large Language Models for Tax Law Reasoning. INTELLIGENT SYSTEMS, BRACIS 2024, PT I, 2025, 15412: 460-474
- [45] A Chinese Dataset for Evaluating the Safeguards in Large Language Models. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 3106-3119
- [48] DebugBench: Evaluating Debugging Capability of Large Language Models. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024: 4173-4198
- [49] Evaluating Nuanced Bias in Large Language Model Free Response Answers. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT II, NLDB 2024, 2024, 14763: 378-391