50 entries in total
- [41] Evaluating Large Language Models on Controlled Generation Tasks. 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, 2023: 3155-3168
- [42] Baby steps in evaluating the capacities of large language models. Nature Reviews Psychology, 2023, 2: 451-452
- [43] EconNLI: Evaluating Large Language Models on Economics Reasoning. Findings of the Association for Computational Linguistics: ACL 2024, 2024: 982-994
- [44] Evaluating Large Language Models for Tax Law Reasoning. Intelligent Systems, BRACIS 2024, Pt I, 2025, 15412: 460-474
- [45] A Chinese Dataset for Evaluating the Safeguards in Large Language Models. Findings of the Association for Computational Linguistics: ACL 2024, 2024: 3106-3119
- [48] DebugBench: Evaluating Debugging Capability of Large Language Models. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024: 4173-4198
- [49] Prompting a Large Language Model to Generate Diverse Motivational Messages: A Comparison with Human-Written Messages. Proceedings of the 11th Conference on Human-Agent Interaction, HAI 2023, 2023: 378-380
- [50] Exploring Reversal Mathematical Reasoning Ability for Large Language Models. Findings of the Association for Computational Linguistics: ACL 2024, 2024: 13671-13685