50 records in total
- [32] GPTQT: Quantize Large Language Models Twice to Push the Efficiency. 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM), CIS-RAM 2024, 2024: 368-373.
- [33] Layer-Condensed KV Cache for Efficient Inference of Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol 1: Long Papers, 2024: 11175-11188.
- [34] EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs). Companion of the 15th ACM/SPEC International Conference on Performance Engineering, ICPE Companion 2024, 2024: 158-162.
- [35] Generative Inference of Large Language Models in Edge Computing: An Energy Efficient Approach. 20th International Wireless Communications & Mobile Computing Conference, IWCMC 2024, 2024: 244-249.
- [36] Tabi: An Efficient Multi-Level Inference System for Large Language Models. Proceedings of the Eighteenth European Conference on Computer Systems, EuroSys 2023, 2023: 233-248.
- [38] An Efficient Quantized GEMV Implementation for Large Language Models Inference with Matrix Core. Journal of Supercomputing, 2025, 81(3).
- [39] Distributed Inference and Fine-tuning of Large Language Models Over The Internet. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
- [40] Assessing Large Language Models for Oncology Data Inference From Radiology Reports. JCO Clinical Cancer Informatics, 2024, 8.