High Efficiency Image Compression for Large Visual-Language Models

Cited by: 1

Authors:

Li, Binzhe [1]

Wang, Shurun [2]

Wang, Shiqi [1]

Ye, Yan [3]

Affiliations:

[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

[2] Alibaba Grp, Beijing 311121, Peoples R China

[3] Alibaba Grp US, Sunnyvale, CA 94085 USA

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2025 / Vol. 35 / Issue 03

Keywords:

Image compression for machine; large visual-language model; pre-editing process; VIDEO;

DOI:

10.1109/TCSVT.2024.3488181

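Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: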

0808 ; 0809 ;

Abstract:

In recent years, large visual-language models (LVLMs) have shown impressive performance and promising generalization capability in multi-modal tasks, and are thus replacing humans as receivers of visual information in various application scenarios. In this paper, we propose a pioneering variable-bitrate image compression scheme, consisting of a pre-editing module and an end-to-end codec, that achieves promising rate-accuracy performance for different LVLMs. In particular, instead of optimizing an adaptive pre-editing network towards a particular task or several representative tasks, we propose a new optimization strategy tailored for LVLMs, which is based on representation and discrimination capability, using token-level distortion and rank. The pre-editing module and the variable-bitrate end-to-end image codec are jointly trained with losses based on the semantic tokens of the large model, which enhances generalization across data and tasks. Experimental results demonstrate that the proposed framework achieves much better rate-accuracy performance than the state-of-the-art coding standard, Versatile Video Coding. Experiments on multi-modal tasks further demonstrate the robustness and generalization capability of the proposed framework.


Pages: 2870-2880

Page count: 11
