Multi-Modal Convolutional Dictionary Learning

Cited by: 30
Authors
Gao, Fangyuan [1 ]
Deng, Xin [1 ]
Xu, Mai [2 ]
Xu, Jingyi [2 ]
Dragotti, Pier Luigi [3 ]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[2] Beihang Univ, Dept Elect Informat Engn, Beijing 100191, Peoples R China
[3] Imperial Coll London, Dept Elect & Elect Engn, London SW7 2AZ, England
Funding
Beijing Natural Science Foundation;
Keywords
Dictionaries; Training; Memory management; Noise level; Toy manufacturing industry; Standards; Paints; Multi-modal dictionary learning; convolutional sparse coding; image denoising; IMAGE SUPERRESOLUTION; LOW-RANK; SPARSE; TRANSFORM;
DOI
10.1109/TIP.2022.3141251
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Convolutional dictionary learning has become increasingly popular in signal and image processing for its ability to overcome the limitations of traditional patch-based dictionary learning. Although most studies on convolutional dictionary learning focus on the unimodal case, real-world image processing tasks usually involve images from multiple modalities, e.g., visible and near-infrared (NIR) images. Thus, it is necessary to explore convolutional dictionary learning across different modalities. In this paper, we propose a novel multi-modal convolutional dictionary learning algorithm, which efficiently correlates different image modalities and fully considers neighborhood information at the image level. In this model, each modality is represented by two convolutional dictionaries, one for common feature representation and the other for unique feature representation. The model is constrained by the requirement that the convolutional sparse representations (CSRs) of the common features should be the same across different modalities, since the images are captured from the same scene. We propose a new training method based on the alternating direction method of multipliers (ADMM) to alternately learn the common and unique dictionaries in the discrete Fourier transform (DFT) domain. We show that our model converges in fewer than 20 iterations of alternating between convolutional dictionary updates and CSR calculation. The effectiveness of the proposed dictionary learning algorithm is demonstrated on various multi-modal image processing tasks, where it achieves better performance than both dictionary learning methods and deep learning based methods when training data are limited.
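A minimal sketch of the objective consistent with the abstract's description (the symbols below, e.g. the sparsity weight lambda and the unit-norm filter constraint, are illustrative assumptions; the paper's exact formulation, weights, and constraints may differ). For M modalities x^{(m)} of the same scene, with common-feature filters d^{(m)}_{c,k}, unique-feature filters d^{(m)}_{u,k}, shared common CSRs z_{c,k}, and modality-specific CSRs z^{(m)}_{u,k} (* denoting convolution):

\min_{\{d^{(m)}_{c,k},\, d^{(m)}_{u,k},\, z^{(m)}_{u,k}\},\, \{z_{c,k}\}} \;
\sum_{m=1}^{M} \frac{1}{2} \Big\| x^{(m)} - \sum_{k} d^{(m)}_{c,k} * z_{c,k} - \sum_{k} d^{(m)}_{u,k} * z^{(m)}_{u,k} \Big\|_2^2
+ \lambda \Big( \sum_{k} \| z_{c,k} \|_1 + \sum_{m,k} \| z^{(m)}_{u,k} \|_1 \Big)
\quad \text{s.t.}\;\; \| d^{(m)}_{c,k} \|_2 \le 1,\; \| d^{(m)}_{u,k} \|_2 \le 1.

Sharing the common CSRs z_{c,k} across modalities encodes the assumption that all modalities depict the same scene, while the unique dictionaries absorb modality-specific details; solving the ADMM subproblems in the DFT domain is natural here, since convolution becomes element-wise multiplication there.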
Pages: 1325-1339 (15 pages)