Multi-modality Paraphrase Generation Model Integrating Image Information

Cited by: 0
Authors
Ma C. [1 ]
Wan Z. [1 ]
Zhang Y. [1 ]
Xu J. [1 ]
Chen Y. [1 ]
Affiliations
[1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing
Keywords
Abstract scene graph; Attention mechanism; Multi-modality; Paraphrase generation;
DOI
10.13209/j.0479-8023.2021.110
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In multi-modality scenarios such as commodity descriptions and news comments, existing paraphrase generation models cannot utilize image information, which leads to semantic loss in the generated paraphrases. To solve this problem, this paper proposes the Multi-modality Paraphrase Generation (MPG) model, which integrates image information into paraphrase generation. To incorporate the image information corresponding to the original sentence, MPG first constructs an abstract scene graph and transforms the image features into node features of the scene graph. The constructed scene graph is then used to generate paraphrases, with a relational graph convolutional network serving as the encoder and a graph-based attention mechanism in the decoder. For evaluation, a sentence-pair similarity calculation method is proposed to select sentence pairs describing the same objects from the MSCOCO dataset, on which evaluation experiments are conducted. Experimental results show that the proposed MPG model achieves better semantic fidelity, indicating that integrating image information effectively improves the quality of paraphrase generation in multi-modality scenarios. © 2022 Peking University.
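The abstract mentions a relational graph convolutional network (R-GCN) encoder over the scene graph. As a rough illustration of that standard technique (not the authors' actual implementation), the sketch below applies one R-GCN layer to node features: each node combines a self-connection term with relation-specific, degree-normalized messages from its neighbors. All names and the toy graph are hypothetical.

```python
def rgcn_layer(node_feats, edges, rel_weights, self_weight):
    """One relational graph convolution layer (a generic sketch):
    h_i' = ReLU( W_0 h_i + sum_r sum_{j in N_r(i)} (1/c_{i,r}) W_r h_j ).

    node_feats:  list of node feature vectors (lists of floats)
    edges:       list of (src, rel, dst) triples; messages flow src -> dst
    rel_weights: dict mapping relation name -> weight matrix (list of rows)
    self_weight: weight matrix W_0 for the self-connection
    """
    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    # Self-connection term W_0 h_i for every node.
    out = [matvec(self_weight, h) for h in node_feats]

    # Neighbor counts per (target node, relation) for the 1/c_{i,r} normalization.
    counts = {}
    for src, rel, dst in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1

    # Accumulate relation-specific, normalized messages.
    for src, rel, dst in edges:
        msg = matvec(rel_weights[rel], node_feats[src])
        c = counts[(dst, rel)]
        out[dst] = [o + m / c for o, m in zip(out[dst], msg)]

    # ReLU nonlinearity.
    return [[max(0.0, v) for v in h] for h in out]
```

With identity weight matrices and a single edge 0 -> 1, node 1's output is simply its own feature plus node 0's, which makes the aggregation easy to verify by hand.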
Pages: 45-53 (8 pages)