Fine-grained Multimodal Entity Linking for Videos

Cited by: 0
Authors
Zhao H.-Q. [1 ,2 ]
Wang X.-W. [1 ,2 ]
Li J.-L. [3 ]
Li Z.-X. [1 ,2 ]
Xiao Y.-H. [1 ,2 ]
Affiliations
[1] School of Computer Science, Fudan University, Shanghai
[2] Shanghai Key Laboratory of Data Science, Fudan University, Shanghai
[3] School of Computer Science and Technology, Soochow University, Suzhou
Source
Ruan Jian Xue Bao/Journal of Software | 2024 / Vol. 35 / Issue 3
Keywords
contrastive learning; dataset; fine-grained; large language model; video entity linking;
DOI
10.13328/j.cnki.jos.007078
CLC number
TP3 [Computing Technology, Computer Technology]
Discipline code
0812
Abstract
With the rapid development of the Internet and big data, both the scale and the variety of data are increasing. Video, as an important carrier of information, is becoming ever more prevalent, particularly with the recent growth of short videos, and understanding and analyzing large-scale video collections has become a popular research topic. Entity linking, as a way of enriching background knowledge, can provide a wealth of external information. Entity linking in videos can effectively assist in understanding video content, enabling its classification, retrieval, and recommendation. However, existing video entity linking datasets and methods are too coarse-grained. Therefore, this study proposes a fine-grained entity linking approach for videos, focusing on live-streaming scenarios, and constructs a fine-grained video entity linking dataset. In addition, to address the challenges of the fine-grained video linking task, this study proposes using large language models to extract entities and their attributes from videos, and employing contrastive learning to obtain better representations of videos and their corresponding entities. Experimental results demonstrate that the proposed method can effectively handle fine-grained entity linking in videos. © 2024 Chinese Academy of Sciences. All rights reserved.
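The contrastive objective described in the abstract, aligning video representations with the representations of their linked entities, is commonly realized as a symmetric InfoNCE loss with in-batch negatives. The NumPy sketch below is only an illustration of that general technique, not the authors' implementation; the function name, the `temperature` parameter, and the assumption that matched video/entity pairs share a batch index are all assumptions.

```python
import numpy as np

def info_nce_loss(video_emb: np.ndarray, entity_emb: np.ndarray,
                  temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: row i of video_emb is the positive pair of
    row i of entity_emb; all other rows act as in-batch negatives."""
    # L2-normalize so dot products become cosine similarities
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    e = entity_emb / np.linalg.norm(entity_emb, axis=1, keepdims=True)
    logits = v @ e.T / temperature        # (batch, batch) similarity matrix
    idx = np.arange(len(v))               # diagonal entries are the positives

    def xent(lg: np.ndarray) -> float:
        # cross-entropy with the diagonal as the target class
        lg = lg - lg.max(axis=1, keepdims=True)        # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the video->entity and entity->video directions
    return (xent(logits) + xent(logits.T)) / 2.0
```

Training with such a loss pulls each video embedding toward its ground-truth entity embedding while pushing it away from the other entities in the batch, which is what yields the "better representations of videos and their corresponding entities" the abstract refers to.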
Pages: 1140-1153
Page count: 13