ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos

Cited by: 4
Authors
Yu, Zhou [1]
Zheng, Lixiang [1]
Zhao, Zhou [2]
Wu, Fei [2]
Fan, Jianping [1, 3]
Ren, Kui [4]
Yu, Jun [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[3] Lenovo Res, AI Lab, Beijing, Peoples R China
[4] Zhejiang Univ, Sch Cyber Sci & Technol, Hangzhou, Peoples R China
DOI
10.1109/CVPR52729.2023.02221
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Building benchmarks to systematically analyze the different capabilities of video question answering (VideoQA) models is challenging yet crucial. Existing benchmarks often use simple, non-compositional questions and suffer from language biases, making it difficult to diagnose model weaknesses incisively. A recent benchmark, AGQA [8], offers a promising paradigm: it generates QA pairs automatically from pre-annotated scene graphs, which lets it measure diverse reasoning abilities with granular control. However, its questions are limited in reasoning about the fine-grained semantics in videos, as such information is absent from its scene graphs. To this end, we present ANetQA, a large-scale benchmark that supports fine-grained compositional reasoning over the challenging untrimmed videos from ActivityNet [4]. As in AGQA, the QA pairs in ANetQA are generated automatically from annotated video scene graphs. The fine-grained properties of ANetQA are reflected in the following: (i) untrimmed videos with fine-grained semantics; (ii) spatio-temporal scene graphs with fine-grained taxonomies; and (iii) diverse questions generated from fine-grained templates. ANetQA contains 1.4 billion unbalanced and 13.4 million balanced QA pairs, an order of magnitude more than AGQA for a similar number of videos. Comprehensive experiments are performed on state-of-the-art methods. The best model achieves 44.5% accuracy while human performance tops out at 84.5%, leaving substantial room for improvement.
Pages: 23191-23200
Page count: 10