ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos

Cited by: 4
Authors
Yu, Zhou [1]
Zheng, Lixiang [1]
Zhao, Zhou [2]
Wu, Fei [2]
Fan, Jianping [1, 3]
Ren, Kui [4]
Yu, Jun [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[3] Lenovo Res, AI Lab, Beijing, Peoples R China
[4] Zhejiang Univ, Sch Cyber Sci & Technol, Hangzhou, Peoples R China
DOI
10.1109/CVPR52729.2023.02221
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Building benchmarks to systematically analyze different capabilities of video question answering (VideoQA) models is challenging yet crucial. Existing benchmarks often use simple, non-compositional questions and suffer from language biases, making it difficult to diagnose model weaknesses incisively. A recent benchmark, AGQA [8], offers a promising paradigm: it generates QA pairs automatically from pre-annotated scene graphs, which lets it measure diverse reasoning abilities with granular control. However, its questions are limited in reasoning about the fine-grained semantics in videos, because such information is absent from its scene graphs. To this end, we present ANetQA, a large-scale benchmark that supports fine-grained compositional reasoning over the challenging untrimmed videos from ActivityNet [4]. As in AGQA, the QA pairs in ANetQA are automatically generated from annotated video scene graphs. The fine-grained properties of ANetQA are reflected in the following: (i) untrimmed videos with fine-grained semantics; (ii) spatio-temporal scene graphs with fine-grained taxonomies; and (iii) diverse questions generated from fine-grained templates. ANetQA contains 1.4 billion unbalanced and 13.4 million balanced QA pairs, an order of magnitude more than AGQA with a similar number of videos. Comprehensive experiments are performed for state-of-the-art methods. The best model achieves 44.5% accuracy while human performance tops out at 84.5%, leaving considerable room for improvement.
Pages: 23191-23200
Page count: 10