A Crowdsourcing Tool for Data Augmentation in Visual Question Answering Tasks

Cited by: 0
Authors
Silva, Ramon [1 ]
Fonseca, Augusto [1 ]
Goldschmidt, Ronaldo [2 ]
dos Santos, Joel [1 ]
Bezerra, Eduardo [1 ]
Affiliations
[1] CEFET RJ, Rio De Janeiro, Brazil
[2] Inst Mil Engn, Rio De Janeiro, Brazil
Keywords
Crowdsourcing; Human Computation; Data Augmentation; Image Annotation
DOI
10.1145/3243082.3267455
Chinese Library Classification (CLC)
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Visual Question Answering (VQA) is a task that connects the fields of Computer Vision and Natural Language Processing. Taking as input an image I and a natural language question Q about I, a VQA model must produce a coherent answer R (also in natural language) to Q. A particular type of visual question is the binary question, i.e., one whose answer belongs to the set {yes, no}. Deep neural networks are currently the state-of-the-art technique for training VQA models. Despite their success, applying neural networks to the VQA task requires a very large amount of data in order to produce models with adequate precision. The datasets currently used to train VQA models are the result of laborious manual labeling processes (i.e., carried out by humans). This makes it relevant to study approaches that augment these datasets in order to train more accurate prediction models. This paper describes a crowdsourcing tool that can be used collaboratively to augment an existing VQA dataset of binary questions. Our tool actively integrates candidate items from an external data source in order to optimize the selection of queries presented to curators.
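The "optimized selection of queries" described in the abstract resembles uncertainty-based active learning: candidate (image, binary question) pairs from the external source are ranked by how unsure the current model is, and the most informative ones are routed to human curators. The sketch below illustrates that general idea only; the names (`Candidate`, `predict_yes_probability`, `select_queries`) and the uncertainty criterion are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A candidate (image, binary question) pair from an external data source (hypothetical)."""
    image_id: str
    question: str  # a yes/no question about the image


def select_queries(
    candidates: List[Candidate],
    predict_yes_probability: Callable[[Candidate], float],
    k: int,
) -> List[Candidate]:
    """Pick the k candidates whose predicted binary answer is most uncertain.

    Uncertainty is measured as the distance of P(answer = yes) from 0.5:
    the pairs the current VQA model is least sure about are the most
    informative ones to present to human curators.
    """
    def uncertainty_rank(c: Candidate) -> float:
        p = predict_yes_probability(c)
        return abs(p - 0.5)  # 0.0 = maximally uncertain, 0.5 = fully confident

    return sorted(candidates, key=uncertainty_rank)[:k]


if __name__ == "__main__":
    # Toy example with a dummy model that is unsure about the second candidate.
    pool = [
        Candidate("img_001", "Is there a dog in the picture?"),
        Candidate("img_002", "Is the traffic light green?"),
        Candidate("img_003", "Is it raining?"),
    ]
    fake_scores = {"img_001": 0.95, "img_002": 0.52, "img_003": 0.10}
    picked = select_queries(pool, lambda c: fake_scores[c.image_id], k=1)
    print(picked[0].question)  # -> "Is the traffic light green?"
```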
Pages: 137-140
Number of pages: 4