Paraphrase Acquisition from Image Captions

Authors
Gohsen, Marcel [1]
Hagen, Matthias [2]
Potthast, Martin [3, 4]
Stein, Benno [1]
Affiliations
[1] Bauhaus Univ Weimar, Weimar, Germany
[2] Friedrich Schiller Univ Jena, Jena, Germany
[3] Univ Leipzig, Leipzig, Germany
[4] ScaDS AI, Leipzig, Germany
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose to use image captions from the Web as a previously underutilized resource for paraphrases (i.e., texts with the same "message") and to create and analyze a corresponding dataset. When an image is reused on the Web, an original caption is often assigned. We hypothesize that different captions for the same image naturally form a set of mutual paraphrases. To demonstrate the suitability of this idea, we analyze captions in the English Wikipedia, where editors frequently relabel the same image for different articles. The paper introduces the underlying mining technology, the resulting Wikipedia-IPC dataset, and compares known paraphrase corpora with respect to their syntactic and semantic paraphrase similarity to our new resource. In this context, we introduce characteristic maps along the two similarity dimensions to identify the style of paraphrases coming from different sources. An annotation study demonstrates the high reliability of the algorithmically determined characteristic maps.
Pages: 3348-3358
Number of pages: 11