Automatic Concept Discovery from Parallel Text and Visual Corpora

Cited by: 73
Authors
Sun, Chen [1]
Gan, Chuang [2]
Nevatia, Ram [1]
Affiliations
[1] University of Southern California, Los Angeles, CA 90089, USA
[2] Tsinghua University, Beijing, People's Republic of China
DOI
10.1109/ICCV.2015.298
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Humans connect language and vision to perceive the world. How can a similar connection be built for computers? One possible way is via visual concepts, which are text terms that relate to visually discriminative entities. We propose an automatic visual concept discovery algorithm using parallel text and visual corpora; it filters text terms based on the visual discriminative power of the associated images, and groups them into concepts using visual and semantic similarities. We illustrate the applications of the discovered concepts on a bidirectional image and sentence retrieval task and an image tagging task, and show that the discovered concepts not only significantly outperform several large sets of manually selected concepts, but also achieve state-of-the-art performance in the retrieval task.
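The two stages named in the abstract (filtering terms, then grouping them) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-term discriminative scores, the pairwise similarity tables, the mixing weight, the thresholds, and the greedy grouping rule are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def filter_terms(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep terms whose (assumed, precomputed) visual discriminative
    score is at least `threshold`. Returned in alphabetical order."""
    return sorted(t for t, s in scores.items() if s >= threshold)


def combined_similarity(a: str, b: str,
                        visual: Dict[Pair, float],
                        semantic: Dict[Pair, float],
                        weight: float) -> float:
    """Weighted mix of visual and semantic similarity (assumed form).
    Pairs are stored with their two terms in alphabetical order."""
    first, second = sorted((a, b))
    key = (first, second)
    return weight * visual.get(key, 0.0) + (1.0 - weight) * semantic.get(key, 0.0)


def group_terms(terms: List[str],
                visual: Dict[Pair, float],
                semantic: Dict[Pair, float],
                weight: float = 0.5,
                min_sim: float = 0.6) -> List[List[str]]:
    """Greedy grouping (an illustrative stand-in for a clustering step):
    a term joins the first group in which it is similar enough to every
    member; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for term in terms:
        for group in groups:
            if all(combined_similarity(term, m, visual, semantic, weight) >= min_sim
                   for m in group):
                group.append(term)
                break
        else:
            groups.append([term])
    return groups


# Toy, made-up inputs: scores and similarities are not from the paper.
scores = {"dog": 0.82, "puppy": 0.78, "beach": 0.74, "beautiful": 0.21, "the": 0.05}
visual = {("dog", "puppy"): 0.9, ("beach", "dog"): 0.1, ("beach", "puppy"): 0.1}
semantic = {("dog", "puppy"): 0.8, ("beach", "dog"): 0.2, ("beach", "puppy"): 0.2}

kept = filter_terms(scores, threshold=0.5)
concepts = group_terms(kept, visual, semantic)
print(kept)      # ['beach', 'dog', 'puppy']
print(concepts)  # [['beach'], ['dog', 'puppy']]
```

In this toy run, terms with low scores such as "beautiful" and "the" are dropped, and "dog" and "puppy" merge into one concept because their combined similarity (0.85) exceeds the assumed 0.6 cutoff.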
Pages: 2596-2604
Number of pages: 9