An Investigation of Cross-Language Information Retrieval for User-Generated Internet Video

Cited by: 1
Authors
Khwileh, Ahmad [1]
Ganguly, Debasis [1]
Jones, Gareth J. F. [1]
Affiliation
[1] Dublin City Univ, Sch Comp, ADAPT Ctr, Dublin 9, Ireland
Keywords
Cross-Language Video Retrieval; User generated content; User generated internet video search; TRACK; NORMALIZATION
DOI
10.1007/978-3-319-24027-5_10
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Increasing amounts of user-generated video content are being uploaded to online repositories. This content is often very uneven in quality and topical coverage in different languages. The lack of material in individual languages means that cross-language information retrieval (CLIR) within these collections is required to satisfy the user's information need. Search over this content is dependent on available metadata, which includes user-generated annotations and often noisy transcripts of spoken audio. The effectiveness of CLIR depends on translation quality between query and content languages. We investigate CLIR effectiveness for the blip10000 archive of user-generated Internet video content. We examine the retrieval effectiveness using the title and free-text metadata provided by the uploader and automatic speech recognition (ASR) generated transcripts. Retrieval is carried out using Divergence From Randomness models, with automatic translation by Google Translate. Our experimental investigation indicates that different sources of evidence have different retrieval effectiveness and in particular differing levels of performance in CLIR. Specifically, we find that the retrieval effectiveness of the ASR source is significantly degraded in CLIR. Our investigation also indicates that for this task the Title source provides the most robust source of evidence for CLIR, and performs best when used in combination with other sources of evidence. We suggest areas for investigation to give the most effective and robust CLIR performance for user-generated content.
Pages: 117-129
Number of pages: 13