Speech-Based Annotation and Retrieval of Digital Photographs

Cited by: 0
Authors
Hazen, Timothy J. [1]
Sherry, Brennan [1]
Adler, Mark [2]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Nokia Res Ctr, Cambridge, MA USA
Keywords
photo annotation; audio indexing; audio retrieval
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we describe the development of a speech-based annotation and retrieval system for digital photographs. The system uses a client/server architecture that allows photographs to be captured and annotated on lightweight clients, such as mobile camera phones, and then processed, indexed, and stored on networked servers. For speech-based retrieval, we have developed a mixed-grammar recognition approach in which the speech recognition system constructs a single finite-state network. This network combines context-free grammars, used to recognize and parse query carrier phrases and metadata phrases, with an unconstrained statistical n-gram model, used to recognize free-form search terms. We present experiments demonstrating successful retrieval of photographs using purely speech-based annotation and retrieval.
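The split between carrier phrases, metadata phrases, and free-form search terms can be illustrated with a small text-level sketch. This is not the authors' method: their system combines the grammars and the n-gram model inside the recognizer's finite-state network, whereas the sketch below works on an already-transcribed query string. The carrier phrases, the date pattern, the filler-word list, the photo record fields, and the function names `parse_query` and `search` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical carrier phrases; the record does not list the paper's grammars.
CARRIER = re.compile(
    r"^(?:show me|find|search for|get)\s+(?:all\s+)?(?:the\s+)?"
    r"(?:photos|pictures|images)\s*"
)
MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")
# Hypothetical metadata phrase: "taken in <month> [<year>]".
DATE = re.compile(rf"\b(?:taken\s+)?(?:in|from)\s+({MONTHS})(?:\s+(\d{{4}}))?\b")
# Assumed filler words that are dropped from the free-form terms.
FILLER = {"of", "with", "the", "a", "an", "and", "at", "on"}
WORD = re.compile(r"[a-z0-9']+")


def parse_query(text):
    """Split a transcribed query into metadata constraints and search terms."""
    text = CARRIER.sub("", text.lower().strip(), count=1)
    metadata = {}
    match = DATE.search(text)
    if match:
        metadata["month"] = match.group(1)
        if match.group(2):
            metadata["year"] = int(match.group(2))
        # Remove the metadata phrase so it is not treated as a search term.
        text = text[:match.start()] + " " + text[match.end():]
    terms = [w for w in WORD.findall(text) if w not in FILLER]
    return {"metadata": metadata, "terms": terms}


def search(query, photos):
    """Return ids of photos that satisfy the metadata, best term match first."""
    parsed = parse_query(query)
    hits = []
    for photo in photos:
        if any(photo.get(k) != v for k, v in parsed["metadata"].items()):
            continue
        words = set(WORD.findall(photo["annotation"].lower()))
        score = sum(term in words for term in parsed["terms"])
        if score > 0 or not parsed["terms"]:
            hits.append((score, photo["id"]))
    # sorted() is stable, so ties keep their original order.
    return [pid for _, pid in sorted(hits, key=lambda h: -h[0])]


if __name__ == "__main__":
    photos = [
        {"id": "p1", "annotation": "Julia at her birthday party",
         "month": "june", "year": 2006},
        {"id": "p2", "annotation": "birthday party at the beach",
         "month": "july", "year": 2006},
    ]
    print(search("Show me photos of the birthday party taken in June 2006", photos))
```

In this sketch the regular expressions stand in for the constrained grammars, and the leftover words stand in for the output of the unconstrained n-gram model. A real system would also have to handle recognition errors in both the annotations and the queries.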
Pages: 2077 / +
Page count: 2