Speech-Based Annotation and Retrieval of Digital Photographs

Cited by: 0

Authors:

Hazen, Timothy J. [1]

Sherry, Brennan [1]

Adler, Mark [2]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA

[2] Nokia Res Ctr, Cambridge, MA USA

Source:

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | 2007

Keywords:

photo annotation; audio indexing; audio retrieval

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper we describe the development of a speech-based annotation and retrieval system for digital photographs. The system uses a client/server architecture which allows photographs to be captured and annotated on light-weight clients, such as mobile camera phones, and then processed, indexed and stored on networked servers. For speech-based retrieval we have developed a mixed grammar recognition approach which allows the speech recognition system to construct a single finite-state network combining context-free grammars, for recognizing and parsing query carrier phrases and metadata phrases, with an unconstrained statistical n-gram model for recognizing free-form search terms. Experiments demonstrating successful retrieval of photographs using purely speech-based annotation and retrieval are presented.

Pages: 2077 / +

Page count: 2
