Cross-Modality Semantic Integration With Hypothesis Rescoring for Robust Interpretation of Multimodal User Interactions

Cited by: 2
Authors
Hui, Pui-Yu [1 ]
Meng, Helen M. [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Human Comp Commun Lab, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Keywords
Joint integration; human-computer interaction; hypothesis rescoring; multimodal input; pen gesture; perplexity; robust interpretation; spoken input
DOI
10.1109/TASL.2008.2011509
Chinese Library Classification (CLC): O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
We develop a framework for the automatic semantic interpretation of multimodal user interactions involving speech and pen gestures. The two input modalities abstract the user's intended message differently into input events, e.g., key terms/phrases in speech or different types of gestures in the pen modality. The proposed framework begins by generating partial interpretations for each input event as a ranked list of hypothesized semantics. We devise a cross-modality semantic integration procedure to align the pair of hypothesis lists between every speech input event and every pen input event in a multimodal expression. This is achieved with a Viterbi alignment algorithm that enforces both the temporal ordering of the input events and the semantic compatibility of aligned events. The alignment enables generation of a unimodal, verbalized paraphrase that is semantically equivalent to the original multimodal expression. Our experiments are based on a multimodal corpus in the domain of city navigation. Applying the cross-modality integration procedure to near-perfect (manual) transcripts of the speech and pen modalities shows that correct unimodal paraphrases are generated for over 97% of the training and test sets. However, if we replace the manual transcripts with automatic speech and pen recognition transcripts, performance drops to 53.7% and 54.8% for the training and test sets, respectively. To address this issue, we devise a hypothesis rescoring procedure that evaluates all candidates of cross-modality integration derived from multiple recognition hypotheses from each modality. The rescoring function incorporates the integration score, the N-best purity of recognized spoken locative expressions, and the distances between the coordinates of recognized pen gestures and their interpreted icons on the map. Applying cross-modality hypothesis rescoring improves performance to 67.5% and 69.9% for the training and test sets, respectively.
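The abstract describes the Viterbi alignment and rescoring steps only at a high level. Below is a minimal, hypothetical Python sketch of the general idea: a dynamic-programming (Viterbi-style) pass over two temporally ordered event streams, where each event carries an N-best list of hypothesized semantics, scored by the semantic compatibility of aligned pairs. The event representation, compatibility function, skip penalty, and the linear rescoring form at the end are all illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass

@dataclass
class InputEvent:
    onset: float                    # onset time; each stream is sorted by it
    nbest: list[tuple[str, float]]  # ranked (semantic type, confidence) list

def compatibility(speech_sem: str, pen_sem: str) -> float:
    # Toy semantic compatibility: reward matching semantic types (assumption).
    return 1.0 if speech_sem == pen_sem else 0.0

SKIP = -0.5  # assumed penalty for leaving an input event unaligned

def viterbi_align(speech, pen):
    # Best monotonic alignment of the two event streams.  Monotonicity over
    # the (i, j) grid enforces temporal ordering, since both streams are
    # sorted by onset time.  Returns (total score, aligned index pairs).
    n, m = len(speech), len(pen)
    NEG = float("-inf")
    score = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if score[i][j] == NEG:
                continue
            if i < n and j < m:
                # Align speech event i with pen event j, using the most
                # compatible pair drawn from their N-best hypothesis lists.
                best = max(compatibility(s, p) * sc * pc
                           for s, sc in speech[i].nbest
                           for p, pc in pen[j].nbest)
                if score[i][j] + best > score[i + 1][j + 1]:
                    score[i + 1][j + 1] = score[i][j] + best
                    back[i + 1][j + 1] = (i, j, "pair")
            if i < n and score[i][j] + SKIP > score[i + 1][j]:  # skip speech
                score[i + 1][j] = score[i][j] + SKIP
                back[i + 1][j] = (i, j, "skip")
            if j < m and score[i][j] + SKIP > score[i][j + 1]:  # skip pen
                score[i][j + 1] = score[i][j] + SKIP
                back[i][j + 1] = (i, j, "skip")
    pairs, i, j = [], n, m          # trace back the best-scoring path
    while (i, j) != (0, 0):
        pi, pj, op = back[i][j]
        if op == "pair":
            pairs.append((pi, pj))
        i, j = pi, pj
    return score[n][m], list(reversed(pairs))

def rescore(integration_score, nbest_purity, gesture_icon_dist,
            w=(1.0, 0.5, 0.5)):
    # Hypothetical linear combination of the three cues named in the
    # abstract; the linear form and the weights w are assumptions.
    return (w[0] * integration_score + w[1] * nbest_purity
            - w[2] * gesture_icon_dist)

# Example: "go from here to there" with two deictic pen gestures on a map.
speech = [InputEvent(0.4, [("location", 0.9), ("route", 0.1)]),
          InputEvent(1.2, [("location", 0.8)])]
pen = [InputEvent(0.5, [("location", 0.7), ("area", 0.3)]),
       InputEvent(1.3, [("location", 0.6)])]
print(viterbi_align(speech, pen))  # -> approx (1.11, [(0, 0), (1, 1)])

The skip moves keep the alignment robust when one modality contains an event with no counterpart in the other (e.g., a spurious pen stroke); in the paper's setting, a rescoring function along the lines of rescore() would then be applied across all integration candidates derived from the recognizers' multiple hypotheses.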
Pages: 486-500
Page count: 15