Cross-Modality Semantic Integration With Hypothesis Rescoring for Robust Interpretation of Multimodal User Interactions

Cited by: 2
Authors
Hui, Pui-Yu [1 ]
Meng, Helen M. [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Human Comp Commun Lab, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Keywords
Joint integration; human-computer interaction; hypothesis rescoring; multimodal input; pen gesture; perplexity; robust interpretation; spoken input
DOI
10.1109/TASL.2008.2011509
Chinese Library Classification (CLC): O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
We develop a framework for the automatic semantic interpretation of multimodal user interactions involving speech and pen gestures. The two input modalities abstract the user's intended message differently into input events, e.g., key terms/phrases in speech or different types of gestures in the pen modality. The proposed framework begins by generating partial interpretations for each input event as a ranked list of hypothesized semantics. We devise a cross-modality semantic integration procedure to align the pair of hypothesis lists between every speech input event and every pen input event in a multimodal expression. This is achieved with a Viterbi alignment algorithm that enforces both the temporal ordering of the input events and the semantic compatibility of aligned events. The alignment enables generation of a unimodal, verbalized paraphrase that is semantically equivalent to the original multimodal expression. Our experiments are based on a multimodal corpus in the domain of city navigation. Applying the cross-modality integration procedure to near-perfect (manual) transcripts of the speech and pen modalities shows that correct unimodal paraphrases are generated for over 97% of the training and test sets. However, if we replace the manual transcripts with automatic speech and pen recognition transcripts, performance drops to 53.7% and 54.8% for the training and test sets, respectively. To address this issue, we devise a hypothesis rescoring procedure that evaluates all candidates of cross-modality integration derived from multiple recognition hypotheses from each modality. The rescoring function incorporates the integration score, the N-best purity of recognized spoken locative expressions, and the distances between the coordinates of recognized pen gestures and their interpreted icons on the map. Applying cross-modality hypothesis rescoring improves performance to 67.5% and 69.9% for the training and test sets, respectively.
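The abstract describes the Viterbi alignment and rescoring steps only at a high level. Below is a minimal, hypothetical Python sketch of the general idea: a dynamic-programming (Viterbi-style) pass over two temporally ordered event streams, where each event carries an N-best list of hypothesized semantics, scored by the semantic compatibility of aligned pairs. The event representation, compatibility function, skip penalty, and the linear rescoring form at the end are all illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass

@dataclass
class InputEvent:
    onset: float                    # onset time; each stream is sorted by it
    nbest: list[tuple[str, float]]  # ranked (semantic type, confidence) list

def compatibility(speech_sem: str, pen_sem: str) -> float:
    # Toy semantic compatibility: reward matching semantic types (assumption).
    return 1.0 if speech_sem == pen_sem else 0.0

SKIP = -0.5  # assumed penalty for leaving an input event unaligned

def viterbi_align(speech, pen):
    # Best monotonic alignment of the two event streams.  Monotonicity over
    # the (i, j) grid enforces temporal ordering, since both streams are
    # sorted by onset time.  Returns (total score, aligned index pairs).
    n, m = len(speech), len(pen)
    NEG = float("-inf")
    score = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if score[i][j] == NEG:
                continue
            if i < n and j < m:
                # Align speech event i with pen event j, using the most
                # compatible pair drawn from their N-best hypothesis lists.
                best = max(compatibility(s, p) * sc * pc
                           for s, sc in speech[i].nbest
                           for p, pc in pen[j].nbest)
                if score[i][j] + best > score[i + 1][j + 1]:
                    score[i + 1][j + 1] = score[i][j] + best
                    back[i + 1][j + 1] = (i, j, "pair")
            if i < n and score[i][j] + SKIP > score[i + 1][j]:  # skip speech
                score[i + 1][j] = score[i][j] + SKIP
                back[i + 1][j] = (i, j, "skip")
            if j < m and score[i][j] + SKIP > score[i][j + 1]:  # skip pen
                score[i][j + 1] = score[i][j] + SKIP
                back[i][j + 1] = (i, j, "skip")
    pairs, i, j = [], n, m          # trace back the best-scoring path
    while (i, j) != (0, 0):
        pi, pj, op = back[i][j]
        if op == "pair":
            pairs.append((pi, pj))
        i, j = pi, pj
    return score[n][m], list(reversed(pairs))

def rescore(integration_score, nbest_purity, gesture_icon_dist,
            w=(1.0, 0.5, 0.5)):
    # Hypothetical linear combination of the three cues named in the
    # abstract; the linear form and the weights w are assumptions.
    return (w[0] * integration_score + w[1] * nbest_purity
            - w[2] * gesture_icon_dist)

# Example: "go from here to there" with two deictic pen gestures on a map.
speech = [InputEvent(0.4, [("location", 0.9), ("route", 0.1)]),
          InputEvent(1.2, [("location", 0.8)])]
pen = [InputEvent(0.5, [("location", 0.7), ("area", 0.3)]),
       InputEvent(1.3, [("location", 0.6)])]
print(viterbi_align(speech, pen))  # -> approx (1.11, [(0, 0), (1, 1)])

The skip moves keep the alignment robust when one modality contains an event with no counterpart in the other (e.g., a spurious pen stroke); in the paper's setting, a rescoring function along the lines of rescore() would then be applied across all integration candidates derived from the recognizers' multiple hypotheses.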
Pages: 486-500
Page count: 15