Multi-modal Voice Activity Detection by Embedding Image Features into Speech Signal

Cited by: 0
|
Authors
Abe, Yohei [1 ]
Ito, Akinori [1 ]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan
Keywords
multi-modal; audio-visual; information hiding; voice activity detection (VAD);
DOI
10.1109/IIH-MSP.2013.76
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Lip movement has a close relationship with speech because the lips move when we talk. The idea behind this work is to extract a lip movement feature from facial video and embed it into the speech signal using an information hiding technique. With the proposed framework, advanced speech communication is possible using only a speech signal that carries the embedded lip movement features, without increasing the bitrate of the signal. In this paper, we present the basic framework of the method and apply the proposed method to multi-modal voice activity detection (VAD). In a detection experiment using a support vector machine (SVM), we obtained better performance than audio-only VAD in a noisy environment. In addition, we investigated how embedding data into the speech signal affects sound quality and detection performance.
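The abstract frames multi-modal VAD as a per-frame binary classification over combined audio and lip-movement features, decided by an SVM. A minimal sketch of that idea, using synthetic data (the feature dimensions, values, and use of scikit-learn's `SVC` are illustrative assumptions, not details from the paper):

```python
# Hedged sketch: SVM-based multi-modal VAD on per-frame features.
# Assumed (not from the paper): 13-dim audio features (e.g., MFCC-like)
# concatenated with a 1-dim lip-movement feature; synthetic Gaussian data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Speech frames: higher audio energy and visible lip motion.
speech = np.hstack([rng.normal(1.0, 0.3, (n, 13)), rng.normal(0.8, 0.2, (n, 1))])
# Non-speech frames: low-energy audio, little lip motion.
silence = np.hstack([rng.normal(0.0, 0.3, (n, 13)), rng.normal(0.0, 0.2, (n, 1))])

X = np.vstack([speech, silence])
y = np.array([1] * n + [0] * n)  # 1 = voice active, 0 = non-speech

clf = SVC(kernel="rbf").fit(X, y)

# Classify a new frame: strong audio features plus lip motion.
frame = np.hstack([np.full(13, 1.0), [0.8]]).reshape(1, -1)
print(clf.predict(frame)[0])
```

In the paper's setting, the lip-movement component would be recovered from the hidden data embedded in the received speech signal rather than from a separate video channel, which is what keeps the bitrate unchanged.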
Pages: 271-274
Number of pages: 4
Related papers
50 records in total
  • [1] Multi-modal Emotion Recognition using Speech Features and Text Embedding
    Kim, Ju-Hee
    Lee, Seok-Pil
    [J]. Transactions of the Korean Institute of Electrical Engineers, 2021, 70 (01): : 108 - 113
  • [2] Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding
    Byun, Sung-Woo
    Kim, Ju-Hee
    Lee, Seok-Pil
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (17):
  • [3] Kernel Method for Speech Source Activity Detection in Multi-modal Signals
    Dov, David
    Talmon, Ronen
    Cohen, Israel
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON THE SCIENCE OF ELECTRICAL ENGINEERING (ICSEE), 2016,
  • [4] Multi-Modal Sarcasm Detection with Sentiment Word Embedding
    Fu, Hao
    Liu, Hao
    Wang, Hongling
    Xu, Linyan
    Lin, Jiali
    Jiang, Dazhi
    [J]. ELECTRONICS, 2024, 13 (05)
  • [5] Multi-Modal Embedding for Main Product Detection in Fashion
    Rubio, Antonio
    Yu, LongLong
    Simo-Serra, Edgar
    Moreno-Noguer, Francesc
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 2236 - 2242
  • [6] Multi-Modal Component Embedding for Fake News Detection
    Kang, SeongKu
    Hwang, Junyoung
    Yu, Hwanjo
    [J]. PROCEEDINGS OF THE 2020 14TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM), 2020,
  • [7] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [8] Face Detection using Multi-modal Features
    Lee, Hyobin
    Kim, Seongwan
    Kim, Sooyeon
    Lee, Sangyoun
    [J]. 2008 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS, VOLS 1-4, 2008, : 1857 - 1860
  • [9] Federated learning and deep learning framework for MRI image and speech signal-based multi-modal depression detection
    Patil, Minakshee
    Mukherji, Prachi
    Wadhai, Vijay
    [J]. Computational Biology and Chemistry, 2024, 113
  • [10] Activity Detection Using Time-Delay Embedding in Multi-modal Sensor System
    Kawsar, Ferdaus
    Hasan, Md. Kamrul
    Roushan, Tanvir
    Ahamed, Sheikh Iqbal
    Chu, William C.
    Love, Richard
    [J]. INCLUSIVE SMART CITIES AND DIGITAL HEALTH, 2016, 9677 : 489 - 499