Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News

Authors
Xie, Lei [1]
Yang, Yulian [1]
Zeng, Jia [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP, Xian 710072, Peoples R China
[2] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Story segmentation; topic segmentation; spoken document retrieval; multimedia; Chinese;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
We present a subword lexical chaining approach to automatic story segmentation of Chinese broadcast news (BN). Conventional lexical chains link related words through cohesion (e.g., word repetition), and points with a high concentration of chain starts and ends indicate story boundaries. However, unavoidable speech recognition errors in BN transcripts can break word cohesion and cause word matches to fail. We show that Chinese subwords (characters and syllables) are robust for lexical matching in errorful ASR transcripts. This motivates us to detect story boundaries from chains formed by character and syllable n-gram units. Experimental results on the TDT2 Mandarin corpus show that chaining by character unigrams gives the best story segmentation performance, with a relative F-measure improvement of 6.06% over conventional word chaining. Integrating multiple scales (words and subwords) yields further improvement. For example, fusion by voting across scales achieves an F-measure gain of 9.04% over words.
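The chaining idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `max_gap` rule for breaking a chain, the scoring of each sentence gap by counting chain ends and starts, and the fixed `threshold` are all assumptions made for illustration, and the six sentences are invented toy data rather than TDT2 transcripts.

```python
from collections import defaultdict


def char_ngrams(text, n=1):
    """Return the set of character n-grams in one sentence (whitespace removed)."""
    chars = "".join(text.split())
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def build_chains(sentences, n=1, max_gap=2):
    """Link repetitions of the same n-gram into (start, end) sentence-index chains.

    Assumption: a chain is broken when the unit is absent for more than
    `max_gap` sentences; units that occur only once form no chain.
    """
    positions = defaultdict(list)
    for idx, sent in enumerate(sentences):
        for unit in char_ngrams(sent, n):
            positions[unit].append(idx)

    chains = []
    for idxs in positions.values():
        start = prev = idxs[0]
        for idx in idxs[1:]:
            if idx - prev > max_gap:
                if prev > start:
                    chains.append((start, prev))
                start = idx
            prev = idx
        if prev > start:
            chains.append((start, prev))
    return chains


def boundary_scores(chains, num_sentences):
    """Score gap i (between sentence i and i+1) by chain ends at i plus starts at i+1."""
    scores = [0] * (num_sentences - 1)
    for start, end in chains:
        if end < num_sentences - 1:
            scores[end] += 1
        if start > 0:
            scores[start - 1] += 1
    return scores


def pick_boundaries(scores, threshold):
    """Hypothetical decision rule: keep gaps whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy example: three sentences about the stock market, then three about a typhoon.
sentences = [
    "股市今天上涨",
    "股市投资者乐观",
    "上涨趋势持续",
    "台风登陆沿海",
    "台风带来暴雨",
    "沿海居民撤离",
]
chains = build_chains(sentences, n=1, max_gap=2)
scores = boundary_scores(chains, len(sentences))
boundaries = pick_boundaries(scores, threshold=4)
print(scores)      # [0, 2, 6, 0, 2]
print(boundaries)  # [2]
```

In this toy case the highest score falls on the gap after the third sentence, where the stock-market chains end and the typhoon chains begin. Passing `n=2` would chain character bigrams instead; syllable units would need a pronunciation lexicon, which is outside this sketch.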
Pages: 248 / +
Number of pages: 3