Modeling the Statistical Behavior of Lexical Chains to Capture Word Cohesiveness for Automatic Story Segmentation

Times cited: 0
Authors
Chan, Shing-kai [1]
Xie, Lei [1]
Meng, Helen Mei-ling [1]
Affiliation
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Commun Lab, Shatin, Hong Kong, Peoples R China
Keywords
story segmentation; spoken document retrieval; Chinese
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a mathematically rigorous framework for modeling the statistical behavior of lexical chains for automatic story segmentation of broadcast news audio. Lexical chains were first proposed in [1] to connect related terms within a story, as an embodiment of lexical cohesion. The vocabulary within a story tends to be cohesive, whereas a change in the vocabulary distribution tends to signal a topic shift across a story boundary. Previous work focused on the concept and nature of lexical chains but performed story segmentation by arbitrary thresholding. This work uses the log-normal distribution to capture the statistical behavior of lexical chains, together with data-driven selection of the parameters for lexical chain formation. Experiments on the TDT-2 Mandarin Corpus show that the proposed statistical model improves story segmentation, raising the F1-measure from 0.468 to 0.641.
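The abstract does not state which lexical-chain statistic is modeled or how chains are formed, so the Python sketch below only illustrates the general idea under explicit assumptions; it is not the authors' method. It assumes that (a) a chain links repeated occurrences of the same term while consecutive occurrences are at most `max_gap` tokens apart, (b) the evidence for a boundary at a position is the number of chains ending just before it plus the number starting just after it, within a hypothetical `window`, and (c) that count, shifted by +1 to stay inside the log-normal's positive support, is modeled by one log-normal for true boundaries and one for non-boundaries, with a likelihood-ratio decision. All function and parameter names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def form_chains(tokens, max_gap):
    """Group repeated terms into lexical chains (hypothetical scheme).

    A new occurrence extends the term's current chain if it lies at most
    `max_gap` tokens after that chain's last occurrence; otherwise a new
    chain starts. Returns (term, start, end) for chains of length >= 2.
    """
    chains = []      # each entry: [term, start_index, end_index]
    open_chain = {}  # term -> index in `chains` of its most recent chain
    for i, term in enumerate(tokens):
        idx = open_chain.get(term)
        if idx is not None and i - chains[idx][2] <= max_gap:
            chains[idx][2] = i           # extend the existing chain
        else:
            open_chain[term] = len(chains)
            chains.append([term, i, i])  # start a new chain
    return [(t, s, e) for t, s, e in chains if e > s]


def boundary_score(chains, pos, window):
    """Chains ending in [pos - window, pos) plus chains starting in
    [pos, pos + window); many ends and starts suggest a vocabulary shift."""
    ends = sum(1 for _, _, e in chains if pos - window <= e < pos)
    starts = sum(1 for _, s, _ in chains if pos <= s < pos + window)
    return ends + starts


def fit_lognormal(samples):
    """Maximum-likelihood log-normal fit: mean and population std of ln(x)."""
    logs = [math.log(x) for x in samples]
    return fmean(logs), pstdev(logs)


def lognormal_logpdf(x, mu, sigma):
    """Log density of a log-normal distribution with parameters (mu, sigma)."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z


def is_boundary(score, boundary_params, other_params):
    """Likelihood-ratio decision between the two fitted log-normals.
    The +1 shift keeps a zero count in the positive support (an assumption)."""
    x = score + 1
    return (lognormal_logpdf(x, *boundary_params)
            > lognormal_logpdf(x, *other_params))
```

In use, one would compute `boundary_score` at annotated story boundaries and at non-boundary positions of training data, fit each set (after the +1 shift) with `fit_lognormal`, and then apply `is_boundary` to unseen positions. Sweeping `max_gap` and `window` on held-out data is one plausible way to realize data-driven parameter selection, but the paper's actual procedure may differ.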
Pages: 2408-2411
Number of pages: 4