Modeling the Statistical Behavior of Lexical Chains to Capture Word Cohesiveness for Automatic Story Segmentation

Cited by: 0

Authors:

Chan, Shing-kai ^{[1]}

Xie, Lei ^{[1]}

Meng, Helen Mei-ling ^{[1]}

Affiliations:

[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Commun Lab, Shatin, Hong Kong, Peoples R China

Source:

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | 2007

Keywords:

story segmentation; spoken document retrieval; Chinese;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a mathematically rigorous framework for modeling the statistical behavior of lexical chains for automatic story segmentation of broadcast news audio. Lexical chains were first proposed in [1] to connect related terms within a story, as an embodiment of lexical cohesion. The vocabulary within a story tends to be cohesive, while a change in the vocabulary distribution tends to signify a topic shift that occurs across a story boundary. Previous work focused on the concept and nature of lexical chains but performed story segmentation based on arbitrary thresholding. This work proposes the use of the log-normal distribution to capture the statistical behavior of lexical chains, together with data-driven parameter selection for lexical chain formation. Experimentation based on the TDT-2 Mandarin Corpus shows that the proposed statistical model leads to better story segmentation, where the F1-measure increased from 0.468 to 0.641.
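
The log-normal modeling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which lexical-chain quantity is modeled or how the parameters are estimated, so everything below is an assumption made for illustration: the modeled quantity is taken to be a chain's span in sentences, the fit is a plain maximum-likelihood estimate, and the names `fit_lognormal`, `lognormal_logpdf`, and `chain_spans` are hypothetical.

```python
import math
from statistics import fmean, pstdev


def fit_lognormal(samples):
    """Maximum-likelihood log-normal fit: mean and std of log(x)."""
    logs = [math.log(x) for x in samples]  # every sample must be > 0
    return fmean(logs), pstdev(logs)


def lognormal_logpdf(x, mu, sigma):
    """Log-density of a log-normal(mu, sigma) evaluated at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


# Hypothetical chain spans in sentences; illustrative values only,
# not data from the paper or the TDT-2 corpus.
chain_spans = [2, 3, 3, 5, 8, 13, 4, 6]
mu, sigma = fit_lognormal(chain_spans)

# Under the fitted model, spans near the bulk of the data score
# higher than very long spans.
typical_score = lognormal_logpdf(4, mu, sigma)
unusual_score = lognormal_logpdf(60, mu, sigma)
```

A fitted density of this kind gives a likelihood score for each observed chain, which is one way a statistical model could stand in for the arbitrary thresholding that the abstract attributes to previous work.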

Pages: 2408-2411

Number of pages: 4

7 items in total

[1] Modeling human performance in statistical word segmentation
Frank, Michael C.
Goldwater, Sharon
Griffiths, Thomas L.
Tenenbaum, Joshua B.
COGNITION, 2010, 117 (02) : 107 - 125
[2] Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News
Xie, Lei
Yang, Yulian
Zeng, Jia
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2008, 9TH PACIFIC RIM CONFERENCE ON MULTIMEDIA, 2008, 5353 : 248 - +
[3] SegChainW2V: Towards a generic automatic video segmentation framework, based on lexical chains of audio transcriptions and word embeddings
Chifu, Adrian-Gabriel
Fournier, Sebastien
KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS: PROCEEDINGS OF THE 20TH INTERNATIONAL CONFERENCE KES-2016, 2016, 96 : 1371 - 1380
[4] Statistical Modeling for Quantitative Evaluation of Automatic Anatomy Segmentation in Radiotherapy
Yang, J.
Zhang, L.
Zhang, Y.
Dong, L.
MEDICAL PHYSICS, 2011, 38 (06)
[5] Initial experiments on automatic story segmentation in Chinese spoken documents using lexical cohesion of extracted named entities
Li, Devon
Lo, Wai-Kit
Meng, Helen
CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 693 - +
[6] Automatic segmentation and statistical shape modeling of the paranasal sinuses to estimate natural variations
Sinha, Ayushi
Leonard, Simon
Reiter, Austin
Ishii, Masaru
Taylor, Russell H.
Hager, Gregory D.
MEDICAL IMAGING 2016: IMAGE PROCESSING, 2016, 9784
[7] Automatic Data Segmentation based on Statistical Hypothesis Testing for Stochastic Channel Modeling
Tian, Li
Yin, Xuefeng
Lu, Stan X.
2010 IEEE 21ST INTERNATIONAL SYMPOSIUM ON PERSONAL INDOOR AND MOBILE RADIO COMMUNICATIONS (PIMRC), 2010 : 741 - 745