Construction of Word Network from Traditional Chinese Medicine Corpus

Authors
Cha, Hua [1]
Lu, Haiming [1]
Yu, Tong [2]
Affiliations
[1] Tsinghua University, Research Institute of Information Technology, Beijing, People's Republic of China
[2] China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
DOI
Not available
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08
Abstract
In this paper, we constructed an automatically generated, quantified network of traditional Chinese medicine (TCM) terms, using cosine distance to measure how words relate to one another. After scanning the corpus, we obtained a set of word vectors whose pairwise relationships can be measured. Clustering these vectors produced a three-level network in the form of a category tree. Its leaves stand for different types of words, yielding clusters such as herbs, diseases, and medical theories. For every category, we selected the words nearest to the cluster center and asked domain experts to judge whether each was a correct TCM term not yet collected, giving a new-word extraction rate of around 70%. Because the network is almost entirely machine-generated, it is much more efficient to build, and the knowledge it contains may lead to several new approaches to TCM.
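The abstract does not say how the word vectors are built or which clustering algorithm produces the three-level tree, so the following is only a minimal sketch under stated assumptions: each word's vector is assumed to be its neighbour co-occurrence counts (window of 1 token), relatedness is cosine similarity (cosine distance is 1 minus that value), the tree is assumed to come from applying a k-means variant that assigns by cosine similarity recursively (root, categories, leaves), and the representative words are those nearest each cluster centroid. All function names, parameters, and the toy corpus are hypothetical and not taken from the paper.

```python
# Sketch under assumptions: co-occurrence vectors, cosine similarity, recursive
# cosine-assignment k-means, nearest-to-centroid selection. Names are hypothetical.
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=1):
    """Map each word to a sparse Counter of neighbours within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; cosine distance is 1 minus this value."""
    dot = sum(value * v.get(key, 0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vecs):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vecs:
        total.update(vec)  # Counter.update adds counts key by key
    return {key: value / len(vecs) for key, value in total.items()}


def seed_centers(words, vectors, k):
    """Deterministic farthest-point seeding: start from the first word, then add the
    word whose highest similarity to any existing seed is lowest."""
    seeds = [words[0]]
    while len(seeds) < k:
        rest = [w for w in words if w not in seeds]
        seeds.append(min(rest, key=lambda w: max(
            cosine_similarity(vectors[w], vectors[s]) for s in seeds)))
    return [dict(vectors[s]) for s in seeds]


def cosine_kmeans(words, vectors, k, iterations=10):
    """Assign each word to the centre with the highest cosine similarity, then
    recompute centres as cluster means; repeat a fixed number of times."""
    centers = seed_centers(words, vectors, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for w in words:
            best = max(range(k), key=lambda c: cosine_similarity(vectors[w], centers[c]))
            clusters[best].append(w)
        # Keep the previous centre if a cluster ends up empty.
        centers = [centroid([vectors[m] for m in members]) if members else centers[c]
                   for c, members in enumerate(clusters)]
    return clusters, centers


def build_tree(words, vectors, branching=(2, 2)):
    """Recursively cluster words; each entry in `branching` adds one level below the root."""
    if not branching or len(words) <= branching[0]:
        return words
    clusters, _ = cosine_kmeans(words, vectors, branching[0])
    return [build_tree(members, vectors, branching[1:]) for members in clusters if members]


def nearest_to_center(members, vectors, n=1):
    """Return the n members most similar (by cosine) to their cluster's centroid."""
    center = centroid([vectors[w] for w in members])
    return sorted(members, key=lambda w: cosine_similarity(vectors[w], center),
                  reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical toy corpus with English placeholders, not the paper's TCM corpus.
    corpus = [
        ["decoct", "ginseng", "tonify"], ["decoct", "astragalus", "tonify"],
        ["decoct", "coptis", "clear"], ["decoct", "scutellaria", "clear"],
        ["treat", "cough", "lung"], ["treat", "asthma", "lung"],
        ["treat", "insomnia", "heart"], ["treat", "palpitation", "heart"],
    ]
    vectors = cooccurrence_vectors(corpus)
    terms = ["ginseng", "cough", "astragalus", "coptis",
             "asthma", "insomnia", "scutellaria", "palpitation"]
    tree = build_tree(terms, vectors)
    print(tree)
    for category in tree:
        words_in_category = [w for leaf in category for w in leaf]
        print(nearest_to_center(words_in_category, vectors, n=2))
```

On this toy input the root splits into an herb-like and a disease-like category, and each category splits again into two leaves. In the paper, the words picked nearest to each center were then judged by experts; that manual step has no counterpart in the sketch.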
Pages: 143-147
Number of pages: 5