Enhanced thesaurus terms extraction for document indexing

Cited by: 0
Authors
Saric, F [1]
Snajder, J [1]
Basic, BD [1]
Eklic, H [1]
Affiliation
[1] Univ Zagreb, Fac Elect & Comp Engn, Zagreb 10000, Croatia
Keywords
information retrieval; term extraction; NLP; lemmatisation; Eurovoc;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present an enhanced method for thesaurus term extraction, regarded as the main support for a semi-automatic indexing system. The enhancement is achieved by neutralising the effect of language morphology through lemmatisation of both the text and the thesaurus, and by implementing an efficient recursive algorithm for term extraction. A formal definition of the proposed method and a statistical evaluation of the experimental results are given. The need for disambiguation methods and the effect of lemmatisation on thesaurus term extraction are discussed.
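The idea summarised in the abstract (lemmatise both the text and the thesaurus, then match terms recursively) can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy lemma dictionary `LEMMAS`, the trie layout, the function names `lemmatise`, `build_trie`, `match_from` and `extract_terms`, and the sample terms are all assumptions made for illustration, and a real system would use a full morphological lexicon for the target language.

```python
from typing import List, Tuple

# Hypothetical toy lemma dictionary (assumption); stands in for a real lemmatiser.
LEMMAS = {"policies": "policy", "markets": "market", "prices": "price"}

END = "$term"  # trie key marking that a complete thesaurus term ends here


def lemmatise(token: str) -> str:
    """Lowercase the token and map it to its lemma if one is known."""
    t = token.lower()
    return LEMMAS.get(t, t)


def build_trie(terms: List[str]) -> dict:
    """Index the thesaurus by lemmatised token sequences."""
    root: dict = {}
    for term in terms:
        node = root
        for tok in term.split():
            node = node.setdefault(lemmatise(tok), {})
        node[END] = term  # remember the original thesaurus term
    return root


def match_from(node: dict, tokens: List[str], i: int) -> List[Tuple[int, str]]:
    """Recursively follow the trie from token i; collect (end_index, term) pairs."""
    found = []
    if END in node:
        found.append((i, node[END]))
    if i < len(tokens) and tokens[i] in node:
        found.extend(match_from(node[tokens[i]], tokens, i + 1))
    return found


def extract_terms(text: str, trie: dict) -> List[Tuple[int, int, str]]:
    """Return (start, end, term) for every thesaurus term found in the text."""
    tokens = [lemmatise(t) for t in text.split()]
    hits = []
    for start in range(len(tokens)):
        for end, term in match_from(trie, tokens, start):
            hits.append((start, end, term))
    return hits


thesaurus = ["agricultural policy", "market price", "policy"]
trie = build_trie(thesaurus)
hits = extract_terms("Agricultural policies affect market prices", trie)
```

In this sketch the inflected forms "policies" and "prices" match the thesaurus entries only because both sides are reduced to lemmas. The overlapping hits ("policy" inside "agricultural policy") are all returned, which shows why some later selection or disambiguation step is needed.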
Pages: 227-232
Number of pages: 6