An Approach for Text Mining Based on Noun Phrases

Cited by: 0
Authors
Pinheiro, Marcello Sandi [1]
do Prado, Hercules Antonio [2, 3]
Ferneda, Edilson [2]
Ladeira, Marcelo [4]
Affiliations
[1] Brazilian Army CDS, QGEx Setor Mil Urbano, BR-70630904 Brasilia, DF, Brazil
[2] Univ Brasilia, Grad Program Knowledge & IT Management Catholic, SGAN 916 Av W5, BR-70790160 Brasilia, DF, Brazil
[3] Embrapa, Management & Strategy Secretariat Parque Estacao, BR-7077090 Brasilia, DF, Brazil
[4] Univ Brasilia, IE, BR-70910900 Brasilia, DF, Brazil
Keywords
Text mining; Natural language processing; Preprocessing; Noun phrases
DOI
10.1007/978-3-319-19857-6_45
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The use of noun phrases as descriptors for text mining vectors has been proposed to overcome the poor semantics of the traditional bag-of-words (BOW) representation. However, the solutions found in the literature are unsatisfactory, mainly because they rely on static definitions of noun phrases and because noun phrases per se do not give an adequate representation of relevance, since they are expressions that rarely repeat. We present an approach that addresses these problems by (i) introducing a process that enables noun phrases to be defined interactively and (ii) treating similar noun phrases as a single term. A case study compares the approach proposed in this paper with one based on BOW. The main contribution of this paper is an improvement to the preprocessing phase of text mining, leading to better results in the overall process.
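The idea in point (ii), counting similar noun phrases as one term so that rarely repeated expressions still accumulate frequency, can be illustrated with a minimal sketch. The abstract does not say how similarity is measured, so everything below is an assumption made for illustration: the Jaccard overlap of lowercased word sets, the 0.5 threshold, the greedy grouping, and the function names jaccard, group_noun_phrases, and term_counts are all hypothetical and are not the authors' method. The noun phrases are assumed to have been extracted already.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two word sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_noun_phrases(phrases, threshold=0.5):
    """Greedily map each phrase to the first earlier phrase it resembles.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    representatives = []  # (representative phrase, its word set)
    mapping = {}
    for phrase in phrases:
        words = set(phrase.lower().split())
        for rep, rep_words in representatives:
            if jaccard(words, rep_words) >= threshold:
                mapping[phrase] = rep
                break
        else:
            # No sufficiently similar representative: start a new group.
            representatives.append((phrase, words))
            mapping[phrase] = phrase
    return mapping


def term_counts(phrases, threshold=0.5):
    """Count occurrences per group, so similar phrases share one term."""
    mapping = group_noun_phrases(phrases, threshold)
    return Counter(mapping[p] for p in phrases)


if __name__ == "__main__":
    extracted = [
        "text mining process",
        "text mining",
        "noun phrase extraction",
        "noun phrase",
        "bag of words",
    ]
    for term, count in term_counts(extracted).items():
        print(term, count)
```

With plain phrase counting, each of the five phrases in the usage example would occur once; after grouping, the two text mining phrases and the two noun phrase expressions each contribute to a shared term, which is the effect the abstract describes.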
Pages: 525-535
Number of pages: 11