Named Entities as Privileged Information for Hierarchical Text Clustering

Cited by: 7
Authors
Sinoara, Roberta A. [1]
Sundermann, Camila V. [1]
Marcacini, Ricardo M. [2]
Domingues, Marcos A. [1]
Rezende, Solange O. [1]
Affiliations
[1] Univ Sao Paulo, ICMC USP, POB 668, BR-13561970 Sao Carlos, SP, Brazil
[2] Univ Fed Mato Grosso Sul UFMS, BR-79603011 Tres Lagoas, MS, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Text Clustering; Named Entities; Privileged Information
DOI
10.1145/2628194.2628225
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Text clustering is a text mining task often used to aid the organization, knowledge extraction, and exploratory search of text collections. Automatic text clustering is becoming essential as the volume and variety of digital text documents grow, whether on social networks and the Web or inside organizations. This paper explores the use of named entities as privileged information in a hierarchical clustering process, in order to improve cluster quality and interpretability. We carried out an experimental evaluation on three text collections (one in Portuguese and two in English), and the results show that named entities can be used as privileged information to enhance clustering solutions in dynamic text collection scenarios.
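The abstract does not spell out how the named entities enter the hierarchical clustering process. As a rough illustration only, the sketch below assumes one simple scheme: each document has a bag-of-words view and a named-entity view, the two cosine distances are mixed with a hypothetical weight `alpha`, and the mixed distances drive a plain average-linkage (UPGMA) agglomerative clustering. This weighted-distance fusion is a stand-in chosen for brevity, not the authors' method, and the entity lists are hand-written rather than produced by a real NER system.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty view carries no evidence of similarity
    return 1.0 - dot / (norm_a * norm_b)


def fused_distances(word_vecs, entity_vecs, alpha=0.5):
    """Hypothetical fusion: d = (1 - alpha) * d_words + alpha * d_entities."""
    n = len(word_vecs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = ((1 - alpha) * cosine_distance(word_vecs[i], word_vecs[j])
             + alpha * cosine_distance(entity_vecs[i], entity_vecs[j]))
        dist[i][j] = dist[j][i] = d
    return dist


def average_linkage(dist):
    """Naive average-linkage (UPGMA) clustering, fine for toy data only.

    Returns the merge sequence as (left_id, right_id, height); leaves are
    0..n-1 and each merge creates a new cluster id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            left, right = clusters[a], clusters[b]
            d = sum(dist[x][y] for x in left for y in right) / (len(left) * len(right))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Toy collection; the entity annotations are hypothetical.
docs = ["election president vote", "campaign senate poll",
        "football goal league", "goal league season"]
ents = [["Brazil"], ["Brazil"], ["Santos"], ["Santos"]]

words = [Counter(d.split()) for d in docs]
entities = [Counter(e) for e in ents]
for alpha in (0.0, 0.5):
    print(alpha, average_linkage(fused_distances(words, entities, alpha)))
```

Documents 0 and 1 share no words, so with `alpha = 0.0` (words only) they merge at height 1.0; with `alpha = 0.5` their shared entity lowers the merge height to 0.5. How much weight the entity view deserves is a design choice this sketch leaves open.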
Pages: 57-66
Number of pages: 10
Related papers
(first 10 of 50 listed)
  • [1] Processing Named Entities in Text
    McNamee, Paul; Mayfield, James C.; Piatko, Christine D.
    Johns Hopkins APL Technical Digest, 2011, 30(1): 31-40
  • [2] Privileged Information for Hierarchical Document Clustering: A Metric Learning Approach
    Marcacini, Ricardo M.; Domingues, Marcos A.; Hruschka, Eduardo R.; Rezende, Solange O.
    2014 22nd International Conference on Pattern Recognition (ICPR), 2014: 3636-3641
  • [3] A Probabilistic Model for Linking Named Entities in Web Text with Heterogeneous Information Networks
    Shen, Wei; Han, Jiawei; Wang, Jianyong
    SIGMOD'14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 2014: 1199-1210
  • [4] Building Language Models for Text with Named Entities
    Parvez, Md Rizwan; Chakraborty, Saikat; Ray, Baishakhi; Chang, Kai-Wei
    Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), Vol. 1, 2018: 2373-2383
  • [5] Locating Complex Named Entities in Web Text
    Downey, Doug; Broadhead, Matthew; Etzioni, Oren
    20th International Joint Conference on Artificial Intelligence, 2007: 2733-2739
  • [6] Exploiting Named Entities for Bilingual News Clustering
    Montalvo, Soto; Martinez, Raquel; Fresno, Victor; Delgado, Agustin
    Journal of the Association for Information Science and Technology, 2015, 66(2): 363-376
  • [7] Semantic Clustering of Relations between Named Entities
    Wang, Wei; Besancon, Romaric; Ferret, Olivier; Grau, Brigitte
    Advances in Natural Language Processing, 2014, 8686: 358+
  • [8] Suggesting named entities for information access
    Amigó, E; Peñas, A; Gonzalo, J; Verdejo, F
    Computational Linguistics and Intelligent Text Processing, Proceedings, 2003, 2588: 557-561
  • [9] Privileged information for data clustering
    Feyereisl, Jan; Aickelin, Uwe
    Information Sciences, 2012, 194: 4-23
  • [10] Clustering Prominent Named Entities in Topic-Specific Text Corpora Completed Research Full Papers
    Alsudais, Abdulkareem; Tchalian, Hovig
    25th Americas Conference on Information Systems (AMCIS 2019), 2019