A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction

Cited by: 1
Authors
Kumar, Niraj [1]
Srinathan, Kannan [1]
Varma, Vasudeva [1]
Affiliation
[1] IIIT Hyderabad, Hyderabad 500032, Andhra Pradesh, India
Keywords
keyphrase extraction; weighted betweenness centrality; N-gram graph; normalised pointwise mutual information; NPMI
DOI
10.1504/IJDMMM.2016.077198
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a novel N-gram (N ≥ 1) filtration technique for keyphrase extraction. To filter candidate keyphrases (N-grams), we introduce the combined use of: 1) a statistical feature, obtained from the weighted betweenness centrality scores of words, a measure generally used to identify border nodes/edges in community detection techniques; and 2) collocation strength, calculated using nearest-neighbour DBpedia texts. We also introduce an N-gram (N ≥ 1) graph, which reduces the ranking bias toward shorter N-grams and preserves the semantics of words (phraseness) based on local context. To capture the theme of the document and to reduce the effect of noisy terms on the ranking, we apply an information-theoretic framework for key-player detection to the proposed N-gram graph. Our experimental results show that the devised system outperforms current state-of-the-art unsupervised systems and performs comparably to, or better than, supervised systems.
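The first filtering feature above rests on weighted betweenness centrality, which scores a word by the fraction of shortest paths between other word pairs that pass through it, so "bridge" words linking clusters score highly. The Python sketch below illustrates only that feature, under assumptions that are ours rather than the authors': the word graph is built from a sliding window over a token list, the edge weight is the co-occurrence count, and a count c is converted into a path length 1/c so that frequent co-occurrence means closeness. `cooccurrence_graph` and `weighted_betweenness` are hypothetical helper names; the N-gram graph, the DBpedia-based collocation strength and the key-player detection step are not modelled.

```python
import heapq
from itertools import count


def cooccurrence_graph(tokens, window=2):
    """Hypothetical helper: undirected word graph in which the edge weight is
    how often two distinct words appear within `window` consecutive tokens."""
    graph = {w: {} for w in tokens}
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u != w:
                graph[w][u] = graph[w].get(u, 0) + 1
                graph[u][w] = graph[u].get(w, 0) + 1
    return graph


def weighted_betweenness(graph):
    """Brandes' algorithm with Dijkstra for an undirected weighted graph.

    Assumption (not from the paper): a co-occurrence weight c is turned into
    a path length 1 / c, so frequently co-occurring words are "closer".
    """
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                          # nodes in order of settled distance
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0.0)   # number of shortest s-v paths
        sigma[s] = 1.0
        dist = {s: 0.0}
        settled = set()
        tie = count()                       # tie-breaker so the heap never compares nodes
        heap = [(0.0, next(tie), s)]
        while heap:
            d, _, v = heapq.heappop(heap)
            if v in settled:
                continue                    # stale heap entry
            settled.add(v)
            order.append(v)
            for w, weight in graph[v].items():
                nd = d + 1.0 / weight
                if w not in dist or nd < dist[w]:
                    dist[w] = nd            # strictly shorter path found
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, next(tie), w))
                elif nd == dist[w]:
                    sigma[w] += sigma[v]    # another equally short path
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inwards.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {s, t} was counted once from each end.
    return {v: score / 2.0 for v, score in bc.items()}


if __name__ == "__main__":
    tokens = ["keyphrase", "extraction", "graph", "centrality", "graph", "ranking"]
    scores = weighted_betweenness(cooccurrence_graph(tokens))
    print(sorted(scores.items(), key=lambda kv: -kv[1]))
    # [('graph', 5.0), ('extraction', 3.0), ('keyphrase', 0.0), ('centrality', 0.0), ('ranking', 0.0)]
```

In the toy run, "graph" sits on the most shortest paths between other words and receives the highest score. Brandes' algorithm is used because it computes all scores in O(VE + V² log V) time on weighted graphs instead of enumerating paths pair by pair; exact float equality for tie detection is adequate for this toy input but would need a tolerance for arbitrary weights.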
Pages: 124-143
Number of pages: 20
Related papers (50 in total)
  • [21] SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation
    Alrehamy, Hassan H.
    Walker, Coral
    ADVANCES IN COMPUTATIONAL INTELLIGENCE SYSTEMS, 2018, 650 : 222 - 235
  • [22] A Fuzzy Approach to Improve an Unsupervised Automatic Keyphrase Extraction Process
    Perez-Guadarrama, Yamel
    Simon-Cuevas, Alfredo
    Hojas-Mazo, Wenny
    Olivas, Jose A.
    Romero, Francisco P.
    2018 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2018,
  • [23] HAKE: an Unsupervised Approach to Automatic Keyphrase Extraction for Multiple Domains
    Alami Merrouni, Zakariae
    Frikh, Bouchra
    Ouhbi, Brahim
    COGNITIVE COMPUTATION, 2022, 14 : 852 - 874
  • [24] Unsupervised graph-based pattern extraction for multilingual emotion classification
    Saravia, Elvis
    Argueta, Carlos
    Chen, Yi-Shin
    SOCIAL NETWORK ANALYSIS AND MINING, 2016, 6 (01)
  • [25] RankUp: Enhancing graph-based keyphrase extraction methods with error-feedback propagation
    Figueroa, Gerardo
    Chen, Po-Chi
    Chen, Yi-Shin
    COMPUTER SPEECH AND LANGUAGE, 2018, 47 : 112 - 131
  • [27] Unsupervised word sense disambiguation with N-gram features
    Preotiuc-Pietro, Daniel
    Hristea, Florentina
    ARTIFICIAL INTELLIGENCE REVIEW, 2014, 41 (02) : 241 - 260
  • [28] A Keyphrase Graph-Based Method for Document Similarity Measurement
    Huynh, ThanhThuong T.
    TruongAn PhamNguyen
    Do, Nhon, V
    ENGINEERING LETTERS, 2022, 30 (02) : 692 - 710
  • [29] Unsupervised-learning-based keyphrase extraction from a single document by the effective combination of the graph-based model and the modified C-value method
    Yeom, Hongseon
    Ko, Youngjoong
    Seo, Jungyun
    COMPUTER SPEECH AND LANGUAGE, 2019, 58 : 304 - 318
  • [30] DERIN: A data extraction information and n-gram
    Lopes Figueiredo, Leandro Neiva
    de Assis, Guilherme Tavares
    Ferreira, Anderson A.
    INFORMATION PROCESSING & MANAGEMENT, 2017, 53 (05) : 1120 - 1138