A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction

Cited by: 1
Authors
Kumar, Niraj [1]
Srinathan, Kannan [1]
Varma, Vasudeva [1]
Affiliation
[1] IIIT Hyderabad, Hyderabad 500032, Andhra Pradesh, India
Keywords
keyphrase extraction; weighted betweenness centrality; N-gram graph; normalised pointwise mutual information; NPMI
DOI
10.1504/IJDMMM.2016.077198
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a novel N-gram (N ≥ 1) filtration technique for keyphrase extraction. To filter sophisticated candidate keyphrases (N-grams), we introduce the combined use of: (1) a statistical feature, obtained from the weighted betweenness centrality scores of words, a measure generally used to identify border nodes/edges in community detection techniques; and (2) co-location strength, calculated using nearest-neighbour DBpedia texts. We also introduce the use of an N-gram (N ≥ 1) graph, which reduces the bias of shorter N-grams in the ranking process and preserves the semantics of words (phraseness) based on local context. To capture the theme of the document and to reduce the effect of noisy terms in the ranking process, we apply an information-theoretic framework for key-player detection on the proposed N-gram graph. Our experimental results show that the devised system performs better than current state-of-the-art unsupervised systems and comparably to or better than supervised systems.
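The two word-level statistics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions: the word graph is taken to be undirected with positive edge lengths (here, hypothetically, the reciprocal of a co-occurrence count, so strongly associated words sit "close" together); co-location strength is assumed to be scored with NPMI (listed among the keywords) over raw co-occurrence counts from some set of context texts, whose retrieval from DBpedia is not shown; and the function names `npmi`, `weighted_betweenness`, and `_dijkstra_counts` are hypothetical. Weighted betweenness is computed with Brandes' algorithm using Dijkstra for shortest paths. N-gram graph construction, key-player detection, and the way the paper combines the features are not reproduced.

```python
# Sketch only, not the authors' implementation. Assumptions: undirected word
# graph with positive edge lengths; NPMI over raw co-occurrence counts as the
# co-location measure. All function names here are hypothetical.
import heapq
import itertools
import math
from collections import defaultdict


def npmi(count_xy, count_x, count_y, total):
    """Normalised pointwise mutual information of words x and y, in [-1, 1].

    Probabilities are maximum-likelihood estimates: the number of context
    windows containing x, y, or both, divided by the window count `total`.
    """
    if count_xy == 0:
        return -1.0  # x and y never co-occur: NPMI's limiting value
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # always co-occur; avoids dividing by -log(1) == 0
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


def _dijkstra_counts(graph, source):
    """Dijkstra from `source` that also counts shortest paths.

    Returns nodes in non-decreasing distance order, each node's shortest-path
    predecessors, and sigma[v] = number of shortest paths from source to v.
    """
    sigma = dict.fromkeys(graph, 0.0)
    sigma[source] = 1.0
    preds = {v: [] for v in graph}
    dist = {}              # finalised distances
    best = {source: 0.0}   # tentative distances
    order = []
    tie = itertools.count()  # heap tie-breaker, so nodes are never compared
    heap = [(0.0, next(tie), source)]
    while heap:
        d, _, v = heapq.heappop(heap)
        if v in dist:
            continue  # stale entry superseded by a shorter path
        dist[v] = d
        order.append(v)
        for w, length in graph[v].items():
            if w in dist:
                continue
            nd = d + length
            if w not in best or nd < best[w]:
                best[w] = nd          # strictly shorter path found
                sigma[w] = sigma[v]
                preds[w] = [v]
                heapq.heappush(heap, (nd, next(tie), w))
            elif nd == best[w]:
                sigma[w] += sigma[v]  # another equally short path
                preds[w].append(v)
    return order, preds, sigma


def weighted_betweenness(graph):
    """Brandes' algorithm for weighted betweenness on an undirected graph.

    `graph` maps each node to {neighbour: edge length}, with every edge listed
    in both directions. Scores are unnormalised; each unordered pair counts once.
    """
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds, sigma = _dijkstra_counts(graph, s)
        delta = dict.fromkeys(graph, 0.0)
        # Accumulate dependencies from the farthest nodes back towards s.
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair (s, t) was visited once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


if __name__ == "__main__":
    # Toy co-occurrence counts; edge length = 1 / count is an assumption.
    cooc = {("graph", "keyphrase"): 4, ("keyphrase", "extraction"): 5,
            ("graph", "centrality"): 3, ("graph", "extraction"): 1}
    graph = defaultdict(dict)
    for (u, v), n in cooc.items():
        graph[u][v] = graph[v][u] = 1.0 / n
    print(weighted_betweenness(graph))
    print(npmi(count_xy=8, count_x=20, count_y=12, total=200))
```

In a triangle with lengths a–b = 1, b–c = 1, a–c = 3, the only shortest a–c path runs through b, so b scores 1 while an unweighted count would give it 0. This shows how edge weights decide which words act as bridges. Brandes' accumulation is used because it needs one shortest-path pass per source node rather than enumerating every node pair.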
Pages: 124–143
Page count: 20