A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction

Cited by: 1
Authors
Kumar, Niraj [1 ]
Srinathan, Kannan [1 ]
Varma, Vasudeva [1 ]
Affiliation
[1] IIIT Hyderabad, Hyderabad 500032, Andhra Pradesh, India
Keywords
keyphrase extraction; weighted betweenness centrality; N-gram graph; normalised pointwise mutual information; NPMI
DOI
10.1504/IJDMMM.2016.077198
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a novel N-gram (N>=1) filtration technique for keyphrase extraction. To filter sophisticated candidate keyphrases (N-grams), we combine two features: 1) a statistical feature, obtained from the weighted betweenness centrality scores of words (a measure generally used to identify border nodes/edges in community detection techniques); and 2) co-location strength, calculated using nearest-neighbour DBpedia texts. We also introduce an N-gram (N>=1) graph, which reduces the bias towards shorter N-grams in the ranking process and preserves the semantics of words (phraseness) based on local context. To capture the theme of the document and to reduce the effect of noisy terms in the ranking process, we apply an information-theoretic framework for key-player detection to the proposed N-gram graph. Our experimental results show that the devised system performs better than the current state-of-the-art unsupervised systems and comparably to or better than supervised systems.
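The abstract and keywords name normalised pointwise mutual information (NPMI) in connection with co-location strength, but this record does not give the paper's exact formulation. As a rough illustration only, the sketch below computes the standard NPMI of two words from text-level co-occurrence counts. The function names `npmi` and `cooccurrence_npmi`, the use of whole texts as the co-occurrence window, and the toy data are assumptions made for this example, not the authors' method.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Standard NPMI from co-occurrence counts (hypothetical helper).

    count_xy: number of texts containing both words
    count_x, count_y: number of texts containing each word
    total: number of texts
    Returns a value in [-1, 1]: -1 never co-occur, 0 independent, 1 always together.
    """
    if count_xy == 0:
        return -1.0  # limiting value when the pair never co-occurs
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        return 1.0  # both words appear in every text; avoid dividing by -ln(1) = 0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def cooccurrence_npmi(texts, x, y):
    """NPMI of words x and y, treating each text (a set of tokens) as one window.

    Assumption: the texts stand in for whatever reference texts are used
    (the abstract mentions nearest-neighbour DBpedia texts).
    """
    total = len(texts)
    count_x = sum(1 for tokens in texts if x in tokens)
    count_y = sum(1 for tokens in texts if y in tokens)
    count_xy = sum(1 for tokens in texts if x in tokens and y in tokens)
    return npmi(count_xy, count_x, count_y, total)


# Toy data, invented for illustration.
toy_texts = [
    {"keyphrase", "extraction", "graph"},
    {"keyphrase", "extraction"},
    {"graph", "centrality"},
    {"centrality"},
]
strong_pair = cooccurrence_npmi(toy_texts, "keyphrase", "extraction")
never_pair = cooccurrence_npmi(toy_texts, "keyphrase", "centrality")
```

In this toy data, "keyphrase" and "extraction" always appear together, so their score sits at the top of the range, while "keyphrase" and "centrality" never co-occur and receive the minimum. A filter along these lines might keep only candidate N-grams whose word pairs score above some threshold; the threshold and how it would combine with the centrality feature are not specified in this record.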
Pages: 124-143
Page count: 20