A Noun-Centric Keyphrase Extraction Model: Graph-Based Approach

Cited by: 2
Authors
Abimbola, Rilwan O. [1]
Awoyelu, Iyabo O. [2]
Hunsu, Folasade O. [3,4]
Akinyemi, Bodunde O.
Aderounmu, Ganiyu A. [5]
Affiliations
[1] First Tech Univ, Ibadan, Nigeria
[2] Obafemi Awolowo Univ, Dept Comp Sci & Engn, Ife, Nigeria
[3] Obafemi Awolowo Univ, Dept English, Ife, Nigeria
[4] Obafemi Awolowo Univ, Dept Comp Sci & Engn, Data Commun Grp, Ife, Nigeria
[5] Obafemi Awolowo Univ, Comp Sci & Engn, Ife, Nigeria
Keywords
keyphrase; keyphrase extraction; noun-centric; graph-based model; clustering; keyword extraction; document
DOI
10.12720/jait.13.6.578-589
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The graph-based approach has proven to be the most effective method of extracting keyphrases. However, existing graph-based extraction methods do not incorporate nouns as a component, so the keyphrases they produce are not noun-centric and tend to be of low quality. In addition, the clustering approaches used in most keyphrase extraction methods have not yielded good results. This study proposes an improved keyphrase extraction model that combines a graph-based model with a noun phrase identifier and effective clustering techniques. Data were collected from selected English-language documents. The graph-based model was formulated by integrating the TextRank algorithm for node ranking, a noun phrase identifier for noun phrase scoring, the affinity propagation algorithm for selecting cluster groups, and k-means for clustering. The model was implemented and evaluated by benchmarking it against an existing model, using precision, recall, and F-measure as performance metrics. The developed model achieved 5.5% higher precision, 5.3% higher recall, and a 5.5% higher F-measure than the existing model. This implies that noun-centric keyphrase extraction yields higher-quality keyphrases.
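The abstract combines TextRank node ranking with noun-phrase scoring. The following is a minimal, standard-library-only sketch of that idea, not the authors' implementation. It assumes tokens arrive already POS-tagged with Penn Treebank tags (the tagger is not shown), that only nouns and adjectives become graph vertices linked by co-occurrence within a window of 2, and that a noun phrase (adjectives/nouns ending in a noun) is scored as the sum of its word scores. The helper names `textrank`, `noun_phrases`, and `rank_keyphrases` are hypothetical, and the paper's exact phrase-scoring formula is not given in the abstract.

```python
from collections import defaultdict

# (word, Penn Treebank POS tag); POS tagging is assumed to happen upstream.
Token = tuple[str, str]


def is_noun(tag: str) -> bool:
    return tag.startswith("NN")  # NN, NNS, NNP, NNPS


def is_adj(tag: str) -> bool:
    return tag.startswith("JJ")  # JJ, JJR, JJS


def textrank(tokens: list[Token], window: int = 2, d: float = 0.85,
             tol: float = 1e-6, max_iter: int = 100) -> dict[str, float]:
    """Score candidate words (nouns and adjectives) with unweighted TextRank."""
    words = [w.lower() for w, t in tokens if is_noun(t) or is_adj(t)]
    vertices = set(words)
    if not vertices:
        return {}
    # Link candidates that co-occur within `window` positions of the filtered sequence.
    neighbours: dict[str, set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbours[w].add(words[j])
                neighbours[words[j]].add(w)
    scores = {v: 1.0 for v in vertices}
    for _ in range(max_iter):
        # S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u)
        new = {v: (1 - d) + d * sum(scores[u] / len(neighbours[u])
                                    for u in neighbours.get(v, set()))
               for v in vertices}
        delta = max(abs(new[v] - scores[v]) for v in vertices)
        scores = new
        if delta < tol:
            break
    return scores


def noun_phrases(tokens: list[Token]) -> list[tuple[str, ...]]:
    """Maximal adjective/noun runs, trimmed so each phrase ends in a noun."""
    phrases: list[tuple[str, ...]] = []
    run: list[Token] = []
    for word, tag in tokens + [("", "END")]:  # sentinel flushes the last run
        if is_noun(tag) or is_adj(tag):
            run.append((word.lower(), tag))
            continue
        while run and not is_noun(run[-1][1]):
            run.pop()  # drop trailing adjectives: the head must be a noun
        if run:
            phrases.append(tuple(w for w, _ in run))
        run = []
    return phrases


def rank_keyphrases(tokens: list[Token], top_k: int = 5) -> list[tuple[str, float]]:
    """Score each noun phrase as the sum of its words' TextRank scores."""
    scores = textrank(tokens)
    phrase_scores = {" ".join(p): sum(scores[w] for w in p)
                     for p in noun_phrases(tokens)}
    return sorted(phrase_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    tagged = [("The", "DT"), ("graph", "NN"), ("based", "VBN"), ("noun", "NN"),
              ("phrase", "NN"), ("extraction", "NN"), ("is", "VBZ"),
              ("effective", "JJ"), (".", ".")]
    print(rank_keyphrases(tagged))
```

Summing word scores favours longer phrases, while averaging would favour short, high-scoring ones; this sketch also lets co-occurrence links cross sentence boundaries. The affinity propagation and k-means clustering stages described in the abstract are not sketched here.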
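The evaluation compares extracted keyphrases with reference keyphrases using precision, recall, and F-measure. A minimal sketch of those metrics follows, assuming exact matching of lower-cased, whitespace-normalized phrases and macro-averaging across documents; the paper's actual matching rule (for example, whether stemming is applied) and averaging scheme are not stated in the abstract. The helper names `normalize`, `evaluate`, and `macro_average` are hypothetical.

```python
def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(phrase.lower().split())


def evaluate(extracted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F-measure for one document."""
    pred = {normalize(p) for p in extracted}
    ref = {normalize(g) for g in gold}
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def macro_average(per_doc: list[tuple[float, float, float]]) -> tuple[float, float, float]:
    """Average per-document (P, R, F) triples; assumes at least one document."""
    n = len(per_doc)
    p, r, f = (sum(scores[i] for scores in per_doc) / n for i in range(3))
    return p, r, f


if __name__ == "__main__":
    p, r, f = evaluate(["Keyphrase extraction", "graph", "noun  phrase"],
                       ["keyphrase extraction", "noun phrase", "TextRank", "clustering"])
    # 2 of 3 extracted phrases match, covering 2 of 4 gold phrases:
    # p = 2/3, r = 1/2, f = 4/7 (about 0.571)
    print(p, r, f)
```

Because the F-measure is a harmonic mean, it stays low unless both precision and recall are reasonably high.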
Pages: 578-589
Number of pages: 12