Automatic keyphrase extraction for Arabic news documents based on KEA system

Cited by: 7
Authors
Duwairi, Rehab [1 ]
Hedaya, Mona [2 ]
Affiliations
[1] Jordan Univ Sci & Technol, Dept Comp Informat Syst, Irbid 22110, Jordan
[2] Qatar Univ, Coll Engn, Dept Comp Sci & Engn, Doha, Qatar
Keywords
Keyphrase extraction; term indexing; document summarization; document classification; Arabic web content;
DOI
10.3233/IFS-151923
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A keyphrase is a sequence of words that plays an important role in identifying the topics of a given document. Keyphrase extraction is the process of extracting such phrases, and it has many important applications, such as document indexing, document retrieval, search engines, and document summarization. This paper presents a framework, based on the KEA system, for extracting keyphrases from Arabic news documents. It relies on supervised learning, Naive Bayes in particular, to extract keyphrases. For each candidate phrase, two probabilities are computed: the probability of being a keyphrase and the probability of not being a keyphrase. The final set of keyphrases is chosen from the phrases that have high probabilities of being keyphrases. The novel contributions of the current work are threefold: it provides insights into keyphrase extraction for news documents written in Arabic; it presents the annotated dataset that was used in the experiments; and it uses Naive Bayes as the means of extracting keyphrases.
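The two-probability scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, as in the original KEA system, that each candidate phrase is described by two discretized features (a TF-IDF bin and a first-occurrence bin); the abstract does not say which features the Arabic framework uses. The function names, the toy training data, and the phrase labels are all hypothetical.

```python
from collections import Counter

def train(examples, n_features):
    """Count class priors and per-feature value frequencies.

    examples: list of (feature_tuple, is_keyphrase), features already discretized.
    """
    prior = Counter(label for _, label in examples)
    counts = {label: [Counter() for _ in range(n_features)] for label in (True, False)}
    for feats, label in examples:
        for i, value in enumerate(feats):
            counts[label][i][value] += 1
    return prior, counts

def keyphrase_probability(model, feats, n_values=2):
    """Compute P[yes] and P[no] with Naive Bayes and return P[yes] / (P[yes] + P[no])."""
    prior, counts = model
    total = sum(prior.values())
    p = {}
    for label in (True, False):
        prob = prior[label] / total
        for i, value in enumerate(feats):
            # Laplace smoothing so unseen feature values do not zero out the product.
            prob *= (counts[label][i][value] + 1) / (prior[label] + n_values)
        p[label] = prob
    return p[True] / (p[True] + p[False])

def extract(model, candidates, k):
    """Rank candidate phrases by keyphrase probability and keep the top k."""
    ranked = sorted(candidates, key=lambda c: keyphrase_probability(model, c[1]), reverse=True)
    return [phrase for phrase, _ in ranked[:k]]

# Hypothetical toy data: (tfidf_bin, first_occurrence_bin), where
# tfidf_bin 1 = high and first_occurrence_bin 0 = appears early in the document.
training = [
    ((1, 0), True), ((1, 0), True), ((1, 1), True),
    ((0, 1), False), ((0, 1), False), ((0, 0), False), ((1, 1), False),
]
model = train(training, n_features=2)

candidates = [("phrase_a", (0, 1)), ("phrase_b", (1, 0)), ("phrase_c", (1, 1))]
top = extract(model, candidates, k=1)
print(top)
```

Candidates are ranked by the normalized score rather than by P[yes] alone, so a phrase is preferred only when its evidence for being a keyphrase outweighs the evidence against it.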
Pages: 2101-2110
Page count: 10
Related papers
50 records in total
  • [1] Automatic keyphrase extraction from Chinese news documents
    Wang, HF
    Li, SJ
    Yu, SW
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 648 - 657
  • [2] An automatic keyphrase extraction system for scientific documents
    Wei You
    Dominique Fontaine
    Jean-Paul Barthès
    [J]. Knowledge and Information Systems, 2013, 34 : 691 - 724
  • [3] An automatic keyphrase extraction system for scientific documents
    You, Wei
    Fontaine, Dominique
    Barthes, Jean-Paul
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 34 (03) : 691 - 724
  • [4] KP-Miner: A keyphrase extraction system for English and Arabic documents
    El-Beltagy, Samhaa R.
    Rafea, Ahmed
    [J]. INFORMATION SYSTEMS, 2009, 34 (01) : 132 - 144
  • [5] Automatic Keyphrase Extractor from Arabic Documents
    Najadat, Hassan M.
    Al-Kabi, Mohammed N.
    Hmeidi, Ismail I.
    Issa, Maysa Mahmoud Bany
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (02) : 192 - 199
  • [6] Automatic Keyphrase Extraction from Medical Documents
    Sarkar, Kamal
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 273 - 278
  • [7] Turkish keyphrase extraction using KEA
    Pala, Nagehan
    Cicekli, Ilyas
    [J]. 2007 22ND INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2007, : 192 - 196
  • [8] Keyphrase-Based Hierarchical Clustering for Arabic Documents
    Hussein, Moufeda
    Alsammak, Abdelwahab
    Elshishtawy, Tarek
    [J]. INTERNATIONAL CONFERENCE ON INFORMATICS AND SYSTEMS (INFOS 2016), 2016, : 61 - 67
  • [9] Automatic Arabic Text Summarization Using Clustering and Keyphrase Extraction
    Fejer, Hamzah Noori
    Omar, Nazlia
    [J]. PROCEEDINGS OF THE 2014 6TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND MULTIMEDIA (ICIM), 2014, : 293 - 298
  • [10] Refining Kea++ automatic keyphrase assignment
    Irfan, Rabia
    Khan, Sharifullah
    Qamar, Ali Mustafa
    Bloodsworth, Peter Charles
    [J]. JOURNAL OF INFORMATION SCIENCE, 2014, 40 (04) : 446 - 459