Towards Building a Standard Dataset for Arabic Keyphrase Extraction Evaluation

Cited by: 0
Authors
Helmy, Muhammad [1]
Basaldella, Marco [1]
Maddalena, Eddy [1]
Mizzaro, Stefano [1]
Demartini, Gianluca [2]
Affiliations
[1] Univ Udine, Udine, Italy
[2] Univ Sheffield, Sheffield, S Yorkshire, England
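Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Arabic Language Resources; Dataset; Keyphrase Extraction; Crowdsourcing
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract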
Keyphrases are short phrases that best represent a document's content. They are useful in a variety of applications, including document summarization and retrieval models. In this paper, we introduce the first dataset of keyphrases for an Arabic document collection, obtained by means of crowdsourcing. We experimentally evaluate different strategies for aggregating crowdsourced answers and validate their performance against expert annotations to assess the quality of our dataset. We report our experimental results, the dataset features, some lessons learned, and ideas for future work.
Pages: 26-29
Page count: 4