Pasokh: A Standard Corpus for the Evaluation of Persian Text Summarizers

Times cited: 0
Authors
Moghaddas, Behdad Behmadi [1]
Kahani, Mohsen [1]
Toosi, Seyyed Ahmad [1]
Pourmasoumi, Asef [1]
Estiri, Ahmad [1]
Affiliation
[1] Ferdowsi University of Mashhad, Web Technology Lab, Mashhad, Iran
Keywords
computational processing of Persian; single-document automatic summarization; multi-document automatic summarization; evaluation of automatic summarization; evaluation corpus
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The increasingly vast amount of information, particularly on the Web, has created a pressing need for automatic summarization systems. These systems, in turn, need to be evaluated in terms of how well they retrieve information. Evaluation is done by comparing machine summaries against a standard reference corpus that contains a reasonably large number of source texts together with summaries of them written by humans. Because no such standard corpus existed for Persian, summarizers were previously evaluated against small corpora built by the developers of each proposed system, which made the systems non-comparable. Pasokh was therefore constructed as a standard reference corpus of sufficient size. Its construction took over 2000 man-hours of work.
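The comparison of a machine summary against human reference summaries can be illustrated with a minimal sketch. The abstract does not name the metric used with Pasokh, so the ROUGE-1-style unigram recall below is an assumption chosen only for illustration; the function name `unigram_recall` and the whitespace tokenization are likewise hypothetical, and real Persian text would need proper normalization and tokenization.

```python
from collections import Counter


def unigram_recall(candidate: str, references: list[str]) -> float:
    """Average clipped unigram recall of a candidate over all references.

    Assumption: tokens are separated by whitespace (a simplification).
    """
    cand_counts = Counter(candidate.split())
    scores = []
    for ref in references:
        ref_counts = Counter(ref.split())
        total = sum(ref_counts.values())
        if total == 0:
            continue  # an empty reference carries no information
        # Count each reference token at most as often as the candidate has it.
        overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
        scores.append(overlap / total)
    return sum(scores) / len(scores) if scores else 0.0
```

Because every system is scored against the same set of human summaries, scores computed this way are comparable across systems, which is the property the abstract says the earlier developer-built corpora lacked.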
Pages: 471-475
Number of pages: 5