A simple but effective method for Indonesian automatic text summarisation

Cited by: 10
Authors
Lin, Nankai [1]
Li, Jinxian [1]
Jiang, Shengyi [1, 2]
Affiliations
[1] Guangdong Univ Foreign Studies, Sch Comp Sci & Technol, Guangzhou, Peoples R China
[2] Guangdong Univ Foreign Studies, Guangzhou Key Lab Multilingual Intelligent Proc, Guangzhou, Peoples R China
Keywords
Automatic text summarisation; LightGBM; Indonesian; regression
DOI
10.1080/09540091.2021.1937942
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic text summarisation (ATS) is an automatic procedure for extracting critical information from a text using a specific algorithm or method; its two main approaches are abstractive summarisation and extractive summarisation. Because corpora are scarce, abstractive summarisation performs poorly on ATS tasks for low-resource languages, so researchers commonly apply extractive summarisation to such languages instead. As an emerging branch of extraction-based summarisation, methods based on feature analysis quantify the significance of information by calculating a utility score for each sentence in the article. In this study, we propose a simple but effective extractive method for Indonesian documents based on the Light Gradient Boosting Machine (LightGBM) regression model. Four features are extracted: PositionScore, TitleScore, the semantic representation similarity between the sentence and the title of the document, and the semantic representation similarity between the sentence and the centre of the sentence's cluster. We define a formula for calculating the sentence score as the objective function of the linear regression. Considering the characteristics of Indonesian, we use Indonesian lemmatisation technology to improve the calculation of the sentence score. The results show that our method is more applicable.
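The four per-sentence features named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not give the formulas for PositionScore or TitleScore, so a linear position decay and a title-word overlap ratio stand in for them; bag-of-words counts stand in for the semantic representations; and a single document centroid stands in for the centre of the sentence's cluster. The function names are hypothetical.

```python
import math
from collections import Counter


def position_score(index, n_sentences):
    # Assumed form: linear decay, so the first sentence scores 1.0.
    return 1.0 - index / n_sentences


def title_score(sentence_tokens, title_tokens):
    # Assumed form: share of distinct title words that occur in the sentence.
    title = set(title_tokens)
    if not title:
        return 0.0
    return len(title & set(sentence_tokens)) / len(title)


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_features(title, sentences):
    # Returns one row of four features per sentence, in document order.
    title_tokens = title.lower().split()
    tokenised = [s.lower().split() for s in sentences]
    title_vec = Counter(title_tokens)
    vectors = [Counter(tokens) for tokens in tokenised]

    # Assumption: one cluster per document, so the "cluster centre"
    # is simply the sum of all sentence vectors.
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)

    rows = []
    for i, (tokens, vec) in enumerate(zip(tokenised, vectors)):
        rows.append([
            position_score(i, len(sentences)),
            title_score(tokens, title_tokens),
            cosine(vec, title_vec),
            cosine(vec, centroid),
        ])
    return rows


features = sentence_features(
    "banjir jakarta",
    ["Banjir melanda Jakarta hari ini", "Warga mengungsi ke tempat aman"],
)
```

In a pipeline of the kind the abstract describes, rows like these would be the inputs to a regression model trained to predict a sentence score, with the highest-scoring sentences selected as the summary. The regression step and the lemmatisation step are left out here to keep the sketch dependency-free.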
Pages: 29-43
Number of pages: 15
Related papers
50 in total
  • [1] A method of automatic text summarisation based on long short-term memory
    Fang, Wei
    Jiang, TianXiao
    Jiang, Ke
    Zhang, Feihong
    Ding, Yewen
    Sheng, Jack
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2020, 22 (01) : 39 - 49
  • [2] Automatic annotation of corpora for text summarisation: A comparative study
    Orasan, C
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 670 - 681
  • [3] Wajeez: An Extractive Automatic Arabic Text Summarisation System
    Al Oudah, Abrar
    Al Bassam, Kholoud
    Kurdi, Heba
    Al-Megren, Shiroq
    SOCIAL COMPUTING AND SOCIAL MEDIA: DESIGN, HUMAN BEHAVIOR AND ANALYTICS, SCSM 2019, PT I, 2019, 11578 : 3 - 14
  • [4] Arabic Topic Detection using Automatic Text Summarisation
    Koulali, Rim
    El-Haj, Mahmoud
    Meziane, Abdelouafi
    2013 ACS INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2013
  • [5] Toward an automatic summarisation of Arabic text depending on rhetorical relations
    Lagrini, S.
    Azizi, N.
    Redjimi, M.
    Al Dwairi, M.
    International Journal of Reasoning-based Intelligent Systems, 2019, 11 (03) : 203 - 214
  • [6] Comparison of automatic summarisation methods for clinical free text notes
    Moen, Hans
    Peltonen, Laura-Maria
    Heimonen, Juho
    Airola, Antti
    Pahikkala, Tapio
    Salakoski, Tapio
    Salantera, Sanna
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2016, 67 : 25 - 37
  • [7] Indonesian Automatic Text Summarization Based on A New Clustering Method in Sentence Level
    Cai, Zefeng
    Lin, Nankai
    Ma, Chuyu
    Jiang, Shengyi
    BDE 2019: 2019 INTERNATIONAL CONFERENCE ON BIG DATA ENGINEERING, 2019, : 24 - 29
  • [8] Exploring the efficacy and reliability of automatic text summarisation systems: Arabic texts in focus
    Omar, Abdulfattah
    Altohami, Waheed M. A.
    Hamouda, Wafya
    COGENT ARTS & HUMANITIES, 2023, 10 (01)
  • [9] Silhouette plus attraction: A simple and effective method for text clustering
    Errecalde, Marcelo L.
    Cagnina, Leticia C.
    Rosso, Paolo
    NATURAL LANGUAGE ENGINEERING, 2016, 22 (05) : 687 - 726
  • [10] Extractive Text Summarisation in Hindi
    Vijay, Sakshee
    Rai, Vartika
    Gupta, Sorabh
    Vijayvargia, Anshuman
    Sharma, Dipti Misra
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 318 - 321