A simple but effective method for Indonesian automatic text summarisation

Cited by: 10
Authors
Lin, Nankai [1]
Li, Jinxian [1]
Jiang, Shengyi [1, 2]
Affiliations
[1] Guangdong Univ Foreign Studies, Sch Comp Sci & Technol, Guangzhou, Peoples R China
[2] Guangdong Univ Foreign Studies, Guangzhou Key Lab Multilingual Intelligent Proc, Guangzhou, Peoples R China
Keywords
Automatic text summarisation; LightGBM; Indonesian; regression
DOI
10.1080/09540091.2021.1937942
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic text summarisation (ATS) is an automatic procedure for extracting critical information from a text using a specific algorithm or method; it comprises two main approaches, abstractive summarisation and extractive summarisation. Because corpora are scarce, abstractive summarisation performs poorly on ATS tasks for low-resource languages. For this reason, researchers commonly apply extractive rather than abstractive summarisation to low-resource languages. As an emerging branch of extraction-based summarisation, methods based on feature analysis quantify the significance of information by calculating a utility score for each sentence in the article. In this study, we propose a simple but effective extractive method for Indonesian documents based on the Light Gradient Boosting Machine (LightGBM) regression model. Four features are extracted: PositionScore, TitleScore, the semantic representation similarity between the sentence and the title of the document, and the semantic representation similarity between the sentence and the centre of the sentence's cluster. We define a formula for calculating the sentence score as the objective function of the linear regression. Considering the characteristics of Indonesian, we use Indonesian lemmatisation technology to improve the calculation of the sentence score. The results show that our method is more applicable.
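The four per-sentence features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so every definition here is an assumption: PositionScore is taken to decay linearly with sentence position, TitleScore is taken to be the fraction of title words found in the sentence, bag-of-words cosine similarity stands in for the semantic representation similarity, and the cluster centre is simplified to the centroid of all sentences in the document. The function names `tokenize`, `cosine`, and `sentence_features` are hypothetical. In the paper's setting, feature vectors of this kind would be passed to a LightGBM regressor that predicts a sentence score; that step is left out so the sketch needs only the standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a real system would also lemmatise Indonesian words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_features(title, sentences):
    """Return one 4-element feature list per sentence (all definitions assumed)."""
    title_words = set(tokenize(title))
    title_vec = Counter(tokenize(title))
    sent_vecs = [Counter(tokenize(s)) for s in sentences]

    # Simplification: treat the whole document as a single cluster,
    # so the "cluster centre" is the sum of all sentence vectors.
    centroid = Counter()
    for vec in sent_vecs:
        centroid.update(vec)

    n = len(sentences)
    features = []
    for i, vec in enumerate(sent_vecs):
        position_score = 1.0 - i / n  # assumed: earlier sentences score higher
        overlap = len(set(vec) & title_words)
        title_score = overlap / len(title_words) if title_words else 0.0
        features.append([
            position_score,
            title_score,
            cosine(vec, title_vec),  # stand-in for sentence-title semantic similarity
            cosine(vec, centroid),   # stand-in for sentence-cluster-centre similarity
        ])
    return features


if __name__ == "__main__":
    title = "Banjir di Jakarta"
    sentences = [
        "Banjir melanda Jakarta pada hari Senin.",
        "Warga mengungsi ke tempat aman.",
        "Pemerintah mengirim bantuan.",
    ]
    for sentence, row in zip(sentences, sentence_features(title, sentences)):
        print([round(x, 3) for x in row], sentence)
```

An extractive summary would then be formed by ranking sentences on the predicted score and keeping the top few; the ranking rule and the number of sentences kept are likewise not stated in the abstract.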
Pages: 29-43
Number of pages: 15
Related papers
50 in total
  • [21] An Automatic Labeling Method for Subword-Phrase Recognition in Effective Text Classification
    Kimura Y.
    Komamizu T.
    Hatano K.
    Informatica (Slovenia), 2023, 47 (03): : 315 - 326
  • [22] Automatic summarisation and annotation of microarray data
    Pietro H. Guzzi
    Maria Teresa Di Martino
    Giuseppe Tradigo
    Pierangelo Veltri
    Pierfrancesco Tassone
    Pierosandro Tagliaferri
    Mario Cannataro
    Soft Computing, 2011, 15 : 1505 - 1512
  • [23] Simple Yet Effective Method for Entity Linking in Microblog-Genre Text
    Miao, Qingliang
    Lu, Huayu
    Zhang, Shu
    Meng, Yao
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2013, 2013, 400 : 440 - 447
  • [24] Automatic summarisation and annotation of microarray data
    Guzzi, Pietro H.
    Di Martino, Maria Teresa
    Tradigo, Giuseppe
    Veltri, Pierangelo
    Tassone, Pierfrancesco
    Tagliaferri, Pierosandro
    Cannataro, Mario
    SOFT COMPUTING, 2011, 15 (08) : 1505 - 1512
  • [25] An initial study on text summarisation in film stories
    Xu, Yan
    Oakes, Michael P.
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (44): : 35 - 42
  • [26] Readability Evaluation Metrics for Indonesian Automatic Text Summarization: A Systematic Review
    Maylawati, Dian Sa'adillah
    Kumar, Yogan Jaya
    Kasmin, Fauziah Binti
    Ramdhani, Muhammad Ali
    Journal of Engineering Science and Technology Review, 2024, 17 (05) : 199 - 210
  • [27] Biomedical text summarisation using concept chains
    Reeve, Lawrence H.
    Han, Hyoil
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2007, 1 (04) : 389 - 407
  • [28] An effective abstract text summarisation using shark smell optimised bidirectional encoder representations from transformer
    Nafees Muneera M.
    Sriramya P.
    International Journal of Business Intelligence and Data Mining, 2023, 23 (01) : 50 - 72
  • [29] Are extractive text summarisation techniques portable to broadcast news?
    Christensen, H
    Gotoh, Y
    Kolluru, B
    Renals, S
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 489 - 494
  • [30] A conceptual model for text summarisation based on reader requirements
    Hou, Jiang-Liang
    Chen, Yong-Jhih
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2015, 27 (03) : 317 - 323