A simple but effective method for Indonesian automatic text summarisation

Cited by: 10
Authors
Lin, Nankai [1]
Li, Jinxian [1]
Jiang, Shengyi [1, 2]
Affiliations
[1] Guangdong University of Foreign Studies, School of Computer Science and Technology, Guangzhou, People's Republic of China
[2] Guangdong University of Foreign Studies, Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangzhou, People's Republic of China
Keywords
Automatic text summarisation; LightGBM; Indonesian; regression
DOI
10.1080/09540091.2021.1937942
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic text summarisation (ATS) is an automatic procedure for extracting critical information from a text using a specific algorithm or method. It comprises two main approaches: abstractive summarisation and extractive summarisation. Because corpora are scarce, abstractive summarisation performs poorly on ATS tasks for low-resource languages, so researchers commonly apply extractive summarisation to such languages instead. As an emerging branch of extractive summarisation, feature-based methods quantify the significance of information by calculating a utility score for each sentence in the article. In this study, we propose a simple but effective extractive method for Indonesian documents based on the Light Gradient Boosting Machine (LightGBM) regression model. Four features are extracted: PositionScore, TitleScore, the semantic representation similarity between the sentence and the document title, and the semantic representation similarity between the sentence and the centre of the sentence's cluster. We define a formula for calculating the sentence score as the objective function of the linear regression. Considering the characteristics of Indonesian, we use Indonesian lemmatisation technology to improve the calculation of the sentence score. The results show that our method is more applicable.
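As a rough illustration of feature-based extractive summarisation, the sketch below computes two of the four features named in the abstract and selects the top-scoring sentences. It is not the authors' implementation: the abstract does not give the formulas for PositionScore or TitleScore, so both definitions here are assumptions, as are all function names and the example sentences. The two semantic similarity features, the lemmatisation step, and the LightGBM regressor are omitted; an unweighted mean of the features stands in for the trained model.

```python
import re


def tokenize(text):
    """Lowercase and split on non-letters (a crude stand-in for Indonesian lemmatisation)."""
    return re.findall(r"[a-z]+", text.lower())


def position_score(index, total):
    """Assumed definition: earlier sentences score higher, in the range (0, 1]."""
    return (total - index) / total


def title_score(sentence, title):
    """Assumed definition: fraction of title words that also occur in the sentence."""
    title_words = set(tokenize(title))
    if not title_words:
        return 0.0
    return len(title_words & set(tokenize(sentence))) / len(title_words)


def extract_features(sentences, title):
    """Return one (PositionScore, TitleScore) pair per sentence."""
    total = len(sentences)
    return [
        (position_score(i, total), title_score(s, title))
        for i, s in enumerate(sentences)
    ]


def summarise(sentences, scores, k):
    """Keep the k highest-scoring sentences, restored to document order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Illustrative input only; these sentences are not from the paper's corpus.
title = "Banjir melanda Jakarta"
sentences = [
    "Banjir besar melanda Jakarta pada hari Senin.",
    "Warga mengungsi ke tempat yang lebih aman.",
    "Pemerintah Jakarta menyiapkan bantuan untuk korban banjir.",
]

features = extract_features(sentences, title)
# Placeholder for the trained regressor: an unweighted mean of the features.
scores = [sum(f) / len(f) for f in features]
summary = summarise(sentences, scores, k=2)
```

In a setup closer to the one the abstract describes, the feature vectors would be passed to a regression model trained on sentence scores, and the predicted scores would replace the placeholder mean before sentence selection.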
Pages: 29-43
Number of pages: 15