Extractive text summarization using clustering-based topic modeling

Authors
Ramesh Chandra Belwal
Sawan Rai
Atul Gupta
Affiliations
[1] Indian Institute of Information Technology Design and Manufacturing, Department of Computer Science and Engineering
[2] Bennett University, School of Computer Science Engineering and Technology
Source
Soft Computing | 2023 / Vol. 27
Keywords
Extractive summarization; Topic modeling; Clustering; Semantic measure
Abstract
Text summarization is the process of condensing an input document into a shorter form while preserving its overall meaning. It is primarily achieved in two ways: abstractive and extractive. Extractive summarizers select a few of the best sentences from the input document, whereas abstractive methods may modify sentence structure or introduce new sentences. The proposed approach is an extractive text summarization technique in which topic modeling is extended so that it can be applied to multiple lower-level specialized entities (i.e., groups) embedded in a single document. Our goal is to overcome the lack of coherence found in existing summarization techniques. Topic modeling was initially proposed to model text data at the multi-document and word levels, without considering sentence modeling. It was subsequently applied at the sentence level and used for document summarization, but with certain limitations: topic modeling does not perform as expected when applied to a single document at the sentence level. To address this shortcoming, we propose a summarization approach that operates at the level of the individual document and its clusters (instead of the sentence level). We aim to choose the best sentence from each group (containing sentences of the same kind) found in the given text. We attempt to select the most suitable topic by evaluating the probability distribution of the words and their respective topics at the cluster level. The method is evaluated on two standard datasets and shows significant performance gains over existing text summarization techniques. In automatic evaluation, the ROUGE metrics show a considerable improvement in the F-measure, precision, and recall of the generated summary compared with other text summarization techniques. Furthermore, a manual evaluation demonstrates that the proposed approach outperforms current state-of-the-art text summarization approaches.
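The general idea of grouping similar sentences and then picking one representative sentence per group can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the clustering method or the topic model, so the sketch substitutes a greedy cosine-similarity clustering and uses each cluster's unigram word distribution as a crude stand-in for a topic. The names `tokenize`, `cluster_sentences`, `summarize`, the stopword list, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "for", "with"}


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_sentences(sentences, threshold=0.2):
    """Greedy one-pass clustering (a stand-in for whatever clustering the paper uses).

    Each sentence joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"ids": sentence indices, "counts": pooled word counts}
    for i, sentence in enumerate(sentences):
        vec = Counter(tokenize(sentence))
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["counts"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"ids": [i], "counts": vec})
        else:
            best["ids"].append(i)
            best["counts"] += vec
    return clusters


def summarize(sentences, threshold=0.2):
    """Pick one sentence per cluster: the one whose words are, on average,
    most probable under the cluster's unigram distribution."""
    chosen = []
    for cluster in cluster_sentences(sentences, threshold):
        total = sum(cluster["counts"].values())

        def score(i, cluster=cluster, total=total):
            tokens = tokenize(sentences[i])
            if not tokens:
                return 0.0
            return sum(cluster["counts"][w] / total for w in tokens) / len(tokens)

        chosen.append(max(cluster["ids"], key=score))
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "Cats sleep on the sofa.",
        "Cats and kittens sleep all day on the sofa.",
        "Stock markets fell sharply.",
        "Markets fell as stock prices dropped.",
    ]
    for line in summarize(doc):
        print(line)
```

Averaging word probabilities favors short sentences made of the cluster's most frequent words; a fuller treatment would replace the unigram distribution with a fitted topic model and the greedy pass with a proper clustering algorithm.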
Pages: 3965-3982
Number of pages: 17