An extractive text summarization approach using tagged-LDA based topic modeling

Authors
Ruby Rani
D. K. Lobiyal
Affiliation
[1] Jawaharlal Nehru University, School of Computer & Systems Sciences
Keywords
Topic modeling; Hindi novel; Topic diversity; Retention ratio; Tagged-LDA
DOI
Not available
Abstract
Automatic text summarization is the task of producing a condensed version of a text document that retains its salient information. Numerous statistical, linguistic, rule-based, and position-based text summarization approaches have been explored for resource-rich languages. For under-resourced languages such as Hindi, automatic text summarization remains a challenging and unsolved problem, compounded by the unavailability of corpora and the inadequacy of processing tools. In this paper, we propose an extractive, lexical knowledge-rich, topic modeling based text summarization approach for Hindi novels and stories, for which we implement four independent variants using different sentence weighting schemes. Because no suitable corpus existed, we prepared a corpus of Hindi novels and stories. We use a smoothing technique to obtain informative and diverse summaries, and then evaluate the generated summaries against three metrics: gist diversity, retention ratio, and ROUGE score. The results show that the proposed model produces concise, articulate, and coherent summaries. To investigate the performance of the proposed model further, we also run the experiments on an English dataset. Finally, we compare our models with the baselines and a traditional topic modeling approach, and show that the proposed model achieves the best results.
Pages: 3275 - 3305
Page count: 30