Metadata Based Contextual Summarizer for Technical Conversations in Public Forums

Cited by: 0
Authors
Ranjan, Gyan [1]
Govindan, Abinaya [1]
Verma, Amit [1]
Affiliation
[1] Neuron7 Ai, Bangalore, Karnataka, India
Source
Keywords
natural language processing; abstractive summarization; sequence-to-sequence models; multiple loss optimization; rouge-based learning; information systems;
DOI
10.3233/SSW220019
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, sequence-to-sequence neural abstractive summarization has gained a lot of attention. Many novel strategies have been used to improve the saliency, human readability, and consistency of these models, resulting in high-quality summaries. However, because the majority of these pretrained models were trained on news datasets, they carry an inherent bias. One such bias is that most generated summaries draw on the start or end of the text, much as a news story might be summarized. Another issue we encountered while using these summarizers in our technical discussion forum use case was token repetition, which resulted in lower ROUGE-precision scores. To overcome these issues, we present an approach that includes: a) an additional term in the loss function, based on the ROUGE-precision score, that is optimized alongside the categorical cross-entropy loss; b) an adaptive loss function based on the token repetition rate, optimized along with the final loss so that the model can produce contextual summaries with less token repetition and learn successfully from a small number of training samples; c) extra metadata indicator tokens, added to contextualize the summarizer for technical forum discussion platforms, which help the model learn latent features and dependencies in text segments with relevant metadata information. To guard against overfitting due to data scarcity, we test and verify all models on a hold-out dataset that was not part of the training or validation dataset. This paper discusses the various strategies we used and compares the performance of fine-tuned models against baseline summarizers on the test dataset. By training our models end to end with these losses, we obtain substantially better ROUGE scores while producing the most legible and relevant summaries on the technical forum dataset.
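The three ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the unigram form of ROUGE precision, the definition of repetition rate, the weighted-sum combination with hypothetical weights `alpha` and `beta`, and the `<label>` indicator-token format. The sketch only computes scalar quantities on token lists; it does not show how the authors make these terms trainable alongside cross-entropy.

```python
from collections import Counter


def rouge1_precision(candidate, reference):
    """Unigram ROUGE precision: clipped overlapping tokens / candidate tokens."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(candidate)


def repetition_rate(candidate):
    """Fraction of candidate tokens that repeat an earlier token (assumed definition)."""
    if not candidate:
        return 0.0
    return 1.0 - len(set(candidate)) / len(candidate)


def combined_loss(cross_entropy, candidate, reference, alpha=1.0, beta=1.0):
    """Hypothetical weighted sum; alpha and beta are illustrative, not from the paper."""
    precision_penalty = 1.0 - rouge1_precision(candidate, reference)
    return cross_entropy + alpha * precision_penalty + beta * repetition_rate(candidate)


def add_metadata_tokens(segments):
    """Prefix each (label, text) segment with a hypothetical indicator token."""
    return " ".join(f"<{label}> {text}" for label, text in segments)


if __name__ == "__main__":
    cand = "restart the the router".split()
    ref = "restart the router".split()
    print(rouge1_precision(cand, ref))    # 0.75
    print(repetition_rate(cand))          # 0.25
    print(combined_loss(2.0, cand, ref))  # 2.5
    print(add_metadata_tokens([("question", "router drops wifi"),
                               ("answer", "update firmware")]))
```

In this toy case the repeated "the" lowers ROUGE precision and raises the repetition rate at the same time, which matches the abstract's observation that token repetition reduced ROUGE-precision scores.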
Pages: 170-183
Number of pages: 14
Related papers
45 in total
  • [1] Research paper recommender system based on public contextual metadata
    Haruna, Khalid
    Ismail, Maizatul Akmar
    Qazi, Atika
    Kakudi, Habeebah Adamu
    Hassan, Mohammed
    Muaz, Sanah Abdullahi
    Chiroma, Haruna
    [J]. SCIENTOMETRICS, 2020, 125 (01) : 101 - 114
  • [2] Research paper recommender system based on public contextual metadata
    Khalid Haruna
    Maizatul Akmar Ismail
    Atika Qazi
    Habeebah Adamu Kakudi
    Mohammed Hassan
    Sanah Abdullahi Muaz
    Haruna Chiroma
    [J]. Scientometrics, 2020, 125 : 101 - 114
  • [3] A Contextual Query Expansion Based Multi-document Summarizer for Smart Learning
    Yang, Guangbing
    Kinshuk
    Wen, Dunwei
    Sutinen, Erkki
    [J]. 2013 INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS (SITIS), 2013, : 1010 - 1016
  • [4] A Hybrid Personalized Scientific Paper Recommendation Approach Integrating Public Contextual Metadata
    Sakib, Nazmus
    Ahmad, Rodina Binti
    Ahsan, Mominul
    Based, Md Abdul
    Haruna, Khalid
    Haider, Julfikar
    Gurusamy, Saravanakumar
    [J]. IEEE ACCESS, 2021, 9 : 83080 - 83091
  • [5] Addressing Antivaccine Sentiment on Public Social Media Forums Through Web-Based Conversations Based on Motivational Interviewing Techniques: Observational Study
    Scales, David
    Hurth, Lindsay
    Xi, Wenna
    Gorman, Sara
    Radhakrishnan, Malavika
    Windham, Savannah
    Akunne, Azubuike
    Florman, Julia
    Leininger, Lindsey
    Gorman, Jack
    [J]. JMIR INFODEMIOLOGY, 2023, 3 (01):
  • [6] Attention-based contextual local and global features for urgent posts classification in MOOCs discussion forums
    El-Rashidy, Mohamed A.
    Khodeir, Nabila A.
    Farouk, Ahmed
    Aslan, Heba K.
    El-Fishawy, Nawal A.
    [J]. AIN SHAMS ENGINEERING JOURNAL, 2024, 15 (04)
  • [7] Different Contextual Window Sizes Based RNNs for Multimodal Emotion Detection in Interactive Conversations
    Lai, Helang
    Chen, Hongying
    Wu, Shuangyan
    [J]. IEEE ACCESS, 2020, 8 : 119516 - 119526
  • [8] Toward a workable emulation-based preservation strategy: Rationale and technical metadata
    Anderson D.
    Delve J.
    Pinchbeck D.
    [J]. New Review of Information Networking, 2010, 15 (02) : 110 - 131
  • [9] A Metadata Based Agricultural Universal Scientific and Technical Information Fusion and Service Framework
    Cui Yunpeng
    Liu Shihong
    Sun SuFen
    Zhang Junfeng
    Zheng Huaiguo
    [J]. COMPUTER AND COMPUTING TECHNOLOGIES IN AGRICULTURE IV, PT 1, 2011, 344 : 56 - +