Summary Generation Using Natural Language Processing Techniques and Cosine Similarity

Cited by: 5
Authors
Pal, Sayantan [1]
Chang, Maiga [2]
Iriarte, Maria Fernandez [2]
Affiliations
[1] Heritage Inst Technol, Kolkata 700107, WB, India
[2] Athabasca Univ, Edmonton, AB T5J 3S8, Canada
Keywords
Question and answering; Information extraction; Parts of speech; N-grams; Coronavirus
DOI
10.1007/978-3-030-96308-8_47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The COVID-19 pandemic has posed an unprecedented challenge to public health and has prompted global efforts to understand, record, and alleviate the disease. This research aims to generate relevant summaries of Coronavirus-related literature. It uses the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI, which contained 236,336 full-text academic articles as of July 19, 2021. This paper introduces a web-based system that answers user questions over these full-text scholarly articles. The system periodically runs backend services that process this large volume of articles with basic Natural Language Processing (NLP) techniques, including tokenization, N-gram extraction, and part-of-speech (PoS) tagging. It automatically identifies the keywords in a question, uses cosine similarity to summarize the associated content, and presents the summary to the user. This research may benefit researchers, health workers, and other individuals. Moreover, the same service can be trained on datasets from other domains (e.g., education) to generate relevant summaries for other user groups (e.g., students).
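
The abstract names tokenization, N-gram extraction, and cosine similarity as the core steps. The sketch below illustrates how those steps could fit together to pick the sentences most similar to a question. It is not the authors' implementation: the function names, the tiny stopword list, and the choice of raw unigram and bigram counts as features are all assumptions made for illustration, and PoS tagging is omitted because it would require an external library.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "is", "are", "how", "does", "do",
             "what", "through", "to", "in", "and"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps forms like 'covid-19')."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=2):
    """Count the 1..max_n-grams of the non-stopword tokens in the text."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(question, sentences, top_k=2):
    """Return up to top_k sentences with non-zero similarity to the question, in document order."""
    q = features(question)
    scored = [(cosine_similarity(q, features(s)), i, s)
              for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:top_k]
    return [s for score, _, s in sorted(best, key=lambda t: t[1]) if score > 0]


if __name__ == "__main__":
    docs = [
        "Coronavirus spreads through respiratory droplets.",
        "The stock market fell sharply.",
        "Masks reduce coronavirus transmission.",
    ]
    for sentence in summarize("How does coronavirus spread?", docs):
        print(sentence)
```

Because this sketch does no stemming, "spread" and "spreads" count as different terms; a fuller pipeline would likely normalize word forms and use PoS tags to select keywords, as the abstract suggests.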
Pages: 508-517
Page count: 10