Word Embedding-Based Biomedical Text Summarization

Cited by: 2
Authors
Rouane, Oussama [1]
Belhadef, Hacene [1]
Bouakkaz, Mustapha [2]
Affiliations
[1] Univ Constantine 2 Abdelhamid Mehri, Constantine, Algeria
[2] Univ Amar Telidgi, Comp Sci Dept, Fac Sci, Laghouat, Algeria
Keywords
Biomedical text summarization; Word embedding; Word2vec; PageRank algorithm; ROUGE metrics
DOI
10.1007/978-3-030-33582-3_28
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel word embedding-based biomedical text summarizer. Biomedical words are represented by dense real-valued vectors. Each sentence is represented by summing the vectors of the words it contains. The PageRank algorithm is applied to rank sentences, using cosine similarity as the similarity measure between sentence vectors. The top N ranked sentences are selected to build the summary. For the evaluation, we created a corpus of 200 biomedical papers downloaded from the BioMed Central full-text database. We used a pre-trained Word2vec model whose word vectors were generated from a combination of PubMed, PMC, and a recent English Wikipedia dump. We compared our method with four other summarizers using the ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-SU4 metrics, evaluating the generated summaries against the abstracts of the papers. Our summarizer achieved improvements of 3.48%, 7.68%, 9.76%, and 3.47%, respectively, over the second-ranked summarizer.
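The pipeline described in the abstract (summed word vectors, cosine similarity, PageRank, top N selection) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the two-dimensional `toy_embeddings` table stands in for the pre-trained Word2vec model, tokenization is plain whitespace splitting, negative similarities are clipped to zero, self-similarity is ignored, and the selected sentences are returned in document order. All function names are hypothetical.

```python
import numpy as np

def sentence_vector(tokens, embeddings, dim):
    """Sum the vectors of the tokens that appear in the embedding table."""
    vec = np.zeros(dim)
    for tok in tokens:
        if tok in embeddings:
            vec += embeddings[tok]
    return vec

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity; diagonal zeroed, negatives clipped (assumptions)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty sentences
    unit = vectors / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    return np.clip(sim, 0.0, None)

def pagerank(weights, damping=0.85, iterations=100, tol=1e-8):
    """Power-iteration PageRank on a weighted similarity graph."""
    n = weights.shape[0]
    row_sums = weights.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Rows with no outgoing weight fall back to a uniform distribution.
    transition = np.where(row_sums > 0, weights / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):
        updated = (1 - damping) / n + damping * (transition.T @ scores)
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores

def summarize(sentences, embeddings, dim, top_n):
    """Return the top_n highest-ranked sentences, kept in document order."""
    vectors = np.array(
        [sentence_vector(s.lower().split(), embeddings, dim) for s in sentences]
    )
    scores = pagerank(cosine_similarity_matrix(vectors))
    chosen = sorted(np.argsort(-scores)[:top_n])
    return [sentences[i] for i in chosen]

# Toy stand-in for a pre-trained Word2vec model (hypothetical values).
toy_embeddings = {
    "gene": np.array([1.0, 0.0]),
    "tumor": np.array([1.0, 0.2]),
    "expression": np.array([0.9, 0.1]),
    "mutation": np.array([0.8, 0.1]),
    "weather": np.array([0.0, 1.0]),
    "sunny": np.array([0.1, 1.0]),
    "today": np.array([0.0, 0.9]),
}
toy_sentences = [
    "gene expression tumor",
    "tumor gene mutation",
    "weather sunny today",
]
toy_summary = summarize(toy_sentences, toy_embeddings, dim=2, top_n=2)
print(toy_summary)
```

In this toy input the two sentences about genes and tumors are close to each other in vector space and reinforce each other's rank, so the unrelated third sentence is left out of the two-sentence summary. With a real model, the lookup table would be replaced by the pre-trained vectors and `dim` by their dimensionality.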
Pages: 288-297
Number of pages: 10