A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset

Cited by: 11
Authors
Souza, Cinthia M. [1 ]
Meireles, Magali R. G. [1 ]
Almeida, Paulo E. M. [2 ]
Affiliations
[1] Pontificia Univ Catolica Minas Gerais, Belo Horizonte, MG, Brazil
[2] Fed Ctr Technol Educ Minas Gerais, Belo Horizonte, MG, Brazil
Keywords
Computational intelligence; Knowledge representation; Information systems; Automatic text summarization; Patent datasets; LSTM;
DOI
10.1007/s11192-020-03732-x
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Patents are an important source of information for measuring the technological advancement of a specific knowledge domain. To facilitate the search for information in patent datasets, classification systems separate documents into groups according to the area of knowledge and assign names that describe their content. As the number of patented inventions grows, these groups need to be subdivided. Because the groups belong to a restricted knowledge domain, naming the generated subcategories can be extremely laborious. This work compares the performance of abstractive and extractive summarization techniques in the task of generating sentences directly associated with the content of patents. The abstractive summarization model consisted of a Seq2Seq architecture and an LSTM network. It was trained on a dataset of patent titles and abstracts and validated using the ROUGE set of metrics. The results obtained by the generated model were compared with the sentence produced by an extractive summarization algorithm applied to the task of naming patent groups. The main idea was to help specialists name new patent groups created by clustering systems. The naming experiments were performed on a dataset of abstracts of patent documents. Comparative experiments were conducted using four subgroups from the United States Patent and Trademark Office, which uses the Cooperative Patent Classification system.
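The abstract names the ROUGE set of metrics as the validation measure but does not say which variants or which implementation were used. As a minimal sketch only, assuming whitespace tokenization and lowercasing (both assumptions, not details from the paper), ROUGE-N can be computed as the clipped n-gram overlap between a generated label and a reference title. The function names and the example strings are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall and F1 for two strings.

    Assumption: tokens are lowercased, whitespace-separated words.
    Published ROUGE toolkits may also apply stemming or stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical reference title and generated label, for illustration only.
    reference = "method for wireless data transmission"
    candidate = "wireless data transmission method"
    print(rouge_n(candidate, reference, n=1))
    print(rouge_n(candidate, reference, n=2))
```

In this illustration the candidate reuses four of the five reference words, so unigram recall is high, while the changed word order lowers the bigram scores. This is one reason ROUGE-1 and ROUGE-2 are usually reported together.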
Pages: 135-156
Number of pages: 22
Related papers
20 items in total
  • [1] A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset
    Cinthia M. Souza
    Magali R. G. Meireles
    Paulo E. M. Almeida
    [J]. Scientometrics, 2021, 126 : 135 - 156
  • [2] A Survey of Extractive and Abstractive Automatic Text Summarization Techniques
    Dalal, Vipul
    Malik, Latesh
    [J]. 2013 Sixth International Conference on Emerging Trends in Engineering and Technology (ICETET 2013), 2013, : 109 - 110
  • [3] Comparative Study of Extractive Text Summarization Techniques
    Palliyali, Ahammed Waseem
    Al-Khalifa, Maaz Abdulaziz
    Farooq, Saad
    Abinahed, Julien
    Al-Ansari, Abdulla
    Jaoua, Ali
    [J]. 2021 IEEE/ACS 18TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2021,
  • [4] A Study on Abstractive Summarization Techniques in Indian Languages
    Sunitha, C.
    Jaya, A.
    Ganesh, Amal
    [J]. FOURTH INTERNATIONAL CONFERENCE ON RECENT TRENDS IN COMPUTER SCIENCE & ENGINEERING (ICRTCSE 2016), 2016, 87 : 25 - 31
  • [5] A Comparative Study of Opinion Summarization Techniques
    Bhatia, Surbhi
    [J]. IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2021, 8 (01): : 110 - 117
  • [6] Using Pre-Trained Language Models for Abstractive DBPEDIA Summarization: A Comparative Study
    Zahera, Hamada M.
    Vitiugin, Fedor
    Sherif, Mohamed Ahmed
    Castillo, Carlos
    Ngomo, Axel-Cyrille Ngonga
    [J]. KNOWLEDGE GRAPHS: SEMANTICS, MACHINE LEARNING, AND LANGUAGES, 2023, 56 : 19 - 37
  • [7] A Comparative Study on Collectives of Term Weighting Methods for Extractive Presentation Speech Summarization
    Zhang, Jian
    Yuan, Huaqiang
    [J]. PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 148 - 151
  • [8] A Comparative Study of the Impact of Statistical and Semantic Features in the Framework of Extractive Text Summarization
    Vodolazova, Tatiana
    Lloret, Elena
    Munoz, Rafael
    Palomar, Manuel
    [J]. TEXT, SPEECH AND DIALOGUE, TSD 2012, 2012, 7499 : 306 - 313
  • [9] A Comparative Study on Extractive Speech Summarization of Broadcast News and Parliamentary Meeting Speech
    Zhang, Jian
    Yuan, Huaqiang
    [J]. PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2014), 2014, : 111 - 114
  • [10] A Comparative Study of Synthetic Dataset Generation Techniques
    Dandekar, Ashish
    Zen, Remmy A. M.
    Bressan, Stephane
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA 2018), PT II, 2018, 11030 : 387 - 395