PatentNet: multi-label classification of patent documents using deep learning based language understanding

Authors
Arousha Haghighian Roudsari
Jafar Afshar
Wookey Lee
Suan Lee
Affiliations
[1] Inha University, Department of Industrial Engineering
[2] Inha University, Department of Biomedical Science and Engineering
[3] Semyung University, School of Computer Science
Source
Scientometrics | 2022 / Vol. 127, No. 1
Keywords
Patent classification; Multi-label text classification; Pre-trained language model
Abstract
Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the growing number of filed patents and the complexity of the documents make the classification task challenging. The text of patent documents is not always written in a way that conveys knowledge efficiently. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents and for facilitating reliable search, retrieval, and further patent analysis. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many natural language processing (NLP) tasks. In this work, we investigate the effect of fine-tuning pre-trained language models, namely BERT, XLNet, RoBERTa, and ELECTRA, on the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification, and we use various word embeddings to enhance the performance of the baseline models. Experiments are conducted on the publicly available USPTO-2M patent classification benchmark and the M-patent dataset. We conclude that fine-tuning pre-trained language models on patent text improves multi-label patent classification performance. Our findings indicate that XLNet performs best and achieves new state-of-the-art classification performance in terms of precision, recall, and F1 measure, as well as coverage error and label ranking average precision (LRAP).
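The abstract reports precision, recall, F1, coverage error and LRAP. As a hedged illustration of how these multi-label metrics are commonly computed, the sketch below implements them in plain Python. It assumes micro-averaging for precision/recall/F1, a 0.5 decision threshold on per-label scores (for example, sigmoid outputs of a fine-tuned model), and the coverage-error and LRAP definitions documented for scikit-learn's `coverage_error` and `label_ranking_average_precision_score`. The paper's exact evaluation protocol may differ, and the function names (`threshold`, `micro_prf`, `coverage_error`, `lrap`) are hypothetical.

```python
"""Multi-label evaluation metrics, sketched in pure Python.

Assumptions (not taken from the paper): precision/recall/F1 are micro-averaged,
predictions come from a 0.5 threshold, and coverage error / LRAP follow the
definitions documented for scikit-learn.
"""
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def threshold(scores: Matrix, t: float = 0.5) -> List[List[int]]:
    """Turn per-label scores into 0/1 predictions (hypothetical 0.5 cut-off)."""
    return [[1 if s >= t else 0 for s in row] for row in scores]


def micro_prf(y_true: Matrix, y_pred: Matrix) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all (document, label) pairs."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def coverage_error(y_true: Matrix, y_score: Matrix) -> float:
    """Average number of top-ranked labels needed to cover every true label."""
    total = 0
    for t_row, s_row in zip(y_true, y_score):
        relevant = [s for t, s in zip(t_row, s_row) if t]
        if relevant:  # documents without true labels contribute 0
            lowest = min(relevant)
            # Ties with the lowest true label are counted against the model.
            total += sum(1 for s in s_row if s >= lowest)
    return total / len(y_true)


def lrap(y_true: Matrix, y_score: Matrix) -> float:
    """Label ranking average precision (higher is better; 1.0 is perfect)."""
    total = 0.0
    for t_row, s_row in zip(y_true, y_score):
        relevant = [s for t, s in zip(t_row, s_row) if t]
        if not relevant or len(relevant) == len(s_row):
            total += 1.0  # ranking is meaningless if no labels or all labels are true
            continue
        per_label = []
        for s_j in relevant:
            rank = sum(1 for s in s_row if s >= s_j)         # position among all labels
            rel_rank = sum(1 for s in relevant if s >= s_j)  # position among true labels
            per_label.append(rel_rank / rank)
        total += sum(per_label) / len(per_label)
    return total / len(y_true)


if __name__ == "__main__":
    # Toy data: 2 documents, 3 labels (illustrative only, not from the paper).
    y_true = [[1, 0, 1], [0, 1, 0]]
    y_score = [[0.9, 0.2, 0.4], [0.1, 0.6, 0.7]]
    print(micro_prf(y_true, threshold(y_score)))
    print(coverage_error(y_true, y_score), lrap(y_true, y_score))
```

Lower coverage error and higher LRAP indicate better label rankings. Both operate on the raw scores, so unlike precision, recall and F1 they do not depend on the chosen decision threshold.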
Pages: 207-231
Page count: 24
Related papers (50 in total)
  • [1] PatentNet: multi-label classification of patent documents using deep learning based language understanding
    Roudsari, Arousha Haghighian
    Afshar, Jafar
    Lee, Wookey
    Lee, Suan
    [J]. SCIENTOMETRICS, 2022, 127 (01) : 207 - 231
  • [2] Multi-Label Classification of Text Documents Using Deep Learning
    Mohammed, Hamza Haruna
    Dogdu, Erdogan
    Gorur, Abdul Kadir
    Choupani, Roya
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 4681 - 4689
  • [3] Multi-label Patent Classification using Attention-Aware Deep Learning Model
    Roudsari, Arousha Haghighian
    Afshar, Jafar
    Lee, Charles Cheolgi
    Lee, Wookey
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020, : 558 - 559
  • [4] Multi-label classification performance using Deep Learning
    Awachat, Snehal
    [J]. INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2023, 14 (01) : 119 - 126
  • [5] IPC Multi-label Classification Applying the Characteristics of Patent Documents
    Lim, Sora
    Kwon, YongJin
    [J]. ADVANCES IN COMPUTER SCIENCE AND UBIQUITOUS COMPUTING, 2017, 421 : 166 - 172
  • [6] Multi-Label Classification of Lung Diseases Using Deep Learning
    Irtaza, Muhammad
    Ali, Arshad
    Gulzar, Maryam
    Wali, Aamir
    [J]. IEEE ACCESS, 2024, 12 : 124062 - 124080
  • [7] MULTI-LABEL CLASSIFICATION OF ICD CODING USING DEEP LEARNING
    Hsu, Chung-Chian
    Chang, Pei-Chi
    Chang, Arthur
    [J]. 2020 INTERNATIONAL SYMPOSIUM ON COMMUNITY-CENTRIC SYSTEMS (CCS), 2020,
  • [8] A Survey of Multi-label Text Classification Based on Deep Learning
    Chen, Xiaolong
    Cheng, Jieren
    Liu, Jingxin
    Xu, Wenghang
    Hua, Shuai
    Tang, Zhu
    Sheng, Victor S.
    [J]. ARTIFICIAL INTELLIGENCE AND SECURITY, ICAIS 2022, PT I, 2022, 13338 : 443 - 456
  • [9] Multi-label Text Classification of German Language Medical Documents
    Spat, Stephan
    Cadonna, Bruno
    Rakovac, Ivo
    Guetl, Christian
    Leitner, Hubert
    Stark, Guenther
    Beck, Peter
    [J]. MEDINFO 2007: PROCEEDINGS OF THE 12TH WORLD CONGRESS ON HEALTH (MEDICAL) INFORMATICS, PTS 1 AND 2: BUILDING SUSTAINABLE HEALTH SYSTEMS, 2007, 129 : 1460 - +
  • [10] Multi-label Garbage Image Classification Based on Deep Learning
    Yan, Kang
    Si, Wenyu
    Hang, Jin
    Zhou, Hong
    Zhu, Quanyin
    [J]. 2020 19TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS ENGINEERING AND SCIENCE (DCABES 2020), 2020, : 150 - 153