PatentNet: multi-label classification of patent documents using deep learning based language understanding

Cited by: 0

Authors:

Arousha Haghighian Roudsari

Jafar Afshar

Wookey Lee

Suan Lee

Affiliations:

[1] Inha University, Department of Industrial Engineering

[2] Inha University, Department of Biomedical Science and Engineering

[3] Semyung University, School of Computer Science

Source:

Scientometrics | 2022 / Vol. 127, No. 1

Keywords:

Patent classification; Multi-label text classification; Pre-trained language model

DOI:

Not available

Abstract:

Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the growing number of filed patents and the complexity of the documents make the classification task challenging. The text of patent documents is not always written in a way that conveys knowledge efficiently. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents and for facilitating reliable search, retrieval, and further patent analysis. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many natural language processing tasks. In this work, we investigate the effect of fine-tuning pre-trained language models, namely BERT, XLNet, RoBERTa, and ELECTRA, on the essential task of multi-label patent classification. We compare these models with the baseline deep learning approaches used for patent classification, and we use various word embeddings to enhance the performance of the baseline models. Experiments are conducted on the publicly available USPTO-2M patent classification benchmark and the M-patent dataset. We conclude that fine-tuning pre-trained language models on patent text improves multi-label patent classification performance. Our findings indicate that XLNet performs best and achieves new state-of-the-art classification performance with respect to precision, recall, and F1 measure, as well as coverage error and label ranking average precision (LRAP).
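The abstract reports precision, recall, F1, coverage error, and LRAP without defining them. As an illustration only, the sketch below computes these metrics in plain Python from a binary label matrix `y_true` and a per-label score matrix `y_score` (for example, sigmoid outputs of a fine-tuned model; that is an assumption here, not something this record states). Coverage error and LRAP follow the conventions documented for scikit-learn's `coverage_error` and `label_ranking_average_precision_score`. The micro-averaging and the 0.5 decision threshold used for precision, recall, and F1 are hypothetical choices, since the record does not say how the paper thresholds or averages its predictions. All function names are illustrative.

```python
from typing import Sequence, Tuple

# y_true[i][j] is 1 if document i carries label j, else 0.
# y_score[i][j] is a score for label j on document i (assumed: higher = more likely).
Labels = Sequence[Sequence[int]]
Scores = Sequence[Sequence[float]]


def coverage_error(y_true: Labels, y_score: Scores) -> float:
    """Average number of top-ranked labels needed to include every true label.

    Ties with the lowest-scored true label count against the model;
    documents without any true label contribute 0.
    """
    total = 0
    for truth, scores in zip(y_true, y_score):
        relevant = [s for t, s in zip(truth, scores) if t]
        if relevant:
            lowest = min(relevant)
            total += sum(1 for s in scores if s >= lowest)
    return total / len(y_true)


def label_ranking_average_precision(y_true: Labels, y_score: Scores) -> float:
    """Mean over documents of the average precision of the label ranking."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [s for t, s in zip(truth, scores) if t]
        # Degenerate cases (no true labels, or every label true) score 1.
        if not relevant or len(relevant) == len(scores):
            total += 1.0
            continue
        per_label = []
        for s_j in relevant:
            rank = sum(1 for s in scores if s >= s_j)    # labels ranked at or above j
            hits = sum(1 for s in relevant if s >= s_j)  # true labels among them
            per_label.append(hits / rank)
        total += sum(per_label) / len(per_label)
    return total / len(y_true)


def micro_prf(y_true: Labels, y_score: Scores,
              threshold: float = 0.5) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 after thresholding every score."""
    tp = fp = fn = 0
    for truth, scores in zip(y_true, y_score):
        for t, s in zip(truth, scores):
            predicted = s >= threshold
            if predicted and t:
                tp += 1
            elif predicted:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: 2 documents, 3 candidate labels (made-up numbers).
    y_true = [[1, 0, 0], [0, 0, 1]]
    y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
    print(f"{coverage_error(y_true, y_score):.4f}")                   # 2.5000
    print(f"{label_ranking_average_precision(y_true, y_score):.4f}")  # 0.4167
    p, r, f1 = micro_prf(y_true, y_score)
    print(f"{p:.4f} {r:.4f} {f1:.4f}")                                # 0.2500 0.5000 0.3333
```

Lower coverage error is better (its best possible value is the average number of true labels per document), whereas LRAP is at most 1, reached when every true label outranks every false one. The two ranking metrics need no threshold, which is one reason they are often reported alongside thresholded precision, recall, and F1 in multi-label settings.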

Pages: 207-231

Number of pages: 25
