DeepPatent: patent classification with convolutional neural networks and word embedding

Cited by: 108
Authors
Li, Shaobo [1 ,2 ]
Hu, Jie [1 ,3 ]
Cui, Yuxin [3 ]
Hu, Jianjun [2 ,3 ]
Affiliations
[1] Guizhou Univ, Minist Educ, Key Lab Adv Mfg Technol, Guiyang 550025, Guizhou, Peoples R China
[2] Guizhou Univ, Sch Mech Engn, Guiyang 550025, Guizhou, Peoples R China
[3] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA
Funding
National Natural Science Foundation of China
Keywords
Patent classification; Text classification; Convolutional neural network; Machine learning; Word embedding; 94-02; TECHNOLOGY; SELECTION; REPRESENTATIONS
DOI
10.1007/s11192-018-2905-5
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Patent classification is an essential task in patent information management and patent knowledge mining. However, it is still largely done manually because current algorithms perform unsatisfactorily. Deep learning methods such as convolutional neural networks (CNNs) have recently led to great progress in image processing and speech recognition, but they have yet to be applied to patent classification. We propose DeepPatent, a deep learning algorithm for patent classification based on CNNs and word vector embedding. We evaluated the algorithm on the standard patent classification benchmark dataset CLEF-IP and compared it with other algorithms from the CLEF-IP competition. Experiments showed that DeepPatent, with automatic feature extraction, achieved a classification precision of 83.98%, outperforming all existing algorithms that used the same information for training. It also outperforms the state-of-the-art patent classifier, which reaches a precision of 83.50% but relies on 4000 characters from the description section and extensive feature engineering, whereas DeepPatent uses only the title and abstract. DeepPatent was further tested on USPTO-2M, a patent classification benchmark dataset that we contributed: it contains 2,000,147 records, obtained by cleaning 2,679,443 raw US utility patent documents, in 637 categories at the subclass level. On this dataset, DeepPatent achieved a precision of 73.88%.
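The abstract names the technique (a CNN over word embeddings applied to title and abstract text) but not DeepPatent's actual layers or hyperparameters. The following is therefore only a minimal, hypothetical sketch of that general technique, in the common sentence-classification style: several filter widths, ReLU, max-over-time pooling, a fully connected layer and softmax. It is written in plain Python with random, untrained weights. All names (`build_vocab`, `conv_relu_maxpool`, `TinyTextCNN`) and settings (embedding size 8, filter widths 2/3/4, 4 filters per width, 3 classes) are illustrative assumptions and are not the authors' code. In a real system, pre-trained word vectors would fill the embedding table and the weights would be learned.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch only: names and hyperparameters are assumptions, not DeepPatent's.
Matrix = List[List[float]]


def build_vocab(texts: Sequence[str]) -> Dict[str, int]:
    """Assign ids to lower-cased whitespace tokens; id 0 is reserved for unknown words."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def conv_relu_maxpool(x: Matrix, filt: Matrix, bias: float) -> float:
    """Slide one filter (len(filt) words wide) over the word vectors,
    apply ReLU, and keep the largest response (max-over-time pooling)."""
    width, dim = len(filt), len(filt[0])
    # Zero-pad inputs shorter than the filter so it fits at least once.
    seq = x + [[0.0] * dim for _ in range(max(0, width - len(x)))]
    best = 0.0  # starting at 0.0 makes the running max also act as ReLU
    for start in range(len(seq) - width + 1):
        s = bias + sum(filt[k][d] * seq[start + k][d]
                       for k in range(width) for d in range(dim))
        best = max(best, s)
    return best


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TinyTextCNN:
    """Untrained, randomly initialised CNN-over-word-embeddings classifier (hypothetical)."""

    def __init__(self, vocab: Dict[str, int], num_classes: int, emb_dim: int = 8,
                 widths: Tuple[int, ...] = (2, 3, 4), filters_per_width: int = 4,
                 seed: int = 0):
        rng = random.Random(seed)

        def r() -> float:
            return rng.uniform(-0.5, 0.5)

        self.vocab = vocab
        # Row 0 is the embedding shared by all out-of-vocabulary words.
        self.table = [[r() for _ in range(emb_dim)] for _ in range(len(vocab) + 1)]
        # One (filter, bias) pair per filter; each filter spans w consecutive words.
        self.filters = [([[r() for _ in range(emb_dim)] for _ in range(w)], r())
                        for w in widths for _ in range(filters_per_width)]
        n_feat = len(self.filters)
        self.out_w = [[r() for _ in range(n_feat)] for _ in range(num_classes)]
        self.out_b = [r() for _ in range(num_classes)]

    def predict_proba(self, text: str) -> List[float]:
        # 1) embedding lookup, 2) convolution + ReLU + max pooling per filter,
        # 3) fully connected layer, 4) softmax over the classes.
        x = [self.table[self.vocab.get(tok, 0)] for tok in text.lower().split()]
        feats = [conv_relu_maxpool(x, f, b) for f, b in self.filters]
        logits = [b + sum(w * f for w, f in zip(row, feats))
                  for row, b in zip(self.out_w, self.out_b)]
        return softmax(logits)


if __name__ == "__main__":
    docs = ["method for wireless data transmission",
            "pharmaceutical composition for treating tumours"]
    model = TinyTextCNN(build_vocab(docs), num_classes=3)
    print(model.predict_proba("wireless transmission of patient data"))
```

Because the abstract says DeepPatent uses only the title and abstract, an input in this sketch would simply be those two fields concatenated into one string before tokenizing. Max-over-time pooling yields a fixed-length feature vector whatever the text length. That is why short titles and long abstracts can share one classifier head.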
Pages: 721-744
Page count: 24
Journal: Scientometrics, vol. 117 (2018)