DeepPatent: patent classification with convolutional neural networks and word embedding

Cited by: 108
Authors
Li, Shaobo [1 ,2 ]
Hu, Jie [1 ,3 ]
Cui, Yuxin [3 ]
Hu, Jianjun [2 ,3 ]
Affiliations
[1] Guizhou Univ, Minist Educ, Key Lab Adv Mfg Technol, Guiyang 550025, Guizhou, Peoples R China
[2] Guizhou Univ, Sch Mech Engn, Guiyang 550025, Guizhou, Peoples R China
[3] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA
Funding
National Natural Science Foundation of China;
Keywords
Patent classification; Text classification; Convolutional neural network; Machine learning; Word embedding; 94-02; Y; TECHNOLOGY; SELECTION; REPRESENTATIONS;
DOI
10.1007/s11192-018-2905-5
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203; 0835;
Abstract
Patent classification is an essential task in patent information management and patent knowledge mining. However, this task is still largely done manually due to the unsatisfactory performance of current algorithms. Recently, deep learning methods such as convolutional neural networks (CNN) have led to great progress in image processing, voice recognition, and speech recognition, but they have yet to be applied to patent classification. We proposed DeepPatent, a deep learning algorithm for patent classification based on CNN and word vector embedding. We evaluated the algorithm on the standard patent classification benchmark dataset CLEF-IP and compared it with other algorithms in the CLEF-IP competition. Experiments showed that DeepPatent, with automatic feature extraction, achieved a classification precision of 83.98%, outperforming all existing algorithms that used the same information for training. It also outperformed the state-of-the-art patent classifier, which reaches a precision of 83.50% but relies on 4000 characters from the description section and extensive feature engineering, whereas DeepPatent used only the title and abstract. DeepPatent was further tested on USPTO-2M, a patent classification benchmark dataset that we contributed, containing 2,000,147 records after data cleaning of 2,679,443 raw USA utility patent documents in 637 categories at the subclass level. On this dataset, our algorithm achieved a precision of 73.88%.
Pages: 721 - 744
Number of pages: 24
Related papers
50 in total
  • [21] Text sentiment classification of Amazon reviews using word embeddings and convolutional neural networks
    Mohammed Qorich
    Rajae El Ouazzani
    The Journal of Supercomputing, 2023, 79 : 11029 - 11054
  • [22] Convolutional Neural Networks for event classification
    Rubio Jimenez, Adrian
    Garcia Navarro, Jose Enrique
    Moreno Llacer, Maria
    NINTH ANNUAL CONFERENCE ON LARGE HADRON COLLIDER PHYSICS, LHCP2021, 2021,
  • [23] Convolutional Neural Networks for image classification
    Jmour, Nadia
    Zayen, Sehla
    Abdelkrim, Afef
    2018 INTERNATIONAL CONFERENCE ON ADVANCED SYSTEMS AND ELECTRICAL TECHNOLOGIES (IC_ASET), 2017, : 397 - 402
  • [24] Flower Classification with Convolutional Neural Networks
    Mitrovic, Katarina
    Milosevic, Danijela
    2019 23RD INTERNATIONAL CONFERENCE ON SYSTEM THEORY, CONTROL AND COMPUTING (ICSTCC), 2019, : 845 - 850
  • [25] Convolutional Neural Networks for Electrocardiogram Classification
    Mohamad M. Al Rahhal
    Yakoub Bazi
    Mansour Al Zuair
    Esam Othman
    Bilel BenJdira
    Journal of Medical and Biological Engineering, 2018, 38 : 1014 - 1025
  • [26] Glomerulus Classification with Convolutional Neural Networks
    Pedraza, Anibal
    Gallego, Jaime
    Lopez, Samuel
    Gonzalez, Lucia
    Laurinavicius, Arvydas
    Bueno, Gloria
    MEDICAL IMAGE UNDERSTANDING AND ANALYSIS (MIUA 2017), 2017, 723 : 839 - 849
  • [27] Convolutional Neural Networks for Electrocardiogram Classification
    Al Rahhal, Mohamad M.
    Bazi, Yakoub
    Al Zuair, Mansour
    Othman, Esam
    BenJdira, Bilel
    JOURNAL OF MEDICAL AND BIOLOGICAL ENGINEERING, 2018, 38 (06) : 1014 - 1025
  • [28] Convolutional Neural Networks for ATC Classification
    Lumini, Alessandra
    Nanni, Loris
    CURRENT PHARMACEUTICAL DESIGN, 2018, 24 (34) : 4007 - 4012
  • [29] Convolutional Neural Networks for Font Classification
    Tensmeyer, Chris
    Saunders, Daniel
    Martinez, Tony
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 985 - 990
  • [30] Classification of Phonocardiograms with Convolutional Neural Networks
    Deperlioglu, Omer
    BRAIN-BROAD RESEARCH IN ARTIFICIAL INTELLIGENCE AND NEUROSCIENCE, 2018, 9 (02): : 22 - 33