DeepPatent: patent classification with convolutional neural networks and word embedding

Cited by: 108
Authors
Li, Shaobo [1, 2]
Hu, Jie [1, 3]
Cui, Yuxin [3]
Hu, Jianjun [2, 3]
Affiliations
[1] Guizhou Univ, Minist Educ, Key Lab Adv Mfg Technol, Guiyang 550025, Guizhou, Peoples R China
[2] Guizhou Univ, Sch Mech Engn, Guiyang 550025, Guizhou, Peoples R China
[3] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA
Funding
National Natural Science Foundation of China
Keywords
Patent classification; Text classification; Convolutional neural network; Machine learning; Word embedding; 94-02; Y; TECHNOLOGY; SELECTION; REPRESENTATIONS;
DOI
10.1007/s11192-018-2905-5
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Patent classification is an essential task in patent information management and patent knowledge mining. However, this task is still largely done manually because current algorithms perform unsatisfactorily. Recently, deep learning methods such as convolutional neural networks (CNN) have led to great progress in image processing, voice recognition, and speech recognition, but they have yet to be applied to patent classification. We propose DeepPatent, a deep learning algorithm for patent classification based on CNN and word vector embedding. We evaluated the algorithm on the standard patent classification benchmark dataset CLEF-IP and compared it with other algorithms in the CLEF-IP competition. Experiments showed that DeepPatent, with automatic feature extraction, achieved a classification precision of 83.98%, outperforming all existing algorithms that used the same information for training. It also outperforms the state-of-the-art patent classifier, which reaches a precision of 83.50% but relies on 4000 characters from the description section and extensive feature engineering, whereas DeepPatent uses only the title and abstract. DeepPatent was further tested on USPTO-2M, a patent classification benchmark dataset that we contribute, containing 2,000,147 records in 637 categories at the subclass level, obtained by cleaning 2,679,443 raw US utility patent documents. On this dataset, our algorithm achieved a precision of 73.88%.
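The abstract names the two building blocks, word vector embedding and a CNN, without giving architectural details. The sketch below is a generic illustration of that combination (embedding lookup, one convolution filter per feature, ReLU, max-over-time pooling), not the DeepPatent architecture itself. The function names, the embedding dimension, the filter widths, and the random weights are all assumptions made for illustration; a real model would learn the weights and feed the pooled features to a classifier.

```python
import random


def embed(tokens, table, dim, rng):
    """Look up a vector for each token, creating a random one on first sight.

    Assumption: random vectors stand in for trained word embeddings.
    """
    vectors = []
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        vectors.append(table[tok])
    return vectors


def conv_max_pool(vectors, kernel, bias=0.0):
    """Slide one filter over windows of consecutive word vectors.

    Each window score is passed through ReLU; the maximum over all
    positions is returned. Texts shorter than the filter yield 0.0.
    """
    width = len(kernel)
    best = 0.0  # ReLU outputs are never negative
    for start in range(len(vectors) - width + 1):
        total = bias
        for offset in range(width):
            word = vectors[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word))
        best = max(best, max(0.0, total))
    return best


def text_features(tokens, table, kernels, dim, rng):
    """Return one pooled feature per filter for a tokenized text."""
    vectors = embed(tokens, table, dim, rng)
    return [conv_max_pool(vectors, kernel) for kernel in kernels]


# Hypothetical settings: 4-dimensional embeddings, filters of width 2, 3, 3.
rng = random.Random(0)
DIM = 4
kernels = [
    [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(width)]
    for width in (2, 3, 3)
]
table = {}
tokens = "method for classifying patent documents".split()
features = text_features(tokens, table, kernels, DIM, rng)
print(features)
```

Max-over-time pooling gives a fixed-length feature vector regardless of text length, which is why this style of model can take a title and abstract of any size as input.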
Pages: 721 - 744
Page count: 24
Related papers
50 in total
  • [31] Cracking the neural code for word recognition in convolutional neural networks
    Agrawal, Aakash
    Dehaene, Stanislas
    PLOS COMPUTATIONAL BIOLOGY, 2024, 20 (09)
  • [32] WTL-CNN: a news text classification method of convolutional neural network based on weighted word embedding
    Zhao, Weidong
    Zhu, Lin
    Wang, Ming
    Zhang, Xiliang
    Zhang, Jinming
    CONNECTION SCIENCE, 2022, 34 (01) : 2291 - 2312
  • [33] Embedding Graph Convolutional Networks in Recurrent Neural Networks for Predictive Monitoring
    Rama-Maneiro, Efren
    Vidal, Juan C.
    Lama, Manuel
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (01) : 137 - 151
  • [34] Multilabeled Emotions Classification in Software Engineering Text Using Convolutional Neural Networks and Word Embeddings
    Wagan, Atif Ali
    Li, Shuaiyong
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2025, 37 (03)
  • [35] Nursing-care Text Classification using Word Vector Representation and Convolutional Neural Networks
    Nii, Manabu
    Tsuchida, Yuya
    Kato, Yusuke
    Uchinuno, Atsuko
    Sakashita, Reiko
    2017 JOINT 17TH WORLD CONGRESS OF INTERNATIONAL FUZZY SYSTEMS ASSOCIATION AND 9TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (IFSA-SCIS), 2017,
  • [36] Impact of convolutional neural network and FastText embedding on text classification
    Umer, Muhammad
    Imtiaz, Zainab
    Ahmad, Muhammad
    Nappi, Michele
    Medaglia, Carlo
    Choi, Gyu Sang
    Mehmood, Arif
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (04) : 5569 - 5585
  • [38] Word Difficulty Prediction Using Convolutional Neural Networks
    Basu, Arpan
    Garain, Avishek
    Naskar, Sudip Kumar
    PROCEEDINGS OF THE 2019 IEEE REGION 10 CONFERENCE (TENCON 2019): TECHNOLOGY, KNOWLEDGE, AND SOCIETY, 2019, : 1109 - 1112
  • [39] ITERATED DILATED CONVOLUTIONAL NEURAL NETWORKS FOR WORD SEGMENTATION
    He, H.
    Yang, X.
    Wu, L.
    Wang, G.
    NEURAL NETWORK WORLD, 2020, 30 (05) : 333 - 346
  • [40] Combining t-Distributed Stochastic Neighbor Embedding With Convolutional Neural Networks for Hyperspectral Image Classification
    Gao, Lianru
    Gu, Daixin
    Zhuang, Lina
    Ren, Jinchang
    Yang, Dong
    Zhang, Bing
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2020, 17 (08) : 1368 - 1372