Patent document clustering with deep embeddings

Cited by: 29
Authors
Kim, Jaeyoung [1]
Yoon, Janghyeok [2]
Park, Eunjeong [3]
Choi, Sungchul [1]
Affiliations
[1] Gachon Univ, Dept Ind Management Engn, TEAMLAB, Seongnam Si, Gyeonggi Do, South Korea
[2] Konkuk Univ, Dept Ind Engn, Seoul, South Korea
[3] NAVER, Seongnam Si, Gyeonggi Do, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Information embedding; Patent clustering; Deep learning; Text mining; Citation networks;
DOI
10.1007/s11192-020-03396-7
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification codes
081203; 0835;
Abstract
The analysis of scientific and technical documents is crucial in the process of establishing science and technology strategies. One popular method for such analysis is for field experts to manually classify each scientific or technical document into one of several predefined technical categories. However, not only is manual classification error-prone and expensive, but it also requires extended efforts to handle frequent data updates. In contrast, machine learning and text mining techniques enable cheaper and faster operations, and can alleviate the burden on human resources. In this paper, we propose a method for extracting embedded feature vectors by applying a neural embedding approach for text features in patent documents and automatically clustering the embedding features by utilizing a deep embedding clustering method.
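The abstract names a deep embedding clustering step without giving its details. The sketch below assumes that step follows the DEC formulation of Xie, Girshick and Farhadi (2016), which may differ from the authors' exact setup. Each document embedding z_i receives a Student's t soft assignment q_ij to each centroid mu_j (with alpha = 1, as in DEC). A sharpened target p_ij is formed from q_ij squared divided by the soft cluster size f_j, and KL(P || Q) is the loss that DEC minimises. The function names, the toy 2-D vectors, and the initial centroids are hypothetical stand-ins for the neural patent-text embeddings. Only the loss is computed here; DEC also backpropagates this loss into the encoder and the centroids, which is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_assign(z: Matrix, mu: Matrix, alpha: float = 1.0) -> Matrix:
    """Student's t soft assignment q[i][j] of embedding z[i] to centroid mu[j]."""
    q = []
    for zi in z:
        # Unnormalised kernel (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha + 1) / 2)
        row = []
        for mj in mu:
            dist2 = sum((a - b) ** 2 for a, b in zip(zi, mj))
            row.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # each row sums to 1
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened target p[i][j], proportional to q[i][j]^2 / f[j]."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]  # soft cluster sizes
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        s = sum(w)
        p.append([v / s for v in w])  # renormalise each row
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """Clustering loss KL(P || Q), summed over all documents and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


# Hypothetical toy 2-D vectors standing in for neural document embeddings.
z = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
mu = [[0.0, 0.0], [5.0, 5.0]]  # assumed initial centroids, e.g. from k-means

q = soft_assign(z, mu)
p = target_distribution(q)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
print(labels)  # [0, 0, 1, 1]
print(round(kl_divergence(p, q), 4))
```

Squaring q_ij pushes each row of P toward its most likely cluster, and dividing by f_j keeps large clusters from dominating. Minimising KL(P || Q) therefore encourages confident, balanced cluster assignments of the document embeddings.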
Pages: 563 - 577
Number of pages: 15
Related papers (50 in total)
  • [1] Patent document clustering with deep embeddings
    Kim, Jaeyoung
    Yoon, Janghyeok
    Park, Eunjeong
    Choi, Sungchul
    Scientometrics, 2020, 123 : 563 - 577
  • [2] Deep Clustering of Compressed Variational Embeddings
    Wu, Suya
    Diao, Enmao
    Ding, Jie
    Tarokh, Vahid
    2020 DATA COMPRESSION CONFERENCE (DCC 2020), 2020, : 399 - 399
  • [3] Deep audio embeddings for vocalisation clustering
    Best, Paul
    Paris, Sebastien
    Glotin, Herve
    Marxer, Ricard
    PLOS ONE, 2023, 18 (07):
  • [4] Document Clustering Meets Topic Modeling with Word Embeddings
    Costa, Gianni
    Ortale, Riccardo
    PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM), 2020, : 244 - 252
  • [5] JSON document clustering based on schema embeddings
    Priya, D. Uma
    Thilagam, P. Santhi
    JOURNAL OF INFORMATION SCIENCE, 2024, 50 (05) : 1112 - 1130
  • [6] Patent Document Clustering Using Dimensionality Reduction
    Girthana, K.
    Swamynathan, S.
    PROGRESS IN ADVANCED COMPUTING AND INTELLIGENT ENGINEERING, VOL 2, 2018, 564 : 167 - 176
  • [7] Legal Document Retrieval Using Document Vector Embeddings and Deep Learning
    Sugathadasa, Keet
    Ayesha, Buddhi
    de Silva, Nisansa
    Perera, Amal Shehan
    Jayawardana, Vindula
    Lakmal, Dimuthu
    Perera, Madhavi
    INTELLIGENT COMPUTING, VOL 2, 2019, 857 : 160 - 175
  • [8] Deep Clustering: Discriminative Embeddings for Segmentation and Separation
    Hershey, John R.
    Chen, Zhuo
    Le Roux, Jonathan
    Watanabe, Shinji
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 31 - 35
  • [9] Distributed Document and Phrase Co-embeddings for Descriptive Clustering
    Sato, Motoki
    Brockmeier, Austin J.
    Kontonatsios, Georgios
    Mu, Tingting
    Goulermas, John Y.
    Tsujii, Jun'ichi
    Ananiadou, Sophia
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 991 - 1001
  • [10] Incorporating Paragraph Embeddings and Density Peaks Clustering for Spoken Document Summarization
    Chen, Kuan-Yu
    Shih, Kai-Wun
    Liu, Shih-Hung
    Chen, Berlin
    Wang, Hsin-Min
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 207 - 214