Patent document clustering with deep embeddings

Cited by: 29
Authors
Kim, Jaeyoung [1]
Yoon, Janghyeok [2]
Park, Eunjeong [3]
Choi, Sungchul [1]
Affiliations
[1] Gachon Univ, Dept Ind Management Engn, TEAMLAB, Seongnam Si, Gyeonggi Do, South Korea
[2] Konkuk Univ, Dept Ind Engn, Seoul, South Korea
[3] NAVER, Seongnam Si, Gyeonggi Do, South Korea
Funding
National Research Foundation of Singapore
Keywords
Information embedding; Patent clustering; Deep learning; Text mining; Citation networks
DOI
10.1007/s11192-020-03396-7
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The analysis of scientific and technical documents is crucial in the process of establishing science and technology strategies. One popular method for such analysis is for field experts to manually classify each scientific or technical document into one of several predefined technical categories. However, not only is manual classification error-prone and expensive, but it also requires extended efforts to handle frequent data updates. In contrast, machine learning and text mining techniques enable cheaper and faster operations, and can alleviate the burden on human resources. In this paper, we propose a method for extracting embedded feature vectors by applying a neural embedding approach for text features in patent documents and automatically clustering the embedding features by utilizing a deep embedding clustering method.
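The abstract names the two stages (neural text embedding, then deep embedding clustering) but does not give the clustering objective. As a minimal sketch, assuming the clustering stage follows the widely used Deep Embedded Clustering (DEC) formulation of Xie et al. (2016), its core computations are a Student's t soft assignment of each document embedding to the cluster centroids, a sharpened target distribution derived from that assignment, and a KL-divergence loss between the two. The embeddings below are synthetic 2-D points standing in for patent-document vectors; the function names `soft_assign`, `target_distribution`, and `kl_divergence` are hypothetical, and the encoder network and its fine-tuning are omitted.

```python
import numpy as np


def soft_assign(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij of embedding z_i to centroid mu_j (DEC-style, assumed)."""
    # Squared Euclidean distance between every embedding and every centroid: shape (n, k)
    dist_sq = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Heavy-tailed kernel; alpha is the degrees of freedom (1.0 is the common default)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize each row so a document's assignments sum to 1
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened auxiliary target: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Clustering loss KL(P || Q), summed over all documents and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for document embeddings: two separated blobs (not patent data)
    z = np.vstack([rng.normal(0.0, 0.3, size=(5, 2)),
                   rng.normal(3.0, 0.3, size=(5, 2))])
    # Initial centroids; in practice these are often taken from k-means on the embeddings
    mu = np.array([[0.0, 0.0], [3.0, 3.0]])
    q = soft_assign(z, mu)
    p = target_distribution(q)
    print("hard labels:", q.argmax(axis=1))
    print("KL(P||Q):", kl_divergence(p, q))
```

In full DEC training, this KL loss is minimized with respect to both the centroids and the encoder weights, and the target distribution is recomputed periodically; the abstract does not say whether the paper follows exactly this schedule.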
Pages: 563–577
Page count: 15