SAO2Vec: Development of an algorithm for embedding the subject-action-object (SAO) structure using Doc2Vec

Cited by: 24
Authors
Kim, Sunhye [1]
Park, Inchae [1, 2]
Yoon, Byungun [1]
Affiliations
[1] Dongguk Univ, Dept Ind Syst Engn, Coll Engn, Seoul, South Korea
[2] Hansung Univ, Coll IT Engn, Seoul, South Korea
Source
PLOS ONE | 2020 / Vol. 15 / Issue 02
Funding
National Research Foundation, Singapore;
Keywords
PATENT INFORMATION; TECHNOLOGY; IDENTIFICATION; VISUALIZATION; LANGUAGE; NETWORK;
DOI
10.1371/journal.pone.0227930
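Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code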
07 ; 0710 ; 09 ;
Abstract
In natural-language processing, the subject-action-object (SAO) structure is used to convert unstructured textual data into structured textual data comprising subjects, actions, and objects. This structure is suitable for analyzing the key elements of technology, as well as the relationships between these elements. However, analysis using the existing SAO structure requires a substantial number of manual processes because this structure does not represent the context of the sentences. Thus, we introduce the concept of SAO2Vec, in which SAO is used to embed the vectors of sentences and documents, for use in text mining in the analysis of technical documents. First, the technical documents of interest are collected, and SAO structures are extracted from them. Then, sentence vectors are extracted through the Doc2Vec algorithm and are updated using word vectors in the SAO structure. Finally, SAO vectors are drawn using an updated sentence vector with the same SAO structure. In addition, document vectors are derived from the document's SAO vectors. The results of an experiment in the Internet of things field indicate that the SAO2Vec method produces 3.1% better accuracy than the Doc2Vec method and 115.0% better accuracy than SAO frequency alone. This proves that the proposed SAO2Vec algorithm can be used to improve grouping and similarity analysis by including both the meanings and the contexts of technical elements.
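The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how sentence vectors are "updated" or how SAO and document vectors are "drawn", so the blending rule, the `weight` parameter, the use of plain averaging, and all function names here are assumptions. In practice the sentence and word vectors would come from a trained Doc2Vec model; toy 2-dimensional vectors stand in for them below.

```python
from collections import defaultdict
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def update_sentence_vector(sentence_vec, sao, word_vecs, weight=0.5):
    """Blend a sentence vector with the mean word vector of its SAO terms.

    ASSUMPTION: linear blending with `weight` is a hypothetical update rule;
    the abstract only says sentence vectors are updated using SAO word vectors.
    """
    sao_word_vecs = [word_vecs[w] for w in sao if w in word_vecs]
    if not sao_word_vecs:
        return list(sentence_vec)
    sao_mean = mean_vector(sao_word_vecs)
    return [(1 - weight) * s + weight * w for s, w in zip(sentence_vec, sao_mean)]


def sao_vectors(sentences, word_vecs, weight=0.5):
    """Average the updated vectors of all sentences sharing the same SAO.

    `sentences` is a list of (doc_id, sao_tuple, sentence_vec) triples.
    ASSUMPTION: plain averaging is used to combine sentences per SAO.
    """
    grouped = defaultdict(list)
    for _doc_id, sao, vec in sentences:
        grouped[sao].append(update_sentence_vector(vec, sao, word_vecs, weight))
    return {sao: mean_vector(vecs) for sao, vecs in grouped.items()}


def document_vectors(sentences, sao_vecs):
    """Derive each document vector as the mean of its SAO vectors (assumed)."""
    per_doc = defaultdict(list)
    for doc_id, sao, _vec in sentences:
        per_doc[doc_id].append(sao_vecs[sao])
    return {doc_id: mean_vector(vecs) for doc_id, vecs in per_doc.items()}


# Toy inputs standing in for Doc2Vec output (hypothetical values).
word_vecs = {
    "sensor": [1.0, 0.0],
    "measure": [0.0, 1.0],
    "temperature": [1.0, 1.0],
}
sao = ("sensor", "measure", "temperature")
sentences = [
    ("d1", sao, [0.0, 0.0]),
    ("d2", sao, [2.0, 2.0]),
]

sao_vecs = sao_vectors(sentences, word_vecs)
doc_vecs = document_vectors(sentences, sao_vecs)
```

Because both toy sentences share one SAO structure, they map to the same SAO vector, and both documents therefore receive the same document vector. With real data, documents would differ in which SAO structures they contain.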
Pages: 26
Related papers
29 in total
  • [1] Semantic Detection of Targeted Attacks Using DOC2VEC Embedding
    El-Rahmany, Mariam S.
    Mohamed, Ensaf Hussein
    Haggag, Mohamed H.
    [J]. JOURNAL OF COMMUNICATIONS SOFTWARE AND SYSTEMS, 2021, 17 (04) : 334 - 341
  • [2] Topic recommendation using Doc2Vec
    Karvelis, Petros
    Gavrilis, Dimitris
    Georgoulas, George
    Stylios, Chrysostomos
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [3] Bug Prediction Using Source Code Embedding Based on Doc2Vec
    Aladics, Tamas
    Jasz, Judit
    Ferenc, Rudolf
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2021, PT VII, 2021, 12955 : 382 - 397
  • [4] Bangla news recommendation using doc2vec
    Nandi, Rabindra Nath
    Zaman, M. M. Arefin
    Al Muntasir, Tareq
    Sumit, Sakhawat Hosain
    Sourov, Tanvir
    Rahman, Md. Jamil-Ur
    [J]. 2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018
  • [5] Chinese abstraction algorithm combining Doc2Vec and TextRank
    Mou, Jinjun
    Xiong, Zhibin
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 125 : 149 - 149
  • [6] Poem Generation using Transformers and Doc2Vec Embeddings
    Santillan, Marvin C.
    Azcarraga, Arnulfo P.
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [7] Who is the Ringleader? Modelling Influence in Discourse using Doc2Vec
    Vyas, Priyank
    Smith, Tony
    Feldman, Philip
    Dant, Aaron
    Calude, Andreea
    Patros, Panos
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON AUTONOMIC COMPUTING AND SELF-ORGANIZING SYSTEMS COMPANION (ACSOS-C 2021), 2021 : 299 - 300
  • [8] Doc2Vec, SBERT, InferSent, and USE: Which embedding technique for noun phrases?
    Ajallouda, Lahbib
    Najmani, Kawtar
    Zellou, Ahmed
    Benlahmar, El Habib
    [J]. 2022 2ND INTERNATIONAL CONFERENCE ON INNOVATIVE RESEARCH IN APPLIED SCIENCE, ENGINEERING AND TECHNOLOGY (IRASET'2022), 2022 : 548 - 552
  • [9] Micro-blog sentiment classification using Doc2vec
    Liang, Yinghong
    Liu, Haitao
    Zhang, Su
    [J]. JOURNAL OF ENGINEERING-JOE, 2020, 2020 (13) : 407 - 410
  • [10] An Approach to Estimating Cited Sentences in Academic Papers Using Doc2vec
    Tanabe, Shunsuke
    Ohta, Manabu
    Takasu, Atsuhiro
    Adachi, Jun
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON MANAGEMENT OF DIGITAL ECOSYSTEMS (MEDES'18), 2018 : 118 - 125