SAO2Vec: Development of an algorithm for embedding the subject-action-object (SAO) structure using Doc2Vec

Cited by: 24
Authors
Kim, Sunhye [1]
Park, Inchae [1, 2]
Yoon, Byungun [1]
Affiliations
[1] Dongguk Univ, Dept Ind Syst Engn, Coll Engn, Seoul, South Korea
[2] Hansung Univ, Coll IT Engn, Seoul, South Korea
Source
PLOS ONE | 2020 / Volume 15 / Issue 02
Funding
National Research Foundation of Singapore;
Keywords
PATENT INFORMATION; TECHNOLOGY; IDENTIFICATION; VISUALIZATION; LANGUAGE; NETWORK;
DOI
10.1371/journal.pone.0227930
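Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes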
07 ; 0710 ; 09 ;
Abstract
In natural-language processing, the subject-action-object (SAO) structure is used to convert unstructured textual data into structured textual data comprising subjects, actions, and objects. This structure is suitable for analyzing the key elements of technology, as well as the relationships between these elements. However, analysis using the existing SAO structure requires a substantial number of manual processes because this structure does not represent the context of the sentences. Thus, we introduce the concept of SAO2Vec, in which SAO is used to embed the vectors of sentences and documents, for use in text mining in the analysis of technical documents. First, the technical documents of interest are collected, and SAO structures are extracted from them. Then, sentence vectors are extracted through the Doc2Vec algorithm and are updated using word vectors in the SAO structure. Finally, SAO vectors are drawn using an updated sentence vector with the same SAO structure. In addition, document vectors are derived from the document's SAO vectors. The results of an experiment in the Internet of things field indicate that the SAO2Vec method produces 3.1% better accuracy than the Doc2Vec method and 115.0% better accuracy than SAO frequency alone. This proves that the proposed SAO2Vec algorithm can be used to improve grouping and similarity analysis by including both the meanings and the contexts of technical elements.
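The aggregation steps named in the abstract (update sentence vectors with SAO word vectors, pool sentences that share an SAO structure, then derive a document vector from its SAO vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the update rule, so the blending weight `alpha`, the use of simple averaging, and all function names here are hypothetical. The toy sentence vectors stand in for vectors that would come from a trained Doc2Vec model, which is not shown.

```python
from collections import defaultdict


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_sentence_vector(sentence_vec, sao, word_vecs, alpha=0.5):
    """Blend a sentence vector with the mean vector of its SAO words.

    alpha is a hypothetical mixing weight; the abstract does not specify one.
    """
    sao_word_mean = mean([word_vecs[word] for word in sao])
    return [alpha * s + (1 - alpha) * w
            for s, w in zip(sentence_vec, sao_word_mean)]


def sao_vectors(sentences, word_vecs, alpha=0.5):
    """Average the updated vectors of all sentences sharing one SAO triple."""
    groups = defaultdict(list)
    for sao, sentence_vec in sentences:
        groups[sao].append(
            update_sentence_vector(sentence_vec, sao, word_vecs, alpha))
    return {sao: mean(vecs) for sao, vecs in groups.items()}


def document_vector(doc_saos, sao_vecs):
    """Assumed pooling: a document vector is the mean of its SAO vectors."""
    return mean([sao_vecs[sao] for sao in doc_saos])


# Toy 2-dimensional vectors; real ones would come from Doc2Vec / word embeddings.
word_vecs = {
    "sensor": [1.0, 0.0], "detect": [0.0, 1.0], "motion": [1.0, 1.0],
    "gateway": [0.0, 0.0], "send": [2.0, 0.0], "data": [0.0, 2.0],
}
sentences = [
    (("sensor", "detect", "motion"), [0.0, 0.0]),
    (("sensor", "detect", "motion"), [2.0, 2.0]),
    (("gateway", "send", "data"), [1.0, 1.0]),
]

sao_vecs = sao_vectors(sentences, word_vecs)
doc_vec = document_vector(list(sao_vecs), sao_vecs)
```

Grouping by the SAO triple means two differently worded sentences that express the same subject, action, and object contribute to a single SAO vector, which is the pooling behavior the abstract describes.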
Pages: 26
Related papers
29 in total
  • [21] Research on Chinese Audio and Text Alignment Algorithm Based on AIC-FCM and Doc2Vec
    Chen, Keliang
    Huang, Jianming
    Cui, Yansong
    Ren, Weizheng
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (03)
  • [22] Automated Functional Dependency Detection Between Test Cases Using Doc2Vec and Clustering
    Tahvili, Sahar
    Hatvani, Leo
    Felderer, Michael
    Afzal, Wasif
    Bohlin, Markus
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING (AITEST), 2019, : 19 - 26
  • [23] Retrieval of Semantically Similar Philippine Supreme Court Case Decisions using Doc2Vec
    Barco Ranera, Lorenz Timothy
    Solano, Geoffrey A.
    Oco, Nathaniel
    [J]. 2019 INTERNATIONAL SYMPOSIUM ON MULTIMEDIA AND COMMUNICATION TECHNOLOGY (ISMAC), 2019,
  • [24] Automated Scoring of Interview Videos using Doc2Vec Multimodal Feature Extraction Paradigm
    Chen, Lei
    Feng, Gary
    Leong, Chee Wee
    Lehman, Blair
    Martin-Raugh, Michelle
    Kell, Harrison
    Lee, Chong Min
    Yoon, Su-Youn
    [J]. ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 161 - 168
  • [25] Processing-in-Memory Development Strategy for AI Computing Using Main-Path and Doc2Vec Analyses
    Chung, Euiyoung
    Sohn, So Young
    [J]. SUSTAINABILITY, 2023, 15 (16)
  • [26] A detection method for phishing web page using DOM-based Doc2Vec model
    Feng, Jian
    Zhang, Ying
    Qiao, Yuqiang
    [J]. Journal of Computing and Information Technology, 2020, 28 (01) : 19 - 31
  • [27] COVID-19 Fake News Detection Using Joint Doc2Vec and Text Features with PCA
    Mejia, Hector
    Chipantiza, Carlos
    Llumiquinga, Jose
    Amaro, Isidro R.
    Fonseca-Delgado, Rigoberto
    [J]. ADVANCED RESEARCH IN TECHNOLOGIES, INFORMATION, INNOVATION AND SUSTAINABILITY, ARTIIS 2022, PT I, 2022, 1675 : 316 - 330
  • [28] Problem formulation in inventive design using Doc2vec and Cosine Similarity as Artificial Intelligence methods and Scientific Papers
    Hanifi, Masih
    Chibane, Hicham
    Houssin, Remy
    Cavallucci, Denis
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 109
  • [29] Multi-co-training for document classification using various document representations: TF-IDF, LDA, and Doc2Vec
    Kim, Donghwa
    Seo, Deokseong
    Cho, Suhyoun
    Kang, Pilsung
    [J]. INFORMATION SCIENCES, 2019, 477 : 15 - 29