SAO2Vec: Development of an algorithm for embedding the subject-action-object (SAO) structure using Doc2Vec

Cited by: 24

Authors:

Kim, Sunhye [1]

Park, Inchae [1,2]

Yoon, Byungun [1]

Affiliations:

[1] Dongguk Univ, Dept Ind Syst Engn, Coll Engn, Seoul, South Korea

[2] Hansung Univ, Coll IT Engn, Seoul, South Korea

Source:

PLOS ONE | 2020 / Vol. 15 / No. 02

Funding:

National Research Foundation, Singapore;

Keywords:

PATENT INFORMATION; TECHNOLOGY; IDENTIFICATION; VISUALIZATION; LANGUAGE; NETWORK;

DOI:

10.1371/journal.pone.0227930

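Chinese Library Classification:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];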

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

In natural-language processing, the subject-action-object (SAO) structure is used to convert unstructured textual data into structured textual data comprising subjects, actions, and objects. This structure is suitable for analyzing the key elements of technology, as well as the relationships between these elements. However, analysis using the existing SAO structure requires a substantial number of manual processes because this structure does not represent the context of the sentences. Thus, we introduce the concept of SAO2Vec, in which SAO is used to embed the vectors of sentences and documents, for use in text mining in the analysis of technical documents. First, the technical documents of interest are collected, and SAO structures are extracted from them. Then, sentence vectors are extracted through the Doc2Vec algorithm and are updated using word vectors in the SAO structure. Finally, SAO vectors are drawn using an updated sentence vector with the same SAO structure. In addition, document vectors are derived from the document's SAO vectors. The results of an experiment in the Internet of things field indicate that the SAO2Vec method produces 3.1% better accuracy than the Doc2Vec method and 115.0% better accuracy than SAO frequency alone. This proves that the proposed SAO2Vec algorithm can be used to improve grouping and similarity analysis by including both the meanings and the contexts of technical elements.

Pages: 26
