Mobility in Unsupervised Word Embeddings for Knowledge Extraction-The Scholars' Trajectories across Research Topics

Times cited: 3
Authors
Lombardo, Gianfranco [1]
Tomaiuolo, Michele [1]
Mordonini, Monica [1]
Codeluppi, Gaia [1]
Poggi, Agostino [1]
Affiliation
[1] Univ Parma, Dept Engn & Architecture DIA, I-43100 Parma, Italy
Source
FUTURE INTERNET | 2022 / Vol. 14 / Issue 01
Keywords
word embedding; semantic space; knowledge discovery; Word2vec; BERT; human mobility; Radius of Gyration; PATTERNS;
DOI
10.3390/fi14010025
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In the knowledge discovery field of the Big Data domain, the analysis of geographic positioning and mobility information plays a key role. At the same time, in the Natural Language Processing (NLP) domain, pre-trained models such as BERT and word embedding algorithms such as Word2Vec have enabled a rich encoding of words that allows mapping textual data into points of an arbitrary multi-dimensional space, in which the notion of proximity reflects an association among terms or topics. The main contribution of this paper is to show how analytical tools, traditionally adopted to deal with geographic data to measure the mobility of an agent in a time interval, can also be effectively applied to extract knowledge in a semantic realm, such as a semantic space of words and topics, looking for latent trajectories that can benefit from the properties of neural network latent representations. As a case study, the Scopus database was queried about works of highly cited researchers in recent years. On this basis, we performed a dynamic analysis, measuring the Radius of Gyration as an index of the mobility of researchers across scientific topics. The semantic space is built from the automatic analysis of the paper abstracts of each author. In particular, we evaluated two different methodologies to build the semantic space and found that Word2Vec embeddings perform better than the BERT ones for this task. Finally, the scholars' trajectories show some latent properties of this model, which also represent new scientific contributions of this work. These properties include (i) the correlation between scientific mobility and the achievement of scientific results, measured through the H-index; (ii) differences in the behavior of researchers working in different countries and subjects; and (iii) some interesting similarities between mobility patterns in this semantic realm and those typically observed in the case of human mobility.
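As a rough illustration of the mobility index named in the abstract, the sketch below computes a Radius of Gyration over a set of points in an embedding space. It assumes the standard unweighted definition (root mean squared Euclidean distance of the points from their centroid). The abstract does not state the exact formulation used in the paper, such as any weighting or choice of distance, so this should be read as an assumption rather than the authors' method. The function names `centroid` and `radius_of_gyration` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Return the component-wise mean of the points (the center of mass)."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def radius_of_gyration(points: Sequence[Sequence[float]]) -> float:
    """Root mean squared Euclidean distance of the points from their centroid.

    Assumption: unweighted, Euclidean variant. Each point stands for one
    document (e.g. one paper abstract) embedded in the semantic space.
    """
    if not points:
        raise ValueError("at least one point is required")
    c = centroid(points)
    mean_sq_dist = sum(
        sum((x - cx) ** 2 for x, cx in zip(p, c)) for p in points
    ) / len(points)
    return math.sqrt(mean_sq_dist)


# Toy 2-D "embeddings" for one hypothetical author's papers.
papers = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(radius_of_gyration(papers))
```

Under this definition, an author whose papers all map to the same point has a radius of zero, and the value grows as the papers spread out across the space.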
Pages: 21