Measuring Similarity of Academic Articles with Semantic Profile and Joint Word Embedding

Cited by: 21
Authors
Liu, Ming [1]
Lang, Bo [1]
Gu, Zepeng [1]
Zeeshan, Ahmed [1]
Affiliation
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Keywords
document semantic similarity; text understanding; semantic enrichment; word embedding; scientific literature analysis
DOI
10.23919/TST.2017.8195345
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Long-document semantic measurement is important in many applications, such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. We can then obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of word vectors, we propose a joint word-embedding model that incorporates a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
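To make the idea of comparing two profiles through word vectors concrete, the following is a minimal illustrative sketch and not the authors' method. It assumes a profile is a mapping from element names (here `purpose` and `methodology`) to lists of concept words, that each element is summarized by the centroid of its word vectors, and that the overall score is the unweighted mean of per-element cosine similarities. The names `TOY_VECTORS`, `centroid`, and `profile_similarity` and all vector values are hypothetical.

```python
import math

# Toy 3-dimensional word vectors; the values are made up for illustration
# and do not come from any trained embedding model.
TOY_VECTORS = {
    "similarity": [0.9, 0.1, 0.0],
    "matching":   [0.8, 0.2, 0.1],
    "embedding":  [0.1, 0.9, 0.1],
    "vectors":    [0.2, 0.8, 0.0],
    "protein":    [0.0, 0.1, 0.9],
    "folding":    [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(words, vectors):
    """Mean vector of the words that have an embedding, or None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def profile_similarity(p, q, vectors):
    """Assumed scoring rule: average cosine similarity over shared elements."""
    scores = []
    for element in p.keys() & q.keys():
        cp, cq = centroid(p[element], vectors), centroid(q[element], vectors)
        if cp is not None and cq is not None:
            scores.append(cosine(cp, cq))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical profiles for three papers.
paper_a = {"purpose": ["similarity"], "methodology": ["embedding"]}
paper_b = {"purpose": ["matching"], "methodology": ["vectors"]}
paper_c = {"purpose": ["protein"], "methodology": ["folding"]}

sim_ab = profile_similarity(paper_a, paper_b, TOY_VECTORS)
sim_ac = profile_similarity(paper_a, paper_c, TOY_VECTORS)
```

With these toy values, `sim_ab` comes out higher than `sim_ac`, because the concepts in `paper_a` and `paper_b` lie close together in the vector space. The abstract does not specify how elements are weighted or how the profiles are enriched, so the unweighted mean here is only a placeholder.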
Pages: 619-632
Number of pages: 14
Source: Tsinghua Science and Technology, 2017, 22 (06)