A short text modeling method combining semantic and statistical information

Cited by: 59
Authors
Liu Wenyin [1]
Quan, Xiaojun [1]
Feng, Min [1]
Qiu, Bite [1]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Text similarity; Short text similarity; Information retrieval; Query expansion; Text mining; Question answering; SIMILARITY; EXTRACTION
DOI
10.1016/j.ins.2010.06.021
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A novel modeling method for a collection of short text snippets is presented in this paper to measure the similarity between pairs of snippets. The method takes account of both the semantic and statistical information within the short text snippets, and consists of three steps. Given a set of raw short text snippets, it first establishes the initial similarity between words by using a lexical database. The method then iteratively calculates both word similarity and short text similarity. Finally, a proximity matrix is constructed based on word similarity and used to convert the raw text snippets into vectors. Word similarity and text clustering experiments show that the proposed short text modeling method improves the performance of existing text-related information retrieval (IR) techniques. © 2010 Elsevier Inc. All rights reserved.
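The three steps described in the abstract can be illustrated with a rough sketch. This is not the paper's formulation: the abstract gives no update equations, so the mutual-reinforcement rules below (cosine similarity in a smoothed term space, blended with the seed similarities) are assumptions chosen only to show the overall shape of such a pipeline. The function name `model_short_texts`, the `seed_similarity` dictionary (a stand-in for scores looked up in a lexical database), and the parameters `iterations` and `alpha` are all hypothetical.

```python
import numpy as np


def normalize_rows(m):
    """Scale each row to unit length, leaving all-zero rows untouched."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return m / norms


def model_short_texts(snippets, seed_similarity, iterations=3, alpha=0.5):
    """Illustrative sketch only; the update rules are assumptions, not the paper's."""
    vocab = sorted({w for s in snippets for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Raw term-count matrix: one row per snippet, one column per word.
    X = np.zeros((len(snippets), len(vocab)))
    for row, snippet in enumerate(snippets):
        for word in snippet.lower().split():
            X[row, index[word]] += 1.0

    # Step 1: initial word similarity from the (hypothetical) lexical scores.
    W0 = np.eye(len(vocab))
    for (a, b), score in seed_similarity.items():
        if a in index and b in index:
            W0[index[a], index[b]] = W0[index[b], index[a]] = score

    # Step 2: alternate between text similarity and word similarity.
    W = W0.copy()
    for _ in range(iterations):
        T = normalize_rows(X @ W)        # snippets in the smoothed term space
        S = T @ T.T                      # snippet-to-snippet cosine similarity
        C = normalize_rows(X.T @ S)      # each word's profile over snippets
        W = alpha * W0 + (1 - alpha) * (C @ C.T)
        np.fill_diagonal(W, 1.0)

    # Step 3: use W as a proximity matrix to turn snippets into vectors.
    vectors = normalize_rows(X @ W)
    return vocab, W, vectors


snippets = ["cheap car rental", "affordable automobile hire", "python list sort"]
seed = {("car", "automobile"): 0.9, ("cheap", "affordable"): 0.8, ("rental", "hire"): 0.7}
vocab, W, vectors = model_short_texts(snippets, seed)
similarity = vectors @ vectors.T
print(np.round(similarity, 2))
```

In this toy input the first two snippets share no words, so a plain bag-of-words cosine would score them as unrelated; the proximity matrix is what lets the related vocabulary pull them together, while the third snippet stays separate.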
Pages: 4031-4041
Number of pages: 11
Related papers
50 in total
  • [31] Text Infilling Method based on Key Semantic Information Selection Mechanism
    Zheng, Shuting
    Tian, Wenjing
    Cai, Xiaodong
    2020 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE, COMPUTER TECHNOLOGY AND TRANSPORTATION (ISCTT 2020), 2020, : 219 - 223
  • [32] SEMANTIC INFORMATION AND STATISTICAL INFERENCE
    MENGES, G
    BIOMETRISCHE ZEITSCHRIFT, 1972, 14 (06): : 409 - 418
  • [33] Benchmarking short text semantic similarity
    O'Shea J.
    Bandar Z.
    Crockett K.
    McLean D.
    International Journal of Intelligent Information and Database Systems, 2010, 4 (02) : 103 - 120
  • [34] Semantic Enriched Short Text Clustering
    Kozlowski, Marek
    Rybinski, Henryk
    FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 435 - 445
  • [35] Text Clustering Using Statistical and Semantic Data
    Benghabrit, Asmaa
    Ouhbi, Brahim
    Behja, Hicham
    Frikh, Bouchra
    WORLD CONGRESS ON COMPUTER & INFORMATION TECHNOLOGY (WCCIT 2013), 2013,
  • [36] A Short Text Classification Method Based on Convolutional Neural Network and Semantic Extension
    Wang, Haitao
    Tian, Keke
    Wu, Zhengjiang
    Wang, Lei
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2021, 14 (01) : 367 - 375
  • [37] A Semantic-based Method of Internet Public Opinion Analysis for Short Text
    Hou, Shengluan
    Liu, Lei
    Cao, Cungen
    Yan, Shuying
    INTERNATIONAL SYMPOSIUM ON FUZZY SYSTEMS, KNOWLEDGE DISCOVERY AND NATURAL COMPUTATION (FSKDNC 2014), 2014, : 335 - 339
  • [38] Similarity Calculation Method of Chinese Short Text Based on Semantic Feature Space
    Pan, Liqiang
    Zhang, Pu
    Xiong, Anping
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2015, 6 (02) : 306 - 310
  • [39] LDA-PSTR: A Topic Modeling Method for Short Text
    Zhou, Kai
    Yang, Qun
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2018, 2018, 11323 : 339 - 352
  • [40] A semi-explicit short text retrieval method combining Wikipedia features
    Li, Pu
    Li, Tianci
    Zhang, Suzhi
    Li, Yuhua
    Tang, Yong
    Jiang, Yuncheng
    Engineering Applications of Artificial Intelligence, 2020, 94