A short text modeling method combining semantic and statistical information

Cited by: 59
Authors
Liu, Wenyin [1]
Quan, Xiaojun [1]
Feng, Min [1]
Qiu, Bite [1]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Text similarity; Short text similarity; Information retrieval; Query expansion; Text mining; Question answering; SIMILARITY; EXTRACTION
DOI
10.1016/j.ins.2010.06.021
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A novel modeling method for a collection of short text snippets is presented in this paper to measure the similarity between pairs of snippets. The method takes account of both the semantic and statistical information within the short text snippets, and consists of three steps. Given a set of raw short text snippets, it first establishes the initial similarity between words by using a lexical database. The method then iteratively calculates both word similarity and short text similarity. Finally, a proximity matrix is constructed based on word similarity and used to convert the raw text snippets into vectors. Word similarity and text clustering experiments show that the proposed short text modeling method improves the performance of existing text-related information retrieval (IR) techniques. © 2010 Elsevier Inc. All rights reserved.
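The final step described in the abstract (converting raw snippets into vectors through a word proximity matrix) can be illustrated with a minimal sketch. This is not the authors' formulation: the vocabulary, the hand-picked similarity values, and the helper names `smooth` and `cosine` are all assumptions made for illustration, and the iterative refinement of word and text similarity is omitted because the abstract does not give its update rules.

```python
import math

def smooth(vec, proximity):
    """Multiply a raw term-count vector by a word-by-word proximity matrix."""
    n = len(vec)
    return [sum(proximity[i][j] * vec[j] for j in range(n)) for i in range(n)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical vocabulary: index 0 = "car", 1 = "automobile", 2 = "banana".
# Hypothetical word similarities; in the paper these would come from a
# lexical database followed by iterative refinement.
proximity = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

snippet_a = [1, 0, 0]  # contains only "car"
snippet_b = [0, 1, 0]  # contains only "automobile"
snippet_c = [0, 0, 1]  # contains only "banana"

# Raw bag-of-words vectors share no terms, so their cosine is zero.
raw_sim = cosine(snippet_a, snippet_b)

# After smoothing, related words contribute to each other's dimensions.
smoothed_sim = cosine(smooth(snippet_a, proximity), smooth(snippet_b, proximity))
unrelated_sim = cosine(smooth(snippet_a, proximity), smooth(snippet_c, proximity))
```

In this toy setup the two snippets that use different but related words have zero raw similarity and a high similarity after smoothing, while the unrelated snippet stays at zero. This is the kind of effect a proximity matrix is meant to provide for short texts with little word overlap.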
Pages: 4031-4041
Number of pages: 11