Modelling Text Similarity: A Survey

Cited by: 0
Authors
Mu, Wenchuan [1]
Lim, Kwan Hui [1]
Affiliations
[1] Singapore Univ Technol & Design, Singapore, Singapore
Keywords
Modelling and simulation; Deep learning and embeddings; Algorithms and techniques; SEMANTIC SIMILARITY; KERNELS
DOI
10.1145/3625007.3627305
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Online social networking services such as Twitter and Instagram have become pervasive platforms for engaging in discussions on a wide array of topics. These platforms cater to both mainstream subjects, like music and movies, as well as more specialized areas, such as politics. With the growing volume of textual data generated on these platforms, the ability to define and identify similar texts becomes crucial for effective investigation and clustering. In this paper, we explore the challenges and significance of text similarity regression models in the context of online social networking services. We delve into the methods and techniques employed to define and find similarities among texts, enabling the extraction of meaningful patterns and insights. Specifically, we categorize text similarity regression models into four distinct types: set-theoretic, sequence-theoretic, real-vector, and end-to-end methods. This categorization is based on the mathematical formalisation of similarity used by each model. Ultimately, our survey aims to provide a comprehensive overview of the interlinkages between independently proposed methods for text similarity. By understanding the strengths and weaknesses of these methods, researchers can make informed decisions when designing novel approaches and algorithms. We hope this survey serves as a valuable resource for advancing the state-of-the-art in addressing the complex problem of text similarity.
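To make the categorization above concrete, the following minimal sketch illustrates three of the four families named in the abstract. The specific measures are assumptions chosen for illustration (Jaccard overlap for set-theoretic, Levenshtein distance for sequence-theoretic, bag-of-words cosine for real-vector); the abstract does not state which measures the survey covers. The function names and the whitespace tokenizer are hypothetical, and end-to-end methods are not shown because they require a trained model.

```python
from collections import Counter
from math import sqrt


def tokens(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def jaccard(a, b):
    """Set-theoretic view: overlap between the two token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def levenshtein(a, b):
    """Sequence-theoretic view: minimum number of insertions, deletions
    and substitutions turning sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        cur = [i]
        for j, item_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                        # deletion
                           cur[j - 1] + 1,                     # insertion
                           prev[j - 1] + (item_a != item_b)))  # substitution
        prev = cur
    return prev[-1]


def cosine(a, b):
    """Real-vector view: cosine of the angle between term-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The same pair of texts can score quite differently under each view, since the set view ignores word order and repetition, the sequence view is sensitive to order, and the vector view weights repeated terms.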
Pages: 698-705
Number of pages: 8
Related papers
50 in total
  • [21] A COMBINED MEASURE FOR TEXT SEMANTIC SIMILARITY
    Li, Hao-Di
    Chen, Qing-Cai
    Wang, Xiao-Long
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 1869 - 1873
  • [22] Text matching to measure patent similarity
    Arts, Sam
    Cassiman, Bruno
    Carlos Gomez, Juan
    STRATEGIC MANAGEMENT JOURNAL, 2018, 39 (01) : 62 - 84
  • [23] Similarity measures for short segments of text
    Metzler, Donald
    Dumais, Susan
    Meek, Christopher
    ADVANCES IN INFORMATION RETRIEVAL, 2007, 4425 : 16 - +
  • [24] Quick asymmetric text similarity measures
    Bao, JP
    Shen, JY
    Liu, XD
    Liu, HY
    2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 374 - 379
  • [25] Similarity searching in the CORDIS text database
    Petrakis, Euripides G. M.
    Tzeras, Kostas
    Software - Practice and Experience, 2000, 30 (13) : 1447 - 1464
  • [26] Gemedoc: A Text Similarity Annotation Platform
    Fize, Jacques
    Roche, Mathieu
    Teisseire, Maguelonne
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2018), 2018, 10859 : 333 - 336
  • [27] Batch Text Similarity Search with MapReduce
    Li, Rui
    Ju, Li
    Peng, Zhuo
    Yu, Zhiwei
    Wang, Chaokun
    WEB TECHNOLOGIES AND APPLICATIONS, 2011, 6612 : 412 - +
  • [28] Modeling Text Similarity with Parse Thickets
    Strok, Fedor
    2ND INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2014, 2014, 31 : 1012 - 1021
  • [29] Benchmarking short text semantic similarity
    O'Shea J.
    Bandar Z.
    Crockett K.
    McLean D.
    International Journal of Intelligent Information and Database Systems, 2010, 4 (02) : 103 - 120
  • [30] An Approach to Semantic Text Similarity Computing
    Akermi, Imen
    Faiz, Rim
    MODERN TRENDS AND TECHNIQUES IN COMPUTER SCIENCE (CSOC 2014), 2014, 285 : 383 - 393