Modelling Text Similarity: A Survey

Cited by: 0
Authors
Mu, Wenchuan [1]
Lim, Kwan Hui [1]
Affiliations
[1] Singapore Univ Technol & Design, Singapore, Singapore
Keywords
Modelling and simulation; Deep learning and embeddings; Algorithms and techniques; SEMANTIC SIMILARITY; KERNELS
DOI
10.1145/3625007.3627305
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Online social networking services such as Twitter and Instagram have become pervasive platforms for engaging in discussions on a wide array of topics. These platforms cater to both mainstream subjects, like music and movies, as well as more specialized areas, such as politics. With the growing volume of textual data generated on these platforms, the ability to define and identify similar texts becomes crucial for effective investigation and clustering. In this paper, we explore the challenges and significance of text similarity regression models in the context of online social networking services. We delve into the methods and techniques employed to define and find similarities among texts, enabling the extraction of meaningful patterns and insights. Specifically, we categorize text similarity regression models into four distinct types: set-theoretic, sequence-theoretic, real-vector, and end-to-end methods. This categorization is based on the mathematical formalisation of similarity used by each model. Ultimately, our survey aims to provide a comprehensive overview of the interlinkages between independently proposed methods for text similarity. By understanding the strengths and weaknesses of these methods, researchers can make informed decisions when designing novel approaches and algorithms. We hope this survey serves as a valuable resource for advancing the state-of-the-art in addressing the complex problem of text similarity.
Pages: 698-705
Page count: 8
Related papers
50 in total
  • [31] Challenges in Chinese text similarity research
    Wang, Xiuhong
    Ju, Shiguang
    Wu, Shengli
    2008 INTERNATIONAL SYMPOSIUM ON INFORMATION PROCESSING AND 2008 INTERNATIONAL PACIFIC WORKSHOP ON WEB MINING AND WEB-BASED APPLICATION, 2008, : 297 - +
  • [32] Continuous Similarity Search for Text Sets
    Tsuchida, Yuma
    Kubo, Kohei
    Koga, Hisashi
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2022, PT II, 2022, 13427 : 229 - 234
  • [33] A Similarity Measure for Text Classification and Clustering
    Lin, Yung-Shen
    Jiang, Jung-Yi
    Lee, Shie-Jue
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (07) : 1575 - 1590
  • [34] Semantic Based Text Similarity Computation
    Liu, Yaqi
    Li, Zhijiang
    ADVANCED GRAPHIC COMMUNICATIONS AND MEDIA TECHNOLOGIES, 2017, 417 : 343 - 348
  • [35] Local Similarity Search for Unstructured Text
    Wang, Pei
    Xiao, Chuan
    Qin, Jianbin
    Wang, Wei
    Zhang, Xiaoyang
    Ishikawa, Yoshiharu
    SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 1991 - 2005
  • [36] Modelling similarity perception of intonation
    Reichel, Uwe D.
    Kleber, Felicitas
    Winkelmann, Raphael
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1679 - 1682
  • [37] Modelling asymmetric similarity with prominence
    Johannesson, M
    BRITISH JOURNAL OF MATHEMATICAL & STATISTICAL PSYCHOLOGY, 2000, 53 : 121 - 139
  • [38] Text Similarity Approach for SNOMED CT Primitive Concept Similarity Measure
    Htun, Htet Htet
    Sornlertlamvanich, Virach
    2017 8TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY FOR EMBEDDED SYSTEMS (IC-ICTES), 2017,
  • [39] Text as Policy: Measuring Policy Similarity through Bill Text Reuse
    Linder, Fridolin
    Desmarais, Bruce
    Burgess, Matthew
    Giraudy, Eugenia
    POLICY STUDIES JOURNAL, 2020, 48 (02) : 546 - 574
  • [40] An effective short text conceptualization based on new short text similarity
    Bekkali, Mohammed
    Lachkar, Abdelmonaime
    SOCIAL NETWORK ANALYSIS AND MINING, 2018, 9 (01)