Analysis and safety engineering of fuzzy string matching algorithms

Cited by: 3
Authors
Pikies, Malgorzata [1]
Ali, Junade [1]
Affiliation
[1] Cloudflare, London, England
Keywords
String similarity; Fuzzy string matching; Safety engineering; Natural language processing; Binary classification; Neural network
DOI
10.1016/j.isatra.2020.10.014
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper we explore fuzzy string matching in an automatic ticket classification and processing system. We compare the performance of the following string similarity algorithms: Longest Common Subsequence (LCS), Dice coefficient, Cosine Similarity, Levenshtein (edit) distance and Damerau distance. Through optimisation, we achieved a 15% improvement in the ratio of false positives to true positive classifications over the existing approach used by a customer support system for free customers. To introduce greater safety, we complement the fuzzy string matching algorithms with a second-layer Convolutional Neural Network (CNN) binary classifier, improving the keyword classification ratio for two ticket categories by a relative 69% and 78%, respectively. This approach allows classification to be applied only where a desired level of safety is achieved, such as in instances where answers are automated. © 2020 ISA. Published by Elsevier Ltd. All rights reserved.
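The five similarity measures named in the abstract can be illustrated with the minimal Python sketch below. It is not the authors' implementation, and several choices are assumptions: Dice and cosine are computed over character bigrams, "Damerau distance" is implemented as the optimal string alignment variant (Levenshtein plus adjacent transpositions), and the helpers `matches_keyword` (a normalised-edit-distance threshold of 0.8, chosen arbitrarily) and `fp_tp_ratio` are hypothetical names that only show how a first-layer keyword rule and the false-positive to true-positive ratio might be computed. The second-layer CNN classifier is not shown.

```python
"""Sketches of LCS, Dice, cosine, Levenshtein and Damerau (OSA) similarity.

Assumptions (not from the paper): character bigrams for Dice and cosine,
the optimal string alignment variant of Damerau distance, and a
hypothetical threshold rule for keyword matching.
"""
from collections import Counter
from math import sqrt


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def bigrams(s: str) -> list[str]:
    """Overlapping character bigrams, e.g. 'night' -> ['ni', 'ig', 'gh', 'ht']."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a: str, b: str) -> float:
    """Dice coefficient over bigram multisets: 2|X & Y| / (|X| + |Y|)."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:  # both strings shorter than two characters
        return float(a == b)
    return 2 * sum((x & y).values()) / total


def cosine(a: str, b: str) -> float:
    """Cosine similarity of bigram count vectors."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[k] * y[k] for k in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else float(a == b)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def damerau(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap two neighbours
    return d[-1][-1]


def matches_keyword(text: str, keyword: str, threshold: float = 0.8) -> bool:
    """Hypothetical first-layer rule: any token within the normalised edit threshold."""
    keyword = keyword.lower()
    for token in text.lower().split():
        longest = max(len(token), len(keyword)) or 1
        if 1 - levenshtein(token, keyword) / longest >= threshold:
            return True
    return False


def fp_tp_ratio(predicted: list[bool], actual: list[bool]) -> float:
    """False positives divided by true positives (lower is better)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    return fp / tp if tp else float("inf")
```

For example, `levenshtein("kitten", "sitting")` is 3, while `damerau("ca", "ac")` is 1 because the transposition counts as a single edit. A threshold rule like `matches_keyword` is cheap but can fire on unrelated tokens, which is the kind of false positive a second-layer classifier is meant to filter.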
Pages: 1-8
Page count: 8
Related papers
50 in total
  • [31] Improved algorithms for approximate string matching (extended abstract)
    Papamichail, Dimitris
    Papamichail, Georgios
    BMC Bioinformatics, 2009, 10
  • [32] Alternative algorithms for bit-parallel string matching
    Peltola, H
    Tarhio, J
    String Processing and Information Retrieval, Proceedings, 2003, 2857: 80-94
  • [33] Algorithms for transposition invariant string matching (extended abstract)
    Mäkinen, V
    Navarro, G
    Ukkonen, E
    STACS 2003, Proceedings, 2003, 2607: 191-202
  • [34] Two algorithms for approximate string matching in static texts
    Jokinen, P
    Ukkonen, E
    Lecture Notes in Computer Science, 1991, 520: 240-248
  • [35] Speeding up two string-matching algorithms
    Crochemore, M
    Lecroq, T
    Czumaj, A
    Gasieniec, L
    Jarominek, S
    Plandowski, W
    Rytter, W
    Lecture Notes in Computer Science, 1992, 577: 589-600
  • [36] Detecting false matches in string-matching algorithms
    Muthukrishnan, S
    Algorithmica, 1997, 18(4): 512-520
  • [37] Efficient string matching algorithms for combinatorial universal denoising
    Chen, S
    Diggavi, S
    Dusad, S
    Muthukrishnan, S
    DCC 2005: Data Compression Conference, Proceedings, 2005: 153-162
  • [38] Performance of multiple string matching algorithms in text mining
    Sheshasaayee, Ananthi
    Thailambal, G
    Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2016), Vol. 2, 2017, 516: 671-681
  • [39] Comparison of exact string matching algorithms for biological sequences
    Kalsi, Petri
    Peltola, Hannu
    Tarhio, Jorma
    Bioinformatics Research and Development, Proceedings, 2008, 13: 417-426