Analysis and safety engineering of fuzzy string matching algorithms

Cited by: 3
Authors
Pikies, Malgorzata [1]
Ali, Junade [1]
Affiliation
[1] Cloudflare, London, England
Keywords
String similarity; Fuzzy string matching; Safety engineering; Natural language processing; Binary classification; Neural network
DOI
10.1016/j.isatra.2020.10.014
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we explore fuzzy string matching in an automatic ticket classification and processing system. We compare the performance of the following string similarity algorithms: Longest Common Subsequence (LCS), Dice coefficient, Cosine Similarity, Levenshtein (edit) distance and Damerau distance. Through optimisation, we achieved a 15% improvement in the ratio of false positives to true positive classifications over the existing approach used by a customer support system for free customers. To introduce greater safety, we complement the fuzzy string matching algorithms with a second-layer Convolutional Neural Network (CNN) binary classifier, improving the keyword classification ratio for two ticket categories by a relative 69% and 78%, respectively. Such an approach allows classification to be applied only where a desired level of safety is achieved, such as in instances where answers are automated. (C) 2020 ISA. Published by Elsevier Ltd. All rights reserved.
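The abstract names five string similarity measures but does not state which variant of each the authors used. The Python sketch below gives one common formulation of each. The following choices are assumptions made for illustration and are not taken from the paper: strings are compared character by character; Dice uses character bigrams; cosine uses character-frequency vectors; "Damerau distance" is implemented as optimal string alignment (OSA), the restricted form of Damerau-Levenshtein distance; and every score is normalised to [0, 1] by the longer string's length. All function names, including `similarities`, are hypothetical.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions.
    Assumed stand-in for 'Damerau distance' (restricted variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def bigrams(s: str) -> Counter:
    """Multiset of overlapping character pairs (assumed n-gram size: 2)."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a: str, b: str) -> float:
    """Sorensen-Dice coefficient over character-bigram multisets."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    return 2 * sum((x & y).values()) / total if total else float(a == b)


def cosine(a: str, b: str) -> float:
    """Cosine similarity of character-frequency vectors (ignores order)."""
    x, y = Counter(a), Counter(b)
    dot = sum(x[k] * y[k] for k in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else float(a == b)


def similarities(a: str, b: str) -> dict:
    """All five measures on [0, 1], 1 meaning identical (assumed normalisation)."""
    longest = max(len(a), len(b)) or 1
    return {
        "lcs": lcs_length(a, b) / longest,
        "dice": dice(a, b),
        "cosine": cosine(a, b),
        "levenshtein": 1 - levenshtein(a, b) / longest,
        "damerau_osa": 1 - osa_distance(a, b) / longest,
    }
```

For example, calling `similarities("tikcet", "ticket")` shows how the measures diverge on a single transposition. The OSA score is higher than the Levenshtein score because the swap costs one edit rather than two. Cosine over character counts ignores order entirely, so it scores the pair as identical up to floating-point rounding. A matcher would compare one of these scores against a tuned threshold. The abstract does not give the thresholds the authors used.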
Pages: 1-8 (8 pages)