Analysis and safety engineering of fuzzy string matching algorithms

Cited by: 3
Authors
Pikies, Malgorzata [1]
Ali, Junade [1]
Affiliations
[1] Cloudflare, London, England
Keywords
String similarity; Fuzzy string matching; Safety engineering; Natural language processing; Binary classification; Neural network
DOI
10.1016/j.isatra.2020.10.014
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we explore fuzzy string matching in an automatic ticket classification and processing system. We compare the performance of the following string similarity algorithms: Longest Common Subsequence (LCS), Dice coefficient, Cosine Similarity, Levenshtein (edit) distance and Damerau distance. Through optimisation, we accomplished a 15% improvement in the ratio of false positive to true positive classifications over the existing approach used by a customer support system for free customers. To introduce greater safety, we complement fuzzy string matching algorithms with a second-layer Convolutional Neural Network (CNN) binary classifier, improving the keyword classification ratio for two ticket categories by a relative 69% and 78%. Such an approach allows classification to be applied only where a desired level of safety is achieved, such as in instances involving automated answers. (C) 2020 ISA. Published by Elsevier Ltd. All rights reserved.
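As an illustration of two of the measures named in the abstract, the following is a minimal Python sketch of Levenshtein (edit) distance and the Dice coefficient over character bigrams, followed by a two-layer check in which a fuzzy match is only accepted if a second classifier agrees. It is not the authors' implementation: the function names, the 0.5 threshold, and the `second_layer` callback (standing in for the CNN binary classifier) are assumptions made for this example, and comparing a whole ticket text against a single keyword is a simplification.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def dice_coefficient(a: str, b: str) -> float:
    """Dice coefficient computed over the sets of character bigrams of a and b."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a and not bigrams_b:
        # Strings too short to form bigrams: fall back to exact equality.
        return 1.0 if a == b else 0.0
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))


def classify(text: str, keyword: str,
             second_layer: Callable[[str], bool],
             threshold: float = 0.5) -> bool:
    """Hypothetical two-layer check: fuzzy match first, then a confirming classifier.

    `second_layer` is a placeholder for a trained binary classifier; the
    threshold value is an assumption, not a figure from the paper.
    """
    if dice_coefficient(text.lower(), keyword.lower()) < threshold:
        return False
    return second_layer(text)
```

For example, `levenshtein("kitten", "sitting")` evaluates to 3, and `classify("pasword reset", "password reset", lambda t: True)` accepts the misspelled text because its bigram overlap with the keyword is high and the stand-in second layer agrees.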
Pages: 1 - 8
Number of pages: 8
Related papers
50 in total
  • [21] Efficient parallel hardware algorithms for string matching
    Park, JH
    George, KM
    MICROPROCESSORS AND MICROSYSTEMS, 1999, 23 (03) : 155 - 168
  • [22] A Survey of the Hybrid Exact String Matching Algorithms
    Almazroi, Abdulwahab Ali
    Shah, Asad Ali
    Almazroi, Abdulaleem Ali
    Mohammed, Fathey
    Al-Kumaim, Nabil Hasan
    ADVANCES ON INTELLIGENT INFORMATICS AND COMPUTING: HEALTH INFORMATICS, INTELLIGENT SYSTEMS, DATA SCIENCE AND SMART COMPUTING, 2022, 127 : 173 - 189
  • [23] Faster algorithms for string matching with k mismatches
    Amir, A
    Lewenstein, M
    Porat, E
    JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2004, 50 (02) : 257 - 275
  • [24] Efficient algorithms for approximate string matching with swaps
    Kim, DK
    Lee, JS
    Park, K
    Cho, Y
    JOURNAL OF COMPLEXITY, 1999, 15 (01) : 128 - 147
  • [25] Maximum-Shift String Matching Algorithms
    Kadhim, Hakem Adil
    AbdulRashid, NurAini
    2014 INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCES (ICCOINS), 2014,
  • [26] Technology beats algorithms (in exact string matching)
    Tarhio, Jorma
    Holub, Jan
    Giaquinta, Emanuele
    SOFTWARE-PRACTICE & EXPERIENCE, 2017, 47 (12) : 1877 - 1885
  • [27] Faster algorithms for string matching with k mismatches
    Amir, A
    Lewenstein, M
    Porat, E
    PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2000 : 794 - 803
  • [28] Effects of Suffix Repetition Rates of a String on the Performance of String Matching Algorithms
    Wang, Yang
    PROCEEDINGS OF THE 8TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE, 2009, : 53 - 58
  • [29] Fuzzy String Matching with a Deep Neural Network
    Shapiro, Daniel
    Japkowicz, Nathalie
    Lemay, Mathieu
    Bolic, Miodrag
    APPLIED ARTIFICIAL INTELLIGENCE, 2018, 32 (01) : 1 - 12
  • [30] Intuitionistic Fuzzy Automaton for Approximate String Matching
    Ravi, K. M.
    Choubey, A.
    Tripati, K. K.
    FUZZY INFORMATION AND ENGINEERING, 2014, 6 (01) : 29 - 39