An improved fast algorithm of frequent string extracting with no thesaurus

Authors
Zhang, Yumeng [1 ,2 ]
Liu, Chuanhan [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
[2] Ningbo Univ, Sch Business, Ningbo 315211, Zhejiang, Peoples R China
Source
MICAI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2007 / Vol. 4827
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unlisted (out-of-vocabulary) word identification is a hot topic in Chinese information processing research. String frequency statistics is a simple and effective method for extracting unlisted words, but existing algorithms cannot meet the speed requirements of large-scale text processing systems. Based on the strategies of increasing string length and level-wise scanning, this paper presents a fast algorithm for extracting frequent strings and improves the string frequency statistics method. The approach needs neither a thesaurus nor word segmentation; instead, it uses average mutual information to decide whether each frequent string is a word. Experiments show that, compared with previous approaches, the algorithm is faster and achieves an accuracy of 91% or higher.
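The abstract does not give the algorithm's details, so the Python below is only a minimal sketch of the two ideas it names, under stated assumptions. `frequent_strings` grows candidates one character per level and, in Apriori style, keeps a length-k string only if both its length-(k-1) prefix and suffix are already frequent. `average_mutual_information` uses one common definition (the mean, over all split points, of log2 P(s) / (P(left) P(right)), with probabilities estimated as occurrence count divided by text length). The function names, the `min_count` and `max_len` parameters, and the AMI formula are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def frequent_strings(text, min_count, max_len=8):
    """Level-wise extraction of substrings occurring at least min_count times.

    Hypothetical sketch: length-k candidates are built only at positions where a
    frequent length-(k-1) string starts, and are kept only if their length-(k-1)
    suffix is also frequent (Apriori-style pruning).
    """
    counts = Counter(text)  # level 1: single characters
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    # Start positions of the frequent strings found at the current level.
    positions = [i for i, ch in enumerate(text) if ch in frequent]
    k = 1
    while frequent and k < max_len:
        k += 1
        counts = Counter()
        candidate_positions = []
        for i in positions:
            s = text[i:i + k]
            # Skip windows that run off the end of the text or whose suffix is infrequent.
            if len(s) == k and s[1:] in frequent:
                counts[s] += 1
                candidate_positions.append(i)
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        # Keep only the start positions of strings that survived this level.
        positions = [i for i in candidate_positions if text[i:i + k] in frequent]
    return result


def average_mutual_information(s, counts, total):
    """Assumed AMI: mean over split points of log2(P(s) / (P(left) * P(right))).

    `counts` must contain s and all of its substrings, as the output of
    frequent_strings does for any frequent s.
    """
    if len(s) < 2:
        return 0.0

    def p(x):
        return counts[x] / total

    scores = [log2(p(s) / (p(s[:i]) * p(s[i:]))) for i in range(1, len(s))]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    text = "abcabcabc"
    freq = frequent_strings(text, min_count=2)
    print(freq["abc"], freq["abcabc"])  # prints: 3 2
    print(round(average_mutual_information("ab", freq, len(text)), 3))  # prints: 1.585
```

Under this sketch a frequent string would be accepted as a word when its AMI exceeds some threshold; the paper's threshold is not stated in this record, so any particular value is an assumption. A real corpus would also be split at punctuation first so that candidates never cross sentence boundaries.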
Pages: 894 / +
Number of pages: 3