An improved fast algorithm of frequent string extracting with no thesaurus

Cited by: 0
Authors
Zhang, Yumeng [1 ,2 ]
Liu, Chuanhan [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
[2] Ningbo Univ, Sch Business, Ningbo 315211, Zhejiang, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Unlisted word identification is a hotspot in Chinese information processing research. String frequency statistics is a simple and effective method for extracting unlisted words. Existing algorithms cannot meet the high-speed requirements of large-scale text processing systems. Using the strategies of increasing string length and level-wise scanning, this paper presents a fast algorithm for extracting frequent strings and improves the string frequency statistical method. The approach needs neither a thesaurus nor word segmentation; instead, it uses average mutual information to decide whether each frequent string is a word. Experiments show that, compared with previous approaches, the algorithm is faster and achieves an accuracy of 91% or higher.
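The level-wise strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `frequent_strings` and `average_mutual_information`, the thresholds `min_count` and `max_len`, and the scoring formula (mean pointwise mutual information over all binary splits of a string) are assumptions made for illustration only.

```python
import math
from collections import Counter


def frequent_strings(text, min_count=2, max_len=4):
    """Level-wise scan with increasing string length (hypothetical sketch).

    A length-k substring is counted only if both of its length-(k-1)
    sub-pieces were frequent at the previous level, so no dictionary
    or word segmentation is needed.
    """
    levels = {}
    # Level 1: count single characters.
    counts = Counter(text)
    levels[1] = {s: c for s, c in counts.items() if c >= min_count}
    for k in range(2, max_len + 1):
        prev = levels[k - 1]
        if not prev:
            break  # nothing frequent at the previous level, stop early
        counts = Counter()
        for i in range(len(text) - k + 1):
            s = text[i:i + k]
            # Prune: a frequent string must have a frequent prefix and suffix.
            if s[:-1] in prev and s[1:] in prev:
                counts[s] += 1
        levels[k] = {s: c for s, c in counts.items() if c >= min_count}
    return levels


def average_mutual_information(s, levels, text_len):
    """Mean log-ratio p(s) / (p(left) * p(right)) over all split points.

    Assumed scoring formula; `s` must be a frequent string of length >= 2
    returned by frequent_strings for the same text.
    """
    def prob(piece):
        # Relative frequency among all substrings of the same length.
        return levels[len(piece)][piece] / (text_len - len(piece) + 1)

    scores = []
    for cut in range(1, len(s)):
        left, right = s[:cut], s[cut:]
        scores.append(math.log2(prob(s) / (prob(left) * prob(right))))
    return sum(scores) / len(scores)
```

A candidate string would then be accepted as a word when its score exceeds a threshold chosen on held-out data; that threshold is likewise an assumption and is not given in the abstract.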
Pages: 894 / +
Page count: 3