Study for the Double-array Trie Tree Based Algorithm in Word Segmentation

Cited by: 0
Authors
Yang, Wenchuan [1]
Fang, Zeyang [1]
Li, Pengfei [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China
Keywords
double-array; trie tree; time complexity; word segmentation dictionary;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper presents iDAT, an improved algorithm based on the Double-Array Trie Tree for Chinese word segmentation dictionaries. A Chinese word segmentation dictionary based on the Double-Array Trie Tree offers high search efficiency, but dynamic insertion consumes a lot of time. After the original dictionary is initialized, we apply a hash process to the index values of the empty sequences in the base array. The final hash table stores the sum of the empty sequences that precede the current empty sequence. The algorithm also adopts the jump strategy of the Sunday single-pattern matching algorithm. At the cost of a slight and reasonable increase in space, iDAT reduces the average time complexity of dynamic insertion into the trie. Practical results show that it performs well.
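The abstract mentions the Sunday single-pattern matching algorithm without describing how iDAT uses it. As background only, the following is a minimal Python sketch of the textbook Sunday string search, not the authors' implementation; the function name `sunday_search` and its signature are hypothetical. It shows the jump rule: after a mismatch, the character just past the current window decides how far the window moves.

```python
def sunday_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Textbook Sunday search; illustrative only, not the iDAT algorithm.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0

    # For each pattern character, the distance from its last occurrence
    # to one position past the end of the pattern.
    shift = {ch: m - i for i, ch in enumerate(pattern)}

    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        nxt = pos + m  # index of the character just past the window
        if nxt >= n:
            break
        # Characters absent from the pattern let the window skip past them.
        pos += shift.get(text[nxt], m + 1)
    return -1
```

For example, `sunday_search("here is a simple example", "example")` locates the match at the same index that Python's built-in `str.find` reports. How the jump idea is adapted to locating empty slots in the base array is not specified in the abstract and is not modeled here.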
Pages: 440-446
Page count: 7
Related papers
50 in total
  • [1] Research of an Improved Algorithm for Chinese Word Segmentation Dictionary Based on Double-Array Trie Tree
    Yang, Wenchuan
    Liu, Jian
    Yu, Miao
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2013, 2013, 400 : 355 - 362
  • [2] Trie Compact Representation using Double-array Structures with String Labels
    Kanda, Shunsuke
    Fuketa, Masao
    Morita, Kazuhiro
    Aoe, Jun-ichi
    [J]. 2015 IEEE 8TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (IWCIA) PROCEEDINGS, 2015, : 3 - 8
  • [3] Research of Chinese Segmentation Based on MMSeg and Double Array TRIE
    Xu, Lin
    Zhang, Qin
    Wang, Dandong
    Zhang, Jian
    [J]. ADVANCED RESEARCH ON AUTOMATION, COMMUNICATION, ARCHITECTONICS AND MATERIALS, PTS 1 AND 2, 2011, 225-226 (1-2): : 945 - +
  • [4] An efficient key updating algorithm for double-array structure
    Oono, M
    Kadaya, K
    Fuketa, M
    Oda, M
    Harada, J
    Aoe, J
    [J]. 7TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XII, PROCEEDINGS: INFORMATION SYSTEMS, TECHNOLOGIES AND APPLICATIONS: II, 2003, : 311 - 313
  • [5] Study of an Improved Text Filter Algorithm Based on Trie Tree
    Yang, Wenchuan
    Fang, Zeyang
    Hui, Lei
    [J]. 2016 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL (IS3C), 2016, : 594 - 597
  • [7] Comparative Study on the Double-Array Structure for Large English & Chinese Lexicons
    Xu, Shuo
    Zhu, Li-Jun
    Qiao, Xiao-Dong
    [J]. ICICTA: 2009 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION, VOL IV, PROCEEDINGS, 2009, : 158 - 162
  • [8] An efficient representation for implementing finite state machines based on the double-array
    Mizobuchi, S
    Sumitomo, T
    Fuketa, M
    Aoe, J
    [J]. INFORMATION SCIENCES, 2000, 129 (1-4) : 119 - 139
  • [9] The Adaptive Spelling Error Checking Algorithm based on Trie Tree
    Xu, Yongbing
    Wang, Junyi
    [J]. PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON ADVANCES IN ENERGY, ENVIRONMENT AND CHEMICAL ENGINEERING (AEECE 2016), 2016, 89 : 299 - 302
  • [10] Research on IP classification algorithm based on multibit-trie tree
    Shang, Fengjun
    [J]. DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13 : 748 - 752