Segmentation of Chinese Web text based on Spark

Author
Xu, Jiazhen [1]
Affiliation
[1] University of Electronic Science and Technology of China, Great Wall IoT Joint Lab, Chengdu, People's Republic of China
Keywords
Spark; Chinese word segmentation; Hadoop
DOI
10.1109/ISCID.2015.250
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Analyzing and processing the massive amounts of data generated on the Web with a single computer takes a long time and cannot meet users' needs. To break through the speed bottleneck of word segmentation, this paper uses a Spark cluster and applies the Spark programming model to Chinese word segmentation, so that the segmentation technique runs on a distributed platform. The approach preserves the accuracy of the original word segmentation while significantly improving its processing speed, and it is feasible and effective for processing large volumes of Chinese text.
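The record does not say which segmentation algorithm or which Spark operations the paper uses. As a minimal sketch, assuming a dictionary-based forward maximum matching (FMM) segmenter as a stand-in and a tiny hypothetical dictionary, the code below illustrates why segmentation parallelizes well: each line is segmented independently, so a corpus split into partitions can be processed one partition at a time on separate workers. The names `DICTIONARY`, `fmm_segment`, and `segment_partition` are illustrative and do not come from the paper.

```python
from typing import Iterable, Iterator, List, Set

# Hypothetical toy dictionary (an assumption); a real system would load a large lexicon.
DICTIONARY: Set[str] = {"中文", "分词", "技术", "分布式", "平台", "中文分词"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def fmm_segment(sentence: str) -> List[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word starting there, falling back to a single character."""
    words: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a match (or 1 char).
        for size in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += size
                break
    return words


def segment_partition(lines: Iterable[str]) -> Iterator[List[str]]:
    """Segment every line of one data partition independently.

    No line depends on any other line, so many partitions can be
    processed in parallel."""
    for line in lines:
        yield fmm_segment(line.strip())


if __name__ == "__main__":
    # Simulate a corpus that has already been split into two partitions.
    partitions = [["中文分词技术"], ["分布式平台"]]
    results = [list(segment_partition(p)) for p in partitions]
    print(results)  # [[['中文分词', '技术']], [['分布式', '平台']]]
```

In PySpark, a function with this shape could be passed to `RDD.mapPartitions`, for example `sc.textFile(path).mapPartitions(segment_partition)`, so that each executor segments its own partition; the dictionary would typically be shipped to executors once with `sc.broadcast`. This pipeline is an assumption for illustration, not the paper's documented implementation.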
Pages: 200-203
Page count: 4