Segmentation of Chinese Web text based on Spark

Cited by: 0
Author
Xu, Jiazhen [1]
Affiliation
[1] Univ Elect Sci & Technol China, Great Wall IoT Joint Lab, Chengdu, Peoples R China
Keywords
Spark; Chinese word segmentation; Hadoop
DOI
10.1109/ISCID.2015.250
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Analysing and processing the massive amounts of data generated by networks on a computer takes a great deal of time and cannot meet users' needs. To break through the speed bottleneck of segmentation, this paper uses a Spark cluster and applies the Spark programming approach to Chinese word segmentation, so that segmentation is implemented on a distributed platform. The approach preserves the accuracy of the original word segmentation while significantly improving its processing speed, and it is feasible and effective for handling large amounts of Chinese text.
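As an illustration of the idea in the abstract (not the paper's implementation), the sketch below shows why segmentation parallelizes well: each line is segmented independently, so partitions can be processed separately and the results concatenated without changing the output. The segmenter is a toy forward-maximum-matching routine chosen for brevity; the abstract does not say which algorithm the paper uses. The lexicon and the names `segment`, `segment_partition`, and `run_locally` are hypothetical.

```python
from typing import Iterable, Iterator, List, Set

# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON: Set[str] = {"中文", "分词", "技术", "大数据", "处理", "网络", "文本"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def segment(line: str) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    words: List[str] = []
    i = 0
    while i < len(line):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(MAX_WORD_LEN, len(line) - i), 0, -1):
            piece = line[i:i + size]
            if size == 1 or piece in LEXICON:
                words.append(piece)
                i += size
                break
    return words


def segment_partition(lines: Iterable[str]) -> Iterator[List[str]]:
    """Segment every line of one partition; no state is shared between lines."""
    for line in lines:
        yield segment(line)


def run_locally(lines: List[str], num_partitions: int) -> List[List[str]]:
    """Stand-in for a cluster: split into contiguous partitions, process each, concatenate."""
    size = max(1, -(-len(lines) // num_partitions))  # ceiling division
    partitions = [lines[k:k + size] for k in range(0, len(lines), size)]
    return [words for part in partitions for words in segment_partition(part)]


if __name__ == "__main__":
    corpus = ["中文分词技术", "处理大数据", "网络的文本"]
    for words in run_locally(corpus, num_partitions=2):
        print("/".join(words))
```

On a Spark cluster, a function shaped like `segment_partition` could plausibly be passed to PySpark's `RDD.mapPartitions`, for example on an RDD created with `SparkContext.textFile`. That wiring is an assumption about one reasonable setup, not a description of the paper's code.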
Pages: 200-203
Page count: 4
Related papers
50 in total
  • [21] The role of text familiarity in Chinese word segmentation and Chinese vocabulary recognition
    Chen Mingjing
    Wang Yongsheng
    Zhao Bingjie
    Li Xin
    Bai Xuejun
    ACTA PSYCHOLOGICA SINICA, 2022, 54 (10) : 1151 - +
  • [22] Theme Extraction from Chinese Web Documents Based on Page Segmentation and Entropy
    Wang, Deqing
    Zhang, Hui
    Zhou, Gang
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2009, 5722 : 221 - 230
  • [23] Study of sign segmentation in the text of Chinese sign language
    Dengfeng Yao
    Minghu Jiang
    Yunlong Huang
    Abudoukelimu Abulizi
    Hanjing Li
    Universal Access in the Information Society, 2017, 16 : 725 - 737
  • [24] Research on Enhancing the Effectiveness of the Chinese Text Automatic Categorization Based on ICTCLAS Segmentation Method
    Li, Xiangdong
    Zhang, Cheng
    PROCEEDINGS OF 2013 IEEE 4TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2012, : 267 - 270
  • [25] Character Segmentation Method for Irregularly Arranged Text in Chinese
    Yang X.
    Niu X.
    Liang W.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2019, 31 (09): : 1542 - 1548
  • [26] Study of sign segmentation in the text of Chinese sign language
    Yao, Dengfeng
    Jiang, Minghu
    Huang, Yunlong
    Abulizi, Abudoukelimu
    Li, Hanjing
    UNIVERSAL ACCESS IN THE INFORMATION SOCIETY, 2017, 16 (03) : 725 - 737
  • [27] Interpretable Sentiment Analysis and Text Segmentation for Chinese Language
    Hou Zhenghao
    Kolonin, A.
    Optical Memory and Neural Networks (Information Optics), 2024, 33 (Suppl 3): : S483 - S489
  • [28] Weakly Supervised Learning for Over-Segmentation Based Handwritten Chinese Text Recognition
    Wang, Zhen-Xing
    Wang, Qiu-Feng
    Yin, Fei
    Liu, Cheng-Lin
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 157 - 162
  • [29] Lexicon-Based Semi-CRF for Chinese Clinical Text Word Segmentation
    Xia, Guoqing
    Shen, Yao
    Lin, Qiang
    PROCEEDINGS OF 2017 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (PIC 2017), 2017, : 45 - 50
  • [30] Chinese text classification without automatic word segmentation
    Liu, Wei
    Allison, Ben
    Guthrie, David
    Guthrie, Louise
    ALPIT 2007: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, 2007, : 45 - +