An Improved Classification Course Based on Mapreduce

Times Cited: 1
Authors
Wang, Haitao [1]
Liu, Shufeng [1]
Jia, Zongpu [1]
Affiliation
[1] Jilin Univ, Sch Comp Sci & Technol, Changchun 130023, Jilin, Peoples R China
Keywords
Classification; Naive Bayes; Algorithm; MapReduce; Massive Data
DOI
10.14257/ijgdc.2015.8.3.05
Chinese Library Classification (CLC) Number
TP31 [Computer Software]
Discipline Classification Codes
081202; 0835
Abstract
File classification is an important step in near-duplicate detection in the field of data mining. In this paper, an improved classification process is proposed, consisting of a training process and a test process, each with its own algorithm. The process uses the MapReduce computing model created by Google to carry out the classification computation. Specifically, Sogou news data sets of various sizes, simulating a massive data set, were used to test effectiveness, and a comparative evaluation of execution time and speedup was carried out in the experimental environment. The results show that the classification process greatly reduces execution time, achieves an ideal speedup ratio as the data volume increases, and delivers better performance.
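The abstract does not spell out the training and test algorithms. Given the Naive Bayes keyword, the following is a minimal, hypothetical sketch (not the authors' implementation) of how multinomial Naive Bayes training can be expressed as MapReduce-style counting (map, shuffle, reduce), with classification of test documents as a map-only step. The function names, the toy data, and the add-one (Laplace) smoothing are assumptions; the map, shuffle, and reduce stages are simulated in a single Python process rather than run on a Hadoop cluster.

```python
import math
from collections import defaultdict
from itertools import chain

def train_map(doc):
    """Map step (hypothetical): emit ((label, word), 1) per token,
    plus ((label, None), 1) once per document for the class prior."""
    label, text = doc
    yield (label, None), 1
    for word in text.split():
        yield (label, word), 1

def shuffle(pairs):
    """Simulated shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def train_reduce(key, values):
    """Reduce step: sum the counts for one key."""
    return key, sum(values)

def train(docs):
    """Run map -> shuffle -> reduce over labelled training documents."""
    mapped = chain.from_iterable(train_map(d) for d in docs)
    return dict(train_reduce(k, v) for k, v in shuffle(mapped).items())

def build_model(counts):
    """Split reduced counts into per-class document and word counts."""
    doc_counts, word_counts = {}, defaultdict(dict)
    for (label, word), n in counts.items():
        if word is None:
            doc_counts[label] = n
        else:
            word_counts[label][word] = n
    vocab = {w for wc in word_counts.values() for w in wc}
    return doc_counts, word_counts, vocab

def classify(text, model):
    """Pick the class with the highest log posterior (Laplace smoothing assumed)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.split():
            count = word_counts[label].get(word, 0)
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def test_map(record, model):
    """Map-only test step (hypothetical): emit (doc_id, predicted label)."""
    doc_id, text = record
    yield doc_id, classify(text, model)

if __name__ == "__main__":
    docs = [("sports", "match goal team win"), ("sports", "team coach goal"),
            ("finance", "stock market price"), ("finance", "market bank price rise")]
    model = build_model(train(docs))
    tests = [(1, "goal team"), (2, "market price")]
    print(dict(chain.from_iterable(test_map(r, model) for r in tests)))
    # prints {1: 'sports', 2: 'finance'}
```

Because training reduces to summing independent counts, each mapper can process its own split of the corpus and reducers only add integers, which is why this kind of workload tends to parallelize well as data volume grows.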
Pages: 43-52
Number of Pages: 10