Parallel Random Forest with IPython Cluster

Authors
Limprasert, Wasit [1]
Affiliations
[1] Thammasat Univ, Dept Comp Sci, Fac Sci & Technol, Pathum Thani, Thailand
Keywords
Parallel Algorithm; Random Forest; IPython; Classification
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent research requires analytic tools capable of interpreting patterns and finding hidden knowledge in huge amounts of data. Random Forest, an ensemble tree classifier based on the bagging method, is one of many well-known classifiers for finding hidden models in data. The classifier has been applied to recognize various kinds of data, e.g. human pose from depth images, plankton images, and time-series patterns. In this paper, an optimized parallel Random Forest is designed and implemented on IPython, an interactive Python environment with parallelization functionality that is convenient to deploy on most computing platforms. The implementation shows 80% CPU utilization when training on 10^7 samples in 12 hours on an EC2 cluster with 32 cores. This demonstrates that the implementation is capable of analyzing large amounts of data.
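The idea summarized in the abstract, training the members of a bagged ensemble independently on separate workers, can be illustrated with a minimal sketch. This is not the paper's implementation: the record gives no code, so every name here (`train_stump`, `train_forest`, `predict`) is hypothetical, one-split stumps stand in for full decision trees, and a standard-library thread pool stands in for the engines of an IPython cluster. Because CPython threads do not run Python bytecode in parallel, the sketch only shows the task structure, not the reported speedup.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_stump(args):
    """Fit a one-split tree (stump) on a bootstrap sample. Hypothetical helper."""
    X, y, seed = args
    rng = random.Random(seed)
    n = len(X)
    # Bagging: draw n row indices with replacement.
    idx = [rng.randrange(n) for _ in range(n)]
    # Pick one random feature, a simplified form of feature subsampling.
    feat = rng.randrange(len(X[0]))
    best = None
    for i in idx:
        thr = X[i][feat]
        left = [y[j] for j in idx if X[j][feat] <= thr]
        right = [y[j] for j in idx if X[j][feat] > thr]
        if not left or not right:
            continue
        l_lab, l_cnt = Counter(left).most_common(1)[0]
        r_lab, r_cnt = Counter(right).most_common(1)[0]
        # Rows misclassified when each side predicts its majority label.
        errors = len(idx) - l_cnt - r_cnt
        if best is None or errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    if best is None:
        # No usable split in this sample: always predict the majority label.
        label = Counter(y[j] for j in idx).most_common(1)[0][0]
        return (feat, float("inf"), label, label)
    _, thr, l_lab, r_lab = best
    return (feat, thr, l_lab, r_lab)


def train_forest(X, y, n_trees=25, workers=4):
    """Train each ensemble member as an independent task on a worker pool."""
    tasks = [(X, y, seed) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_stump, tasks))


def predict(forest, row):
    """Classify one row by majority vote over all ensemble members."""
    votes = Counter(
        l_lab if row[feat] <= thr else r_lab
        for feat, thr, l_lab, r_lab in forest
    )
    return votes.most_common(1)[0][0]
```

Calling `train_forest(X, y)` on a list of feature rows `X` and labels `y` returns the list of fitted stumps, and `predict(forest, row)` returns the majority vote. Since each task depends only on the data and its own seed, the same `map` pattern could in principle be handed to a cluster-backed pool instead of local threads.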
Pages: 62-67
Number of pages: 6