Boosting Algorithms for Large-Scale Data and Data Batch Stream

Author
Yoon, Young Joo [1]
Affiliation
[1] Univ Georgia, Dept Stat, 101 Cedar St, Athens, GA 30602 USA
Keywords
AdaBoost; Arc-x4; concept drift; data stream; ensemble method; large-scale data
DOI
Not available
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
In this paper, we propose boosting algorithms for settings in which the data are very large or arrive sequentially in batches over time. In such settings, an ordinary boosting algorithm may be inappropriate because it requires the entire training set to be available at once. To handle large-scale data and data batch streams, we modify AdaBoost and Arc-x4. On simulated and real data sets, the modified algorithms perform well both for large-scale data and for data batch streams, with or without concept drift.
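The abstract does not say how AdaBoost and Arc-x4 are modified. Purely as an illustration, the Python sketch below assumes one plausible batch-wise scheme: each incoming batch is reweighted using the ensemble built so far (AdaBoost-style weights exp(-y F(x)), where F is the weighted vote of earlier members, or Arc-x4-style weights 1 + m^4, where m counts earlier members that misclassify the point), a single decision stump is fit to that weighted batch, and earlier batches are never revisited. The names `batch_boost`, `fit_stump`, `stump_predict`, and `predict` are hypothetical. This is not the author's algorithm, and it does nothing special to handle concept drift.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Point) -> int:
    """Predict a label in {-1, +1} with a one-split decision stump."""
    j, thr, pol = stump
    return pol if x[j] <= thr else -pol


def fit_stump(X: Sequence[Point], y: Sequence[int],
              w: Sequence[float]) -> Tuple[Stump, float]:
    """Exhaustive search for the stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                s = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(s, xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err / sum(w)  # normalized weighted error


def batch_boost(batches, scheme: str = "adaboost") -> List[Tuple[Stump, float]]:
    """Hypothetical sketch: add one stump per batch, using only the current batch."""
    ensemble: List[Tuple[Stump, float]] = []  # (stump, vote weight)
    for X, y in batches:
        if scheme == "adaboost":
            # Exponential-loss weights exp(-y * F(x)) from the ensemble so far
            w = [math.exp(-yi * sum(a * stump_predict(s, xi) for s, a in ensemble))
                 for xi, yi in zip(X, y)]
        elif scheme == "arc-x4":
            # Arc-x4 weights 1 + m^4, m = earlier members that misclassify x
            w = [1 + sum(stump_predict(s, xi) != yi for s, _ in ensemble) ** 4
                 for xi, yi in zip(X, y)]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        stump, err = fit_stump(X, y, w)
        if scheme == "adaboost":
            err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
            alpha = 0.5 * math.log((1 - err) / err)
        else:
            alpha = 1.0  # Arc-x4 combines members by unweighted vote
        ensemble.append((stump, alpha))
    return ensemble


def predict(ensemble: List[Tuple[Stump, float]], x: Point) -> int:
    """Sign of the (weighted) vote; ties go to +1."""
    score = sum(a * stump_predict(s, x) for s, a in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    batches = [
        ([(0.1,), (0.2,), (0.8,), (0.9,)], [-1, -1, 1, 1]),
        ([(0.3,), (0.4,), (0.6,), (0.7,)], [-1, -1, 1, 1]),
    ]
    for scheme in ("adaboost", "arc-x4"):
        ens = batch_boost(batches, scheme)
        print(scheme, [predict(ens, (x,)) for x in (0.05, 0.95)])  # [-1, 1]
```

Because each batch is touched only once, memory use is bounded by the batch size rather than by the full training set. That is the constraint the abstract highlights. A practical version would also need a rule for discarding or down-weighting stale members when the concept drifts.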
Pages: 197-206
Number of pages: 10