Operator scheduling in data stream systems

Cited by: 57
Authors
Babcock, B [1]
Babu, S [1]
Datar, M [1]
Motwani, R [1]
Thomas, D [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
VLDB Journal | 2004, Vol. 13, No. 4
Keywords
data streams; scheduling; memory management; latency
DOI
10.1007/s00778-004-0132-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In many applications involving continuous data streams, data arrival is bursty and the data rate fluctuates over time. Systems that seek to give rapid or real-time query responses in such an environment must be prepared to deal gracefully with bursts in data arrival without compromising system performance. We discuss one strategy for processing bursty streams: adaptive, load-aware scheduling of query operators to minimize resource consumption during times of peak load. We show that the choice of an operator scheduling strategy can have a significant impact on the runtime system memory usage as well as on output latency. Our aim is to design a scheduling strategy that minimizes the maximum runtime system memory while maintaining the output latency within prespecified bounds. We first present Chain scheduling, an operator scheduling strategy for data stream systems that is near-optimal in minimizing runtime memory usage for any collection of single-stream queries involving selections, projections, and foreign-key joins with stored relations. Chain scheduling also performs well for queries with sliding-window joins over multiple streams and for multiple queries of the above types. However, during bursts in input streams, when there is a buildup of unprocessed tuples, Chain scheduling may lead to high output latency. We study the online problem of minimizing maximum runtime memory, subject to a constraint on maximum latency. We present preliminary observations, negative results, and heuristics for this problem. A thorough experimental evaluation is provided in which we demonstrate the potential benefits of Chain scheduling and its different variants, compare it with competing scheduling strategies, and validate our analytical conclusions.
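The claim that the scheduling strategy affects both peak memory and output latency can be illustrated with a toy simulation. The sketch below is not the Chain algorithm from the paper; it is a hypothetical two-operator pipeline with invented parameters (a tuple occupies 5 memory units before the first operator and 1 unit after it, each operator takes one time step per tuple, and one tuple arrives per step during a 5-step burst). It compares a FIFO policy, which finishes the oldest tuple first, with a greedy policy that always runs the operator giving the largest memory reduction per step.

```python
# Toy model, all numbers are assumptions made for illustration only.
SIZE_BEFORE = 5  # memory units of a tuple waiting for operator 1
SIZE_AFTER = 1   # memory units of a tuple waiting for operator 2

# A burst: one tuple arrives at each of the time steps 0..4.
BURST = {t: 1 for t in range(5)}


def peak_memory(policy, arrivals, horizon):
    """Return the peak queued memory over `horizon` time steps.

    Each step: new tuples join queue 1, the scheduler runs exactly one
    operator on one tuple, then the queued memory is measured.
    """
    q1 = 0  # tuples waiting for operator 1
    q2 = 0  # tuples waiting for operator 2
    peak = 0
    for t in range(horizon):
        q1 += arrivals.get(t, 0)
        if policy == "fifo":
            # The oldest tuple in the system is always the one in queue 2.
            run_op2 = q2 > 0
        elif policy == "greedy":
            # Operator 1 frees 4 units per step, operator 2 only 1 unit.
            run_op2 = q1 == 0 and q2 > 0
        else:
            raise ValueError(f"unknown policy: {policy}")
        if run_op2:
            q2 -= 1          # tuple leaves the system as output
        elif q1 > 0:
            q1 -= 1          # tuple shrinks and moves to queue 2
            q2 += 1
        peak = max(peak, q1 * SIZE_BEFORE + q2 * SIZE_AFTER)
    return peak


print(peak_memory("fifo", BURST, 10))    # 11
print(peak_memory("greedy", BURST, 10))  # 5
```

In this toy setting the greedy policy roughly halves the peak memory, but it emits its first output only at step 5, whereas FIFO emits one at step 1. That is the same kind of memory versus latency tension the abstract describes.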
Pages: 333-353
Page count: 21