Low Cost Synchronization For Actively Replicated Data Streams

Cited by: 1
Authors
Martin, Andre [1]
Brito, Andrey [2]
Fetzer, Christof [1]
Affiliations
[1] Tech Univ Dresden, Dresden, Germany
[2] Univ Fed Campina Grande, Campina Grande, Paraiba, Brazil
DOI
10.1109/ladc48089.2019.8995686
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Active replication is an attractive fault tolerance approach for data stream applications because it provides almost instantaneous recovery, which matches the low-latency requirements of real-time data analytics well. Despite this quick recovery, the approach is rarely used in industry: it requires complex mechanisms such as atomic broadcast to ensure correctness, and it introduces non-negligible overhead. Most data stream applications compute over event windows using operators such as aggregations or joins, which are commutative: the correctness of their result does not depend on the order of events within a window. In this paper, we exploit this ordering flexibility by proposing (i) an epoch-based deterministic merge algorithm that guarantees correctness at a much lower cost than achieving strict ordering through a full-fledged atomic broadcast protocol or deterministic execution. We further propose (ii) a leader-follower protocol, an extension of this approach that reduces the latency impact of stragglers and stops the propagation of any non-determinism originating from source operators. Our evaluation shows that throughput can be improved by an order of magnitude compared to strict ordering, while providing the same guarantees.
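To illustrate why commutativity relaxes the ordering requirement, here is a minimal Python sketch. It is not the authors' algorithm. The names `EpochMerger`, `on_event`, `on_epoch_end` and `run` are hypothetical, and the sketch assumes that each upstream input delivers its events in FIFO order, marks the end of every epoch, and feeds a windowed integer sum. Two replicas see the same inputs interleaved differently. Because an epoch is emitted only after every input has closed it, and the sum ignores order within the epoch, both replicas produce identical output without agreeing on a total order of individual events.

```python
from collections import defaultdict


class EpochMerger:
    """Hypothetical epoch-based merge (illustrative sketch, not the paper's code).

    Events from several upstream inputs are buffered per epoch. An epoch is
    released only once every input has signalled the end of that epoch, so
    replicas agree on epoch contents even if they interleave events differently.
    Assumes each input is FIFO: no event for an epoch arrives after its end marker.
    """

    def __init__(self, inputs):
        self.closed = {name: -1 for name in inputs}  # last closed epoch per input
        self.buckets = defaultdict(list)             # epoch -> buffered values
        self.next_epoch = 0                          # next epoch to emit, in order

    def on_event(self, source, epoch, value):
        # Arrival order inside an epoch is irrelevant for a commutative operator.
        self.buckets[epoch].append(value)

    def on_epoch_end(self, source, epoch):
        self.closed[source] = max(self.closed[source], epoch)
        return self._release()

    def _release(self):
        # An epoch is complete once the slowest input has closed it.
        complete = min(self.closed.values())
        out = []
        while self.next_epoch <= complete:
            values = self.buckets.pop(self.next_epoch, [])
            out.append((self.next_epoch, sum(values)))  # commutative aggregate
            self.next_epoch += 1
        return out


def run(schedule, inputs=("a", "b")):
    """Replay a schedule of ("event", src, epoch, value) / ("end", src, epoch)."""
    merger = EpochMerger(inputs)
    results = []
    for step in schedule:
        if step[0] == "event":
            _, source, epoch, value = step
            merger.on_event(source, epoch, value)
        else:
            _, source, epoch = step
            results.extend(merger.on_epoch_end(source, epoch))
    return results


# Two replicas receive the same per-input streams with different interleavings.
replica_1 = [
    ("event", "a", 0, 1), ("event", "b", 0, 10), ("end", "a", 0),
    ("event", "a", 1, 2), ("end", "b", 0), ("event", "b", 1, 20),
    ("end", "a", 1), ("end", "b", 1),
]
replica_2 = [
    ("event", "b", 0, 10), ("end", "b", 0), ("event", "b", 1, 20),
    ("event", "a", 0, 1), ("end", "a", 0), ("event", "a", 1, 2),
    ("end", "b", 1), ("end", "a", 1),
]

print(run(replica_1))  # [(0, 11), (1, 22)]
print(run(replica_2))  # [(0, 11), (1, 22)]
```

The point the sketch highlights is that replicas need to agree only on epoch boundaries, not on the position of every event, which is the cost a per-event atomic broadcast would impose. An order-sensitive operator (for example, one that keeps the first event of a window) would additionally need a deterministic tie-break, such as sorting each epoch's buffer by a stable key before emitting it.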
Pages: 85-94
Page count: 10