Code Generation in Serializers and Comparators of Apache Flink

Cited by: 2
Authors
Horvath, Gabor [1]
Pataki, Norbert [1]
Balassi, Marton [2]
Affiliations
[1] Eotvos Lorand Univ, Dept Programming Languages & Compilers, Fac Informat, Budapest, Hungary
[2] Hungarian Acad Sci, Informat Lab, Inst Comp Sci & Control, Budapest, Hungary
Keywords
Java; Janino; code generation; big data; Flink
DOI
10.1145/3098572.3098579
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
The Big Data world is shifting. Applications used to be I/O bound, but InfiniBand and SSDs have reduced I/O overhead and more sophisticated algorithms have been developed, so the CPU has become a bottleneck for some applications. Even when an application is I/O bound, reducing CPU usage on state-of-the-art CPUs can lower electricity costs. Apache Flink is an open source framework for processing data streams and batch jobs. It uses serialization for a wide variety of purposes: not only for sending data over the network, saving it to disk, or providing fault tolerance, but also because some operators can work on the serialized representation of the data instead of on Java objects, which can improve performance significantly. Flink has a custom serialization method that enables operators to work on the serialized format. Currently, Apache Flink uses reflection to serialize Plain Old Java Objects (POJOs). Reflection in Java is notoriously slow, and the structure of reflective code is harder for the JIT compiler to optimize. As a Google Summer of Code project in 2016, we implemented code generation for POJO serializers and comparators to improve the performance of Apache Flink. Flink has a sophisticated type system that provides a great deal of information about the types to be serialized. Using this information, it is possible to generate specialized, high-performance code. We achieved a more than 6x performance improvement in serialization, which amounted to a 20% overall improvement.
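The contrast the abstract draws between reflective and specialized serialization can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the `Point` POJO, the class `PojoSerializationSketch`, and both method names are hypothetical and are not taken from Flink or from the paper. The "specialized" method is written by hand here as a stand-in for the kind of code a generator might emit. The paper's actual generated code and its use of Janino are not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;

public class PojoSerializationSketch {

    /** Hypothetical POJO, used only for this illustration. */
    public static class Point {
        public int x;
        public long y;

        public Point() {}

        public Point(int x, long y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Reflection-based variant: fields are discovered at run time and each
     * write is chosen by inspecting the field's type. Only int and long
     * fields are handled in this sketch.
     */
    public static byte[] serializeReflective(Object pojo)
            throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);

        // getFields() gives no ordering guarantee, so sort by name
        // to obtain a deterministic layout.
        Field[] fields = pojo.getClass().getFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));

        for (Field field : fields) {
            Class<?> type = field.getType();
            if (type == int.class) {
                out.writeInt(field.getInt(pojo));
            } else if (type == long.class) {
                out.writeLong(field.getLong(pojo));
            } else {
                throw new IllegalArgumentException("Unsupported field type: " + type);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    /**
     * Specialized variant: the field list and types are fixed in the source,
     * so there is no reflective lookup and no type dispatch at run time.
     * Hand-written here as a stand-in for generated code.
     */
    public static byte[] serializeSpecialized(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(p.x);   // field "x"
        out.writeLong(p.y);  // field "y"
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(3, 7L);
        byte[] reflective = serializeReflective(p);
        byte[] specialized = serializeSpecialized(p);
        // Both variants should produce the same 12-byte layout (4 + 8 bytes).
        System.out.println(Arrays.equals(reflective, specialized));
    }
}
```

Both methods produce the same bytes for `Point`. The difference is that the specialized version contains only straight-line field reads and writes, which is the kind of code structure the abstract describes as easier for the JIT compiler to optimize. This sketch makes no claim about the size of the speedup.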
Pages: 6
Related papers
50 in total
  • [11] Mosaics in Big Data Stratosphere, Apache Flink, and Beyond
    Markl, Volker
    [J]. DEBS'18: PROCEEDINGS OF THE 12TH ACM INTERNATIONAL CONFERENCE ON DISTRIBUTED AND EVENT-BASED SYSTEMS, 2018, : 7 - 13
  • [12] In-Transit Molecular Dynamics Analysis with Apache Flink
    Zanuz, Henrique C.
    Raffin, Bruno
    Mures, Omar A.
    Padron, Emilio J.
    [J]. PROCEEDINGS OF IN SITU INFRASTRUCTURES FOR ENABLING EXTREME-SCALE ANALYSIS AND VISUALIZATION (ISAV 2018), 2018, : 25 - 32
  • [13] FlinkCheck: Property-Based Testing for Apache Flink
    Espinosa, Cristina Valentina
    Martin-Martin, Enrique
    Riesco, Adrian
    Rodriguez-Hortala, Juan
    [J]. IEEE ACCESS, 2019, 7 : 150369 - 150382
  • [14] FogGuru: a Fog Computing platform based on Apache Flink
    Battulga, Davaadorj
    Miorandi, Daniele
    Tedeschi, Cedric
    [J]. 2020 23RD CONFERENCE ON INNOVATION IN CLOUDS, INTERNET AND NETWORKS AND WORKSHOPS (ICIN 2020), 2020, : 156 - 158
  • [15] MotionInsights: Object Tracking in Streaming Video with Apache Flink
    Banelas, Dimitrios
    Petrakis, Euripides G. M.
    [J]. ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOL 1, AINA 2024, 2024, 199 : 402 - 414
  • [16] Video2Flink: real-time video partitioning in Apache Flink and the cloud
    Kastrinakis, Dimitrios
    Petrakis, Euripides G. M.
    [J]. MACHINE VISION AND APPLICATIONS, 2023, 34 (03)
  • [17] Video2Flink: real-time video partitioning in Apache Flink and the cloud
    Dimitrios Kastrinakis
    Euripides G. M. Petrakis
    [J]. Machine Vision and Applications, 2023, 34
  • [18] Tink: A Temporal Graph Analytics Library for Apache Flink
    Lightenberg, Wouter
    Pei, Yulong
    Fletcher, George
    Pechenizkiy, Mykola
    [J]. COMPANION PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2018 (WWW 2018), 2018, : 71 - 72
  • [19] Formal Semantics of Apache Flink Complex Event Processing Language
    Fu, Xuan-Deng
    Wu, Zhi-Lin
    [J]. Ruan Jian Xue Bao/Journal of Software, 2024, 35 (10): 4510 - 4532
  • [20] A Parallel Algorithm for Tracking Dynamic Communities based on Apache Flink
    Kechagias, Georgios
    Tzortzis, Grigorios
    Paliouras, George
    Vogiatzis, Dimitrios
    [J]. 10TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2018), 2018,