Code Generation in Serializers and Comparators of Apache Flink

Cited by: 2
Authors
Horvath, Gabor [1]
Pataki, Norbert [1]
Balassi, Marton [2]
Affiliations
[1] Eotvos Lorand Univ, Dept Programming Languages & Compilers, Fac Informat, Budapest, Hungary
[2] Hungarian Acad Sci, Informat Lab, Inst Comp Sci & Control, Budapest, Hungary
Keywords
Java; Janino; code generation; big data; Flink
DOI
10.1145/3098572.3098579
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
There is a shift in the Big Data world. Applications used to be I/O bound, but InfiniBand and SSDs have reduced the I/O overhead, and more sophisticated algorithms have been developed. As a result, the CPU has become a bottleneck for some applications. Even when an application is I/O bound, reduced CPU usage on state-of-the-art CPUs can lead to reduced electricity costs. Apache Flink is an open source framework for processing streams of data and batch jobs. It uses serialization for a wide variety of purposes: not only for sending data over the network, saving it to disk, or providing fault tolerance, but also because some operators can work on the serialized representation of the data instead of on Java objects. This approach can improve performance significantly. Flink has a custom serialization method that enables operators to work on the serialized formats. Currently, Apache Flink uses reflection to serialize Plain Old Java Objects (POJOs). Reflection in Java is notoriously slow, and the structure of reflection-based code is harder for the JIT compiler to optimize. As a Google Summer of Code project in 2016, we implemented code generation for POJO serializers and comparators to improve the performance of Apache Flink. Flink has a rich type system that provides a lot of information about the types that need to be serialized. Using this information, it is possible to generate specialized, high-performance code. We achieved a more than 6x performance improvement in serialization, which amounted to a 20% overall improvement.
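The contrast the abstract draws between reflection-based and specialized serialization can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Point` POJO, both method names, and the byte layout are hypothetical and are not Flink's serializer API or wire format. The specialized method is written by hand to stand in for the kind of code a generator might emit once the field types are known.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch; not Flink's serializer API or wire format.
final class PojoSerializationSketch {

    // Hypothetical POJO with two public primitive fields.
    static final class Point {
        public int x;
        public long y;
    }

    // Generic path: discovers fields at run time and dispatches on their type.
    static byte[] serializeReflective(Object pojo) {
        Field[] fields = pojo.getClass().getFields();
        // getFields() guarantees no order, so sort by name for a stable layout.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Field field : fields) {
                Class<?> type = field.getType();
                if (type == int.class) {
                    out.writeInt(field.getInt(pojo));
                } else if (type == long.class) {
                    out.writeLong(field.getLong(pojo));
                } else {
                    throw new IllegalArgumentException("unsupported field type: " + type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Specialized path: field names and types are fixed at compile time,
    // so there is no lookup or type dispatch per field.
    static byte[] serializeSpecialized(Point point) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(point.x);
            out.writeLong(point.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Point point = new Point();
        point.x = 7;
        point.y = 42L;
        boolean same = Arrays.equals(serializeReflective(point), serializeSpecialized(point));
        System.out.println("identical bytes: " + same);
    }
}
```

Both methods produce the same bytes for the same object; the difference lies only in how the fields are reached. The sketch makes no claim about the size of the speedup, which the abstract reports for the authors' own implementation.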
Pages: 6