Code Generation in Serializers and Comparators of Apache Flink

Cited by: 2
Authors
Horvath, Gabor [1 ]
Pataki, Norbert [1 ]
Balassi, Marton [2 ]
Affiliations
[1] Eotvos Lorand Univ, Dept Programming Languages & Compilers, Fac Informat, Budapest, Hungary
[2] Hungarian Acad Sci, Informat Lab, Inst Comp Sci & Control, Budapest, Hungary
Keywords
Java; Janino; code generation; big data; Flink
DOI
10.1145/3098572.3098579
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
There is a shift in the Big Data world. Applications used to be I/O bound, but InfiniBand and SSDs have reduced I/O overhead and more sophisticated algorithms have been developed, so the CPU has become a bottleneck for some applications. Even when an application is I/O bound, reducing CPU usage on state-of-the-art CPUs can lower electricity costs. Apache Flink is an open source framework for processing streams of data and batch jobs. It uses serialization for a wide variety of purposes: not only for sending data over the network, saving it to disk, or providing fault tolerance, but also because some operators can work on the serialized representation of the data instead of Java objects. This approach can improve performance significantly. Flink has a custom serialization method that enables operators to work on the serialized formats. Currently, Apache Flink uses reflection to serialize Plain Old Java Objects (POJOs). Reflection in Java is notoriously slow, and the structure of reflective code is harder for the JIT compiler to optimize. As a Google Summer of Code project in 2016, we implemented code generation for serializers and comparators for POJOs to improve the performance of Apache Flink. Flink has a delicate type system that provides a lot of information about the types that need to be serialized. Using this information, it is possible to generate specialized code with great performance. We achieved a more than 6x performance improvement in serialization, which amounted to a 20% overall improvement.
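The contrast the abstract draws between reflective and generated serializers can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Point` POJO, the `SerializerSketch` class, and both method names are hypothetical and are not Flink's API or the paper's generated code. The specialized method is written by hand to show roughly the shape of code a generator might emit. It is not produced with Janino.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

public class SerializerSketch {

    // Hypothetical POJO used only for this illustration.
    public static class Point {
        public int x;
        public double y;

        public Point() {}

        public Point(int x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Reflective path: fields are discovered at runtime and every write
    // goes through a type check plus a reflective field read.
    public static byte[] serializeReflective(Object pojo)
            throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Field[] fields = pojo.getClass().getFields();
        // getFields() returns fields in no guaranteed order, so sort by name
        // to get a stable byte layout.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            Class<?> type = f.getType();
            if (type == int.class) {
                out.writeInt(f.getInt(pojo));
            } else if (type == double.class) {
                out.writeDouble(f.getDouble(pojo));
            } else {
                throw new IllegalArgumentException("unsupported field type: " + type);
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Specialized path: the field list and types are fixed in the source,
    // so the method body is straight-line code with direct field access.
    public static byte[] serializeSpecialized(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(p.x);
        out.writeDouble(p.y);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(7, 2.5);
        // Both paths should produce the same byte layout.
        System.out.println(
                Arrays.equals(serializeReflective(p), serializeSpecialized(p)));
    }
}
```

Both methods write the same bytes for the same object. The difference is that the specialized version has no per-field type dispatch or reflective lookups, which is the kind of code a JIT compiler can inline more readily.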
Pages: 6