Research on Real-time Processing and Stream Analysis of Unstructured Data Based on Big Data Platforms

Cited by: 0
Authors
Liang, Huichao [1]
Wang, Di [1]
Liu, Yuan [1]
Mei, Lin [1]
Zhou, Mengxue [1]
Zhao, Haibin [1]
Affiliation
[1] State Grid Henan, Informat & Telecommun Co Data Ctr, Zhengzhou 450000, Henan, Peoples R China
Keywords
Big Data Platform; Unstructured Data; Real-time Processing; Stream Data
DOI
10.1145/3662739.3665984
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rise of the big data era, massive data streams are emerging in fields such as the internet, the Internet of Things, and finance, posing significant challenges for real-time processing. These streams are fast, random, and unordered, so the systems that process them must remain timely, stable, and correct. This paper therefore focuses on stream processing of massive unstructured data on big data platforms and constructs a stream data capture and analysis architecture based on distributed processing mechanisms. First, to meet high-speed data processing requirements, the paper designs a distributed stream data processing system based on network data capture: combining Pcap network data capture technology with distributed computing systems achieves real-time, high-speed data parsing. The paper also compares two common distributed frameworks and establishes frameworks that support high-speed data storage and stream computing, avoiding both the insufficient computing capacity of a single machine and the high cost of large-scale computers. Second, because data from different business systems have different protocol contents, the paper proposes a generic protocol parsing method: defining template writing rules and constructing the corresponding protocol parsing template files makes it possible to parse generic protocols. Experiments verify the feasibility of the proposed methods, providing a reference for real-time processing and stream analysis of unstructured data on big data platforms.
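The template-driven parsing idea mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the template writing rules, so everything here is an assumption made for illustration and not the authors' method: the template layout (an in-memory dictionary standing in for a template file), the field names, and the helper functions `compile_template` and `parse_message` are all hypothetical.

```python
import struct

# Hypothetical template standing in for a "protocol parsing template file".
# Field names and struct format codes are illustrative assumptions only.
TEMPLATE = {
    "name": "demo_protocol",
    "byte_order": ">",  # big-endian (network byte order), no padding
    "fields": [
        {"name": "version", "format": "B"},    # 1-byte unsigned integer
        {"name": "msg_type", "format": "B"},   # 1-byte unsigned integer
        {"name": "length", "format": "H"},     # 2-byte unsigned integer
        {"name": "device_id", "format": "I"},  # 4-byte unsigned integer
    ],
}


def compile_template(template):
    """Turn a template description into a struct layout and field names."""
    fmt = template["byte_order"] + "".join(
        field["format"] for field in template["fields"]
    )
    names = [field["name"] for field in template["fields"]]
    return struct.Struct(fmt), names


def parse_message(template, payload):
    """Parse the fixed header described by the template; keep the rest as body."""
    layout, names = compile_template(template)
    if len(payload) < layout.size:
        raise ValueError(
            f"payload has {len(payload)} bytes, header needs {layout.size}"
        )
    values = layout.unpack_from(payload)
    record = dict(zip(names, values))
    record["body"] = payload[layout.size:]
    return record


# Build a sample payload that matches the hypothetical template, then parse it.
sample = struct.pack(">BBHI", 1, 2, 5, 42) + b"hello"
record = parse_message(TEMPLATE, sample)
print(record)
```

In a sketch like this, supporting another business system's protocol would mean supplying a different template rather than changing the parsing code, which is the general appeal of a template-based approach.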
Pages: 96-101
Page count: 6