Research on Real-time Processing and Stream Analysis of Unstructured Data Based on Big Data Platforms

Cited by: 0
Authors
Liang, Huichao [1]
Wang, Di [1]
Liu, Yuan [1]
Mei, Lin [1]
Zhou, Mengxue [1]
Zhao, Haibin [1]
Affiliation
[1] State Grid Henan, Informat & Telecommun Co Data Ctr, Zhengzhou 450000, Henan, Peoples R China
Keywords
Big Data Platform; Unstructured Data; Real-time Processing; Stream Data
DOI
10.1145/3662739.3665984
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rise of the big data era, massive data streams are emerging in fields such as the Internet, the Internet of Things, and finance, posing significant challenges for real-time processing. These data are high-velocity, random, and disordered, so systems that handle them must remain timely, stable, and correct. This paper therefore focuses on stream processing of massive unstructured data on big data platforms and constructs a stream data capture and analysis architecture based on distributed processing mechanisms. First, it designs a distributed stream data processing system based on network data capture, targeting high-speed data processing requirements. Combining Pcap network data capture with a distributed computing system achieves real-time, high-speed data parsing. The paper also compares two common distributed frameworks and establishes frameworks that support high-speed data storage and stream computing, avoiding both the limited computing capacity of a single machine and the high cost of large-scale computers. Second, because data from different business systems follow different protocols, the paper proposes a generic protocol parsing method: template writing rules are defined and corresponding protocol parsing template files are constructed, so that generic protocols can be parsed. Experiments verify the feasibility of the proposed methods, providing a reference for real-time processing and stream analysis of unstructured data on big data platforms.
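The template-driven parsing idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the template rules or file format, so everything here is an assumption made for illustration: the template is a plain Python list of (field name, struct format code) pairs in wire order, fields are fixed-width and in network byte order, and the names `DEMO_TEMPLATE` and `parse_with_template` are hypothetical. It is not the paper's method, only one way a single generic parser could be driven by per-protocol templates.

```python
import struct

# Hypothetical template: one (field name, struct format code) pair per
# header field, listed in wire order. A real system would presumably load
# this from a template file; the layout below is invented for the demo.
DEMO_TEMPLATE = [
    ("version", "B"),    # 1-byte unsigned integer
    ("msg_type", "B"),   # 1-byte unsigned integer
    ("length", "H"),     # 2-byte unsigned integer
    ("device_id", "I"),  # 4-byte unsigned integer
]


def parse_with_template(payload: bytes, template):
    """Decode the fixed-width header described by `template`.

    Returns a (fields, remaining_bytes) tuple. "!" selects network byte
    order with no padding between fields.
    """
    fmt = "!" + "".join(code for _, code in template)
    size = struct.calcsize(fmt)
    if len(payload) < size:
        raise ValueError(f"need {size} header bytes, got {len(payload)}")
    values = struct.unpack(fmt, payload[:size])
    fields = {name: value for (name, _), value in zip(template, values)}
    return fields, payload[size:]


# Build a sample packet payload and parse it with the demo template.
sample = struct.pack("!BBHI", 1, 7, 5, 4242) + b"hello"
fields, body = parse_with_template(sample, DEMO_TEMPLATE)
print(fields, body)
```

Under these assumptions, supporting another protocol means writing another template rather than another parser; variable-length or nested fields would need a richer template format than this sketch handles.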
Pages: 96-101
Page count: 6
Related papers
  • [1] Real-time stream processing for Big Data
    Wingerath, Wolfram
    Gessert, Felix
    Friedrich, Steffen
    Ritter, Norbert
    IT-INFORMATION TECHNOLOGY, 2016, 58 (04): : 186 - 194
  • [2] RUBA: Real-time Unstructured Big Data Analysis Framework
    Kim, Jaein
    Kim, Nacwoo
    Lee, Byungtak
    Park, Joonho
    Seo, Kwangik
    Park, Hunyoung
    2013 INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2013): FUTURE CREATIVE CONVERGENCE TECHNOLOGIES FOR NEW ICT ECOSYSTEMS, 2013, : 520 - 524
  • [3] A survey on data stream, big data and real-time
    Gomes E.H.A.
    Plentz P.D.M.
    De Rolt C.R.
    Dantas M.A.R.
    International Journal of Networking and Virtual Organisations, 2019, 20 (02) : 143 - 167
  • [4] Architectural Design Of Data Stream-Based Big Data Real-Time Analysis System
    Liu, Qiang
    Lv, Junmin
    Yuan, Xun
    Luo, Renyi
    Lv, Dekui
    PROCEEDINGS OF THE 2017 2ND JOINT INTERNATIONAL INFORMATION TECHNOLOGY, MECHANICAL AND ELECTRONIC ENGINEERING CONFERENCE (JIMEC 2017), 2017, 62 : 153 - 156
  • [5] Big Data Real-time Processing Based on Storm
    Yang, Wenjie
    Liu, Xingang
    Zhang, Lan
    Yang, Laurence T.
    2013 12TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2013), 2013, : 1784 - 1787
  • [6] Application Research of Energy Data Acquisition and Analysis Based on Real-time Stream Processing Platform
    Li, Kunming
    Ji, Cong
    Zhong, Chunlin
    Zheng, Fei
    Shao, Jun
    PROCEEDINGS OF 2017 6TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2017), 2017, : 175 - 178
  • [7] BigSR: real-time expressive RDF stream reasoning on modern Big Data platforms
    Ren, Xiangnan
    Cure, Olivier
    Naacke, Hubert
    Xiao, Guohui
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 811 - 820
  • [8] Stream Processing For Near Real-Time Scientific Data Analysis
    Choi, Jong Youl
    Kurc, Tahsin
    Logan, Jeremy
    Wolf, Matthew
    Suchyta, Eric
    Kress, James
    Pugmire, David
    Podhorszki, Norbert
    Byun, Eun-Kyu
    Ainsworth, Mark
    Parashar, Manish
    Klasky, Scott
    2016 NEW YORK SCIENTIFIC DATA SUMMIT (NYSDS), 2016,
  • [9] A review on big data real-time stream processing and its scheduling techniques
    Tantalaki, Nicoleta
    Souravlas, Stavros
    Roumeliotis, Manos
    INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2020, 35 (05) : 571 - 601
  • [10] Near Real-Time Big Data Stream Processing Platform Using Cassandra
    Pal, Gautam
    Li, Gangmin
    Atkinson, Katie
    2018 4TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2018,