Unsupervised Noise Detection in Unstructured data for Automatic Parsing

Cited by: 4
Authors
Jain, Shubham [1 ]
de Buitleir, Amy [2 ]
Fallon, Enda [1 ]
Affiliations
[1] Athlone Inst Technol, Software Res Inst, Athlone, Ireland
[2] Ericsson, Network Management Lab, Athlone, Ireland
Keywords
Unsupervised Data Mining; Information Extraction; Clustering; Similarity
DOI
10.23919/cnsm50824.2020.9269096
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The telecommunications industry makes extensive use of data extracted from logs, alarms, traces, diagnostics, and other monitoring devices. Analyzing the generated data requires that the data be parsed, re-structured, and re-formatted. Developing custom parsers for each input format is labor-intensive and requires domain knowledge. In this paper, we describe a novel unsupervised text processing pipeline to automatically detect and label relevant data and eliminate noise using Levenshtein similarity and Agglomerative clustering. We experiment with different similarity and clustering algorithms on a selection of common data formats to verify the accuracy of the proposed technique. The results suggest that the proposed methodology has higher accuracy.
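The abstract names two building blocks, Levenshtein similarity and agglomerative clustering, without giving implementation details. The following is a minimal sketch of how those two pieces could be combined to separate recurring records from noise. It is not the authors' pipeline: the function names, the average-linkage choice, the `max_distance` threshold, and the rule that clusters smaller than `min_cluster_size` count as noise are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def agglomerate(lines, max_distance=0.5):
    """Average-linkage agglomerative clustering over line indices.

    Merging stops once the closest pair of clusters is farther apart
    than max_distance (an assumed, hypothetical threshold).
    """
    dist = {(i, j): 1.0 - similarity(lines[i], lines[j])
            for i, j in combinations(range(len(lines)), 2)}
    clusters = [[i] for i in range(len(lines))]

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), best = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best > max_distance:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: x < y, so index x is unaffected
    return clusters


def label_noise(lines, max_distance=0.5, min_cluster_size=2):
    """Label lines in sufficiently large clusters as data, the rest as noise."""
    labels = ["noise"] * len(lines)
    for cluster in agglomerate(lines, max_distance):
        if len(cluster) >= min_cluster_size:
            for i in cluster:
                labels[i] = "data"
    return labels


sample = [
    "2020-01-01 10:00:01 ALARM link down port=1",
    "2020-01-01 10:00:05 ALARM link down port=2",
    "2020-01-01 10:00:09 ALARM link down port=3",
    "=== end of report ===",
]
print(label_noise(sample, max_distance=0.3))
```

In this made-up sample the three alarm lines differ by only two characters each, so they merge into one cluster, while the trailer line stays isolated and is labeled as noise. The pairwise distance table grows quadratically with the number of lines, so this form only suits small inputs.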
Pages: 5
Related papers
50 in total
  • [1] A Review of Unstructured Data Analysis and Parsing Methods
    Jain, Shubham
    de Buitleir, Amy
    Fallon, Enda
    [J]. 2020 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2020, : 164 - 169
  • [2] An Extensible Parsing Pipeline for Unstructured Data Processing
    Jain, Shubham
    de Buitleir, Amy
    Fallon, Enda
    [J]. 2021 23RD INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT 2021): ON-LINE SECURITY IN PANDEMIC ERA, 2021, : 312 - 318
  • [3] An Extensible Parsing Pipeline for Unstructured Data Processing
    Jain, Shubham
    de Buitleir, Amy
    Fallon, Enda
    [J]. 2022 24TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): ARTIFICIAL INTELLIGENCE TECHNOLOGIES TOWARD CYBERSECURITY, 2022, : 312 - +
  • [4] Automatic Unsupervised Polarity Detection on a Twitter Data Stream
    Terrana, Diego
    Augello, Agnese
    Pilato, Giovanni
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2014, : 128 - 134
  • [5] A Framework for Adaptive Deep Reinforcement Semantic Parsing of Unstructured Data
    Jain, Shubham
    de Buitleir, Amy
    Fallon, Enda
    [J]. 12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 1055 - 1060
  • [6] Analysis and Parsing of Unstructured Cyber-Security Incident Data
    Ochoa, Armando J.
    Finlayson, Mark A.
    [J]. PROCEEDINGS OF THE 2019 CONFERENCE ON SECURITY AND PRIVACY IN WIRELESS AND MOBILE NETWORKS (WISEC '19), 2019, : 345 - 346
  • [7] Data-Efficient Automatic Model Selection in Unsupervised Anomaly Detection
    Gudur, Gautham Krishna
    Raaghul, R.
    Adithya, K.
    Vasudevan, Shrihari
    [J]. 2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1443 - 1448
  • [8] Automatic Detection of Building Displacements Through Unsupervised Learning From InSAR Data
    Kuzu, Ridvan Salih
    Bagaglini, Leonardo
    Wang, Yi
    Dumitru, Corneliu Octavian
    Braham, Nassim Ait Ali
    Pasquali, Giorgio
    Santarelli, Filippo
    Trillo, Francesco
    Saha, Sudipan
    Zhu, Xiao Xiang
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 6931 - 6947
  • [9] LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs
    Meng, Weibin
    Liu, Ying
    Zhu, Yichen
    Zhang, Shenglin
    Pei, Dan
    Liu, Yuqing
    Chen, Yihao
    Zhang, Ruizhi
    Tao, Shimin
    Sun, Pei
    Zhou, Rong
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4739 - 4745
  • [10] Improvements to Dependency Parsing Using Automatic Simplification of Data
    Jelinek, Tomas
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,