Unsupervised Noise Detection in Unstructured data for Automatic Parsing

Cited by: 4
Authors
Jain, Shubham [1]
de Buitleir, Amy [2]
Fallon, Enda [1]
Affiliations
[1] Athlone Inst Technol, Software Res Inst, Athlone, Ireland
[2] Ericsson, Network Management Lab, Athlone, Ireland
Keywords
Unsupervised Data Mining; Information Extraction; Clustering; Similarity
DOI
10.23919/cnsm50824.2020.9269096
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The telecommunications industry makes extensive use of data extracted from logs, alarms, traces, diagnostics, and other monitoring devices. Analyzing the generated data requires that the data be parsed, re-structured, and re-formatted. Developing custom parsers for each input format is labor-intensive and requires domain knowledge. In this paper, we describe a novel unsupervised text processing pipeline to automatically detect and label relevant data and eliminate noise using Levenshtein similarity and Agglomerative clustering. We experiment with different similarity and clustering algorithms on a selection of common data formats to verify the accuracy of the proposed technique. The results suggest that the proposed methodology has higher accuracy.
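The abstract names two building blocks, Levenshtein similarity and agglomerative clustering. The sketch below shows one way they could be combined to separate recurring log lines from noise. It is an illustration only, not the authors' pipeline. The average-linkage rule, the `threshold` and `min_cluster_size` parameters, the "singleton clusters are noise" heuristic, and the sample log lines are all assumptions introduced here.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalised Levenshtein similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def agglomerative(lines, threshold=0.5):
    """Average-linkage agglomerative clustering over line indices.

    Repeatedly merges the two most similar clusters and stops once the
    best average similarity falls below `threshold` (an assumed cut-off).
    """
    clusters = [[i] for i in range(len(lines))]
    sim = {
        (i, j): similarity(lines[i], lines[j])
        for i, j in combinations(range(len(lines)), 2)
    }

    def linkage(c1, c2):
        total = sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def label_noise(lines, threshold=0.5, min_cluster_size=2):
    """Label lines in clusters smaller than `min_cluster_size` as noise.

    This labelling rule is a hypothetical heuristic, not taken from the paper.
    """
    labels = ["noise"] * len(lines)
    for cluster in agglomerative(lines, threshold):
        if len(cluster) >= min_cluster_size:
            for i in cluster:
                labels[i] = "data"
    return labels


# Made-up sample input: three similar alarm records and one banner line.
sample = [
    "2020-11-02 10:15:01 ALARM link down port=1",
    "2020-11-02 10:15:07 ALARM link down port=2",
    "2020-11-02 10:16:12 ALARM link down port=3",
    "==== report generated by tool ====",
]
print(label_noise(sample))  # ['data', 'data', 'data', 'noise']
```

The pairwise similarity table grows quadratically with the number of lines, so a sketch like this suits small samples only. For larger inputs, a library implementation of hierarchical clustering with a precomputed distance matrix would be the more practical choice.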
Pages: 5