Unsupervised Noise Detection in Unstructured data for Automatic Parsing

Cited by: 4
Authors
Jain, Shubham [1]
de Buitleir, Amy [2]
Fallon, Enda [1]
Affiliations
[1] Athlone Inst Technol, Software Res Inst, Athlone, Ireland
[2] Ericsson, Network Management Lab, Athlone, Ireland
Keywords
Unsupervised Data Mining; Information Extraction; Clustering; Similarity
DOI
10.23919/cnsm50824.2020.9269096
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The telecommunications industry makes extensive use of data extracted from logs, alarms, traces, diagnostics, and other monitoring devices. Analyzing the generated data requires that the data be parsed, re-structured, and re-formatted. Developing custom parsers for each input format is labor-intensive and requires domain knowledge. In this paper, we describe a novel unsupervised text processing pipeline to automatically detect and label relevant data and eliminate noise using Levenshtein similarity and Agglomerative clustering. We experiment with different similarity and clustering algorithms on a selection of common data formats to verify the accuracy of the proposed technique. The results suggest that the proposed methodology has higher accuracy.
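As a rough illustration of the two building blocks the abstract names, the sketch below computes a normalized Levenshtein similarity between text lines and groups the lines with a simple average-linkage agglomerative clustering. It is not the authors' pipeline: the function names, the 0.6 threshold, the sample log lines, and the rule that singleton clusters count as noise are all assumptions made for this example.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def agglomerative(lines, threshold=0.6):
    """Average-linkage clustering; returns clusters as lists of line indices.

    Merging stops once the best pair of clusters is less similar than
    `threshold` (a hypothetical cut-off chosen for this example).
    """
    clusters = [[i] for i in range(len(lines))]
    sim = {(i, j): similarity(lines[i], lines[j])
           for i, j in combinations(range(len(lines)), 2)}

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so index x stays valid
        del clusters[y]
    return clusters


# Invented sample input: three similar alarm records and one stray line.
lines = [
    "2020-01-01 10:00:01 ALARM link down port=1",
    "2020-01-01 10:00:05 ALARM link down port=2",
    "2020-01-01 10:00:09 ALARM link down port=7",
    "---- end of report ----",
]
clusters = agglomerative(lines)
# Assumed labeling rule: a line that joins no other line is treated as noise.
noise = [lines[c[0]] for c in clusters if len(c) == 1]
```

With this input the three alarm records end up in one cluster and the separator line is left on its own. For larger inputs, a library implementation of hierarchical clustering over a precomputed distance matrix would be a more practical choice than this quadratic loop.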
Pages: 5