Railway Fault Text Clustering Method Using an Improved Dirichlet Multinomial Mixture Model

Citations: 1
Authors
Yang, Ni [1]
Zhang, Youpeng [1]
Affiliation
[1] Lanzhou Jiaotong Univ, Sch Automat & Elect Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1155/2022/7882396
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Analyzing railway signal equipment fault data (RSEFD) is one of the problems in in-depth traffic big data analysis throughout the life cycle of intelligent transportation. During daily operation and maintenance, the railway electrical maintenance department records equipment malfunction information in natural language. These records are highly specialized, short, and unbalanced across categories, and analyzing and processing them manually is inefficient. Effectively mining the information contained in these fault texts is important for supporting on-site operation and maintenance. We therefore propose a railway fault text clustering method based on an improved Dirichlet multinomial mixture model, called ICH-GSDMM. First, a railway signal terminology thesaurus is built to address inaccurate word segmentation of RSEFD. Second, the traditional chi-square statistic is improved to address the learning difficulties caused by the class imbalance of RSEFD. Finally, the Gibbs sampling algorithm for the Dirichlet multinomial mixture model (GSDMM) is modified using the improved chi-square statistic (ICH) to address the symmetry problem of the word Dirichlet prior parameters in the traditional GSDMM. Quantitative experiments show that, compared with the traditional GSDMM and the GSDMM based on chi-square statistics (CH-GSDMM), the GSDMM based on improved chi-square statistics (ICH-GSDMM) greatly improves the internal evaluation index of clustering performance, and its external evaluation indices are also the best, with the exception of the external index NMI on data set DS2. At the same time, the diagnostic accuracy for a select few categories in RSEFD improves considerably, demonstrating the method's efficacy.
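To make the "symmetry problem" concrete, the following is a minimal sketch of a collapsed Gibbs sampler for a Dirichlet multinomial mixture in which every word has its own prior weight instead of one shared value. It follows the standard GSDMM update, not the authors' exact procedure: the abstract does not give the ICH formula, so `vocab_prior` is a hypothetical placeholder for whatever per-word weights a chi-square-based scheme would produce, and the function name and default parameters are likewise assumptions.

```python
import math
import random
from collections import Counter


def gsdmm(docs, vocab_prior, K=8, alpha=0.1, iters=15, seed=0):
    """Collapsed Gibbs sampling for a Dirichlet multinomial mixture.

    docs        : list of token lists (already segmented)
    vocab_prior : dict word -> positive prior weight beta_w (hypothetical;
                  a symmetric prior is the special case of equal weights)
    Returns one cluster label per document.
    """
    rng = random.Random(seed)
    beta_sum = sum(vocab_prior.values())
    m = [0] * K                         # documents per cluster
    n = [0] * K                         # tokens per cluster
    nw = [Counter() for _ in range(K)]  # word counts per cluster
    z = []

    def move(d, k, sign):
        """Add (sign=+1) or remove (sign=-1) document d from cluster k."""
        m[k] += sign
        n[k] += sign * len(docs[d])
        for w in docs[d]:
            nw[k][w] += sign

    # Random initial assignment.
    for d in range(len(docs)):
        k = rng.randrange(K)
        z.append(k)
        move(d, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            move(d, z[d], -1)
            counts = Counter(doc)
            logp = []
            for k in range(K):
                # The factor 1 / (D - 1 + K * alpha) is the same for every k,
                # so it is omitted from the unnormalized log-probability.
                lp = math.log(m[k] + alpha)
                for w, c in counts.items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + vocab_prior[w] + j)
                for i in range(len(doc)):
                    lp -= math.log(n[k] + beta_sum + i)
                logp.append(lp)
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            z[d] = rng.choices(range(K), weights=weights)[0]
            move(d, z[d], +1)
    return z
```

Calling `gsdmm(docs, {w: 0.1 for doc in docs for w in doc}, K=4)` reproduces the symmetric-prior case; passing unequal weights is the only change needed to try an asymmetric prior.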
Pages: 12