Document representation based on probabilistic word clustering in customer-voice classification

Authors
Younghoon Lee
Seokmin Song
Sungzoon Cho
Jinhae Choi
Affiliations
[1] Seoul National University, Department of Industrial Engineering and Institute for Industrial Systems Innovation
[2] LG Electronics, Data Driven User Experience Team, Mobile Communication Lab
Keywords
Probabilistic word clustering; Document representation; Customer-voice; Classification
Abstract
Customer-voice data play an important role in fields such as marketing, product planning, and quality assurance. However, classifying these data still relies on manual processes, which causes problems. This study focuses on building automatic classifiers for customer-voice data using newly proposed document representation methods based on neural embedding and probabilistic word clustering. Semantically similar terms are grouped into a common cluster: the word vectors generated by neural embedding are clustered according to the membership strength of each word in each cluster, as derived from a probabilistic clustering method such as fuzzy C-means or a Gaussian mixture model. Because it takes membership strength into account, the proposed method is expected to suit the classification of customer-voice data, which consist of unstructured text. The results show that the proposed method achieved an accuracy of 89.24% in representational effectiveness and an accuracy of 87.76% in classifying customer-voice data spanning 12 classes. The method also provided an intuitive interpretation of the generated representation.
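The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: it computes fuzzy C-means membership strengths for word vectors and averages them into a document vector. The toy 2-D embeddings, the fixed cluster centers, the fuzzifier m = 2, and the averaging step are all assumptions made for illustration, not the authors' confirmed procedure. In practice the centers would be estimated by fitting fuzzy C-means or a Gaussian mixture model to real neural embeddings.

```python
import math

def fcm_memberships(vec, centers, m=2.0):
    """Fuzzy C-means membership of one vector in each cluster (sums to 1)."""
    dists = [math.dist(vec, c) for c in centers]
    # A vector lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    # Standard FCM rule: u_j is proportional to d_j^(-2 / (m - 1)).
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]

def document_vector(tokens, embeddings, centers, m=2.0):
    """Average the membership vectors of a document's known words.

    Averaging is an assumed aggregation; out-of-vocabulary tokens are skipped.
    """
    rows = [fcm_memberships(embeddings[t], centers, m)
            for t in tokens if t in embeddings]
    if not rows:
        return [0.0] * len(centers)
    return [sum(column) / len(rows) for column in zip(*rows)]

# Hypothetical toy embeddings and cluster centers (illustration only).
embeddings = {
    "battery": (0.9, 0.1),
    "charger": (0.8, 0.2),
    "screen": (0.1, 0.9),
    "display": (0.2, 0.8),
}
centers = [(0.85, 0.15), (0.15, 0.85)]

if __name__ == "__main__":
    print(document_vector(["battery", "charger", "screen"], embeddings, centers))
```

Each component of the resulting vector can be read as how strongly the document leans toward one word cluster, which is one way to obtain the kind of interpretable representation the abstract mentions.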
Pages: 221-232
Page count: 11