Clustering workflow requirements using compression dissimilarity measure

Authors
Wei, Li [1]
Handley, John
Martin, Nathaniel
Sun, Tong
Keogh, Eamonn
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 92521 USA
[2] Xerox Corp, Stamford, CT 06902 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Xerox offers a bewildering array of printers and software configurations to satisfy the needs of production print shops. A configuration tool in the hands of sales analysts elicits requirements from customers and recommends a list of product configurations. This tool generates special question and answer case logs that provide useful historical data. Given the unusual semi-structured question and answer format, this data is not amenable to any standard document clustering method. We discovered that a hierarchical agglomerative approach using a compression-based dissimilarity measure (CDM) provided readily interpretable clusters. We compare this method empirically to two reasonable alternatives, latent semantic analysis and probabilistic latent semantic analysis, and conclude that CDM offers an accurate and easily implemented approach to validate and augment our configuration tool.
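The abstract names the technique but gives no formula. As a minimal sketch, the compression-based dissimilarity measure is commonly defined as CDM(x, y) = C(xy) / (C(x) + C(y)), where C is the compressed size in bytes. The sketch below pairs it with a small agglomerative clustering loop. The choice of zlib as compressor, the average-linkage rule, the function names, and the sample case logs are all assumptions made for illustration; none of them are taken from the paper.

```python
import zlib
from itertools import combinations


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def cdm(x: str, y: str) -> float:
    """Compression-based dissimilarity: C(xy) / (C(x) + C(y)).

    Values near 0.5 suggest highly related strings; values near 1
    suggest the strings share little compressible structure.
    """
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def agglomerate(docs, n_clusters):
    """Average-linkage agglomerative clustering over pairwise CDM.

    Returns a list of clusters, each a list of indices into docs.
    Average linkage is an assumption; the abstract does not name one.
    """
    dist = {(i, j): cdm(docs[i], docs[j])
            for i, j in combinations(range(len(docs)), 2)}
    clusters = [[i] for i in range(len(docs))]

    def linkage(a, b):
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average dissimilarity.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


# Hypothetical question-and-answer case logs, invented for this example.
log_a = ("Q: What is the monthly print volume? A: 500000 pages. "
         "Q: Is color printing required? A: Yes. "
         "Q: Which finishing options are needed? A: Stapling and booklet making.")
log_b = ("Q: How many operators work per shift? A: Three. "
         "Q: Do you need variable data software? A: No. "
         "Q: What paper sizes are used? A: Letter and tabloid only.")

if __name__ == "__main__":
    print(agglomerate([log_a, log_a, log_b, log_b], n_clusters=2))
```

Because CDM needs only an off-the-shelf compressor, it requires no tokenization or feature extraction, which is one plausible reason it suits semi-structured logs like those described above.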
Pages: 50-54
Page count: 5