Improving the Accuracy and Efficiency of Compression-based Dissimilarity Measure using Information Quantity in Data Classification Problems

Authors
Takamoto, Ayaka [1]
Kohara, Yuto [2]
Yoshida, Mitsuo [2]
Umemura, Kyoji [1]
Affiliations
[1] Department of Computer Science and Engineering, Toyohashi University of Technology, Japan
[2] Faculty of Business Sciences, University of Tsukuba, Japan
Keywords
Artificial intelligence
DOI
10.1527/tjsai.38-1_A-M71
Abstract
Compression-based Dissimilarity Measure (CDM) has been reported to work well for classifying strings without any prior clues about the data. However, CDM depends on the choice of compression program, and its theoretical background is unclear. In this paper, we propose replacing the compressed sizes used in CDM with computed information quantities. Whereas CDM uses only the compressed file size, our approach uses the information quantity of the maximum-probability partitioning of the string. We find that this approach is more effective than CDM. We then applied both CDM and the proposed method to publicly available time-series data. With a careful implementation of the computation using suffix arrays, we also find that the proposed approach is more efficient. © 2023, Japanese Society for Artificial Intelligence. All rights reserved.
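As a rough illustration of the two quantities contrasted in the abstract, the sketch below computes CDM using the standard definition CDM(x, y) = C(xy) / (C(x) + C(y)), with Python's zlib standing in as the compressor C, and a dynamic-programming routine that finds the information quantity (in bits) of the maximum-probability partitioning of a string. The use of zlib, the function names, the `max_len` bound on piece length, and the probability model `prob` are all hypothetical choices made for illustration. The abstract does not say how the authors estimate piece probabilities or how they use suffix arrays, so this is not their implementation.

```python
import math
import zlib
from typing import Callable, List, Tuple

# A "size" function maps a byte string to a description length.
SizeFn = Callable[[bytes], float]


def compressed_size(s: bytes) -> float:
    """Length in bytes of s after zlib compression at level 9 (assumed compressor)."""
    return float(len(zlib.compress(s, 9)))


def cdm(x: bytes, y: bytes, size: SizeFn = compressed_size) -> float:
    """CDM(x, y) = size(xy) / (size(x) + size(y)).

    Smaller values mean that concatenating x and y adds little new
    information, i.e. the two strings share structure.
    """
    return size(x + y) / (size(x) + size(y))


def max_prob_info_quantity(s: bytes, prob: Callable[[bytes], float],
                           max_len: int = 8) -> float:
    """Bits needed for the most probable partition of s into pieces.

    best[i] is the minimum of sum(-log2 p(w)) over all splits of s[:i] into
    pieces w with 1 <= len(w) <= max_len and p(w) > 0. Minimising total
    bits is the same as maximising the product of piece probabilities.
    Returns math.inf if no valid partition exists.
    """
    n = len(s)
    best = [0.0] + [math.inf] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] == math.inf:
                continue  # s[:j] cannot be partitioned, so skip it
            p = prob(s[j:i])
            if p > 0.0:
                best[i] = min(best[i], best[j] - math.log2(p))
    return best[n]


def nearest_label(query: bytes, train: List[Tuple[bytes, str]],
                  size: SizeFn = compressed_size) -> str:
    """1-nearest-neighbour classification with CDM as the distance."""
    return min(train, key=lambda item: cdm(query, item[0], size))[1]


if __name__ == "__main__":
    text = b"the quick brown fox " * 20
    noise = bytes(range(256)) * 2
    train = [(text, "text"), (noise, "bytes")]
    print(nearest_label(b"the quick brown fox jumps " * 10, train))  # text

    # Toy, hypothetical piece-probability table for the partitioning routine.
    toy = {b"a": 0.25, b"b": 0.25, b"ab": 0.5}
    print(max_prob_info_quantity(b"abab", lambda w: toy.get(w, 0.0)))  # 2.0
```

To try an information-quantity variant of CDM in this sketch, pass `size=lambda s: max_prob_info_quantity(s, prob)` with some probability model `prob`. A fixed table like `toy` does not adapt to the strings being compared, so a practical model would presumably be estimated from the data (for example, from substring counts); any such choice here remains an assumption.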
Pages: 1–15