Unsupervised discretization by two-dimensional MDL-based histogram

Cited by: 0
Authors
Lincen Yang
Mitra Baratchi
Matthijs van Leeuwen
Affiliations
[1] Leiden University, Leiden Institute of Advanced Computer Science
Source
Machine Learning | 2023 / Vol. 112
Keywords
Unsupervised discretization; Histogram model; Density estimation; Exploratory data analysis
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which results in discretizations based on rectangular cells of adaptive size. Unfortunately, this approach is unable to adequately characterize dependencies among dimensions and/or results in discretizations consisting of more cells (or bins) than is desirable. To address this problem, we propose an expressive model class that allows for far more flexible partitions of two-dimensional data. We extend the state of the art for the one-dimensional case to obtain a model selection problem based on the normalized maximum likelihood, a form of refined MDL. As the flexibility of our model class comes at the cost of a vast search space, we introduce a heuristic algorithm, named PALM, which partitions each dimension alternately and then merges neighboring regions, all using the MDL principle. Experiments on synthetic data show that PALM (1) accurately reveals ground truth partitions that are within the model class (i.e., the search space), given a large enough sample size; (2) approximates well a wide range of partitions outside the model class; (3) converges, in contrast to the state-of-the-art multivariate discretization method IPD. Finally, we apply our algorithm to three spatial datasets, and we demonstrate that, compared to kernel density estimation (KDE), our algorithm not only reveals more detailed density changes, but also fits unseen data better, as measured by the log-likelihood.
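The abstract evaluates density estimates by how well they fit unseen data, measured by the log-likelihood. The following is a minimal sketch of that evaluation idea only, not of PALM: it assumes a fixed equal-width k-by-k grid on the unit square instead of an adaptive MDL-selected partition, adds one pseudo-count per cell so that empty cells keep a nonzero density, and uses synthetic data. The function names `fit_grid_histogram` and `log_likelihood` are hypothetical and do not come from the paper.

```python
import math
import random


def fit_grid_histogram(points, k):
    """Fit a k-by-k equal-width histogram density on the unit square [0, 1)^2.

    Returns a k-by-k nested list of density values (cell probability / cell area).
    Assumption for this sketch: add-one smoothing per cell, fixed grid.
    """
    counts = [[0] * k for _ in range(k)]
    for x, y in points:
        i = min(int(x * k), k - 1)
        j = min(int(y * k), k - 1)
        counts[i][j] += 1
    n = len(points)
    cell_area = 1.0 / (k * k)
    return [[(c + 1) / (n + k * k) / cell_area for c in row] for row in counts]


def log_likelihood(density, points):
    """Sum of log densities of the points under a grid histogram."""
    k = len(density)
    total = 0.0
    for x, y in points:
        i = min(int(x * k), k - 1)
        j = min(int(y * k), k - 1)
        total += math.log(density[i][j])
    return total


def sample(n, rng):
    """Synthetic data: 80% of points fall in the lower-left quadrant."""
    points = []
    for _ in range(n):
        if rng.random() < 0.8:
            points.append((rng.random() * 0.5, rng.random() * 0.5))
        else:
            points.append((rng.random(), rng.random()))
    return points


rng = random.Random(0)
train = sample(2000, rng)
test = sample(500, rng)

# One cell (uniform density) versus a 2-by-2 grid that can capture the dense quadrant.
ll_coarse = log_likelihood(fit_grid_histogram(train, 1), test)
ll_fine = log_likelihood(fit_grid_histogram(train, 2), test)
print("held-out log-likelihood, 1x1 grid:", ll_coarse)
print("held-out log-likelihood, 2x2 grid:", ll_fine)
```

A higher held-out log-likelihood indicates a better fit to unseen data. In this sketch the grid size is chosen by hand, whereas the paper selects the partition with an MDL criterion based on the normalized maximum likelihood.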
Pages: 2397 - 2431
Page count: 34
Related papers
50 in total
  • [21] Thresholding Using Two-Dimensional Histogram Based on Local Entropy
    Cheng, Zhao
    Zhang, Tianxu
    Yan, Luxin
    [J]. 2010 2ND INTERNATIONAL ASIA CONFERENCE ON INFORMATICS IN CONTROL, AUTOMATION AND ROBOTICS (CAR 2010), VOL 1, 2010, : 357 - 360
  • [22] An Improved Image Thresholding Method Based On Two-Dimensional Histogram
    Zhang, Jun
    Liao, Yixin
    Yan, Lifei
    [J]. 2018 13TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2018, : 1623 - 1628
  • [23] Design of vector quantization networks by MDL-based principles
    Bischof, H
    Leonardis, A
    [J]. IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, 1998, : 2294 - 2299
  • [24] MDLChunker: A MDL-Based Cognitive Model of Inductive Learning
    Robinet, Vivien
    Lemaire, Benoit
    Gordon, Mirta B.
    [J]. COGNITIVE SCIENCE, 2011, 35 (07) : 1352 - 1389
  • [25] Two-dimensional histogram specification using two-dimensional cumulative distribution function
    Jung, Seung-Won
    [J]. ELECTRONICS LETTERS, 2014, 50 (12) : 872 - 873
  • [26] The two-dimensional histogram as a constraint for protein phase improvement
    Goldstein, A
    Zhang, KYJ
    [J]. ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY, 1998, 54 : 1230 - 1244
  • [27] An MDL-Based Wavelet Scattering Features Selection for Signal Classification
    Bruni, Vittoria
    Cardinali, Maria Lucia
    Vitulano, Domenico
    [J]. AXIOMS, 2022, 11 (08)
  • [28] A Robust Tracking Method Based on Two-Dimensional Histogram and Generalized Divergence
    Li, Wei
    Yang, KaiPeng
    Chen, JiaXin
    Ji, ZhiHang
    Ling, Ling
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL, VOLS 1 AND 2, 2009, : 659 - 664
  • [29] A Turtle Shell based RDH scheme with two-dimensional histogram shifting
    Xiao-Zhu Xie
    Chin-Chen Chang
    Chia-Chen Lin
    Jia-Long Lin
    [J]. Multimedia Tools and Applications, 2019, 78 : 19413 - 19436
  • [30] An MDL-based Multi-task Classification and Reconstruction Algorithm
    Wang, Ying-Gui
    Liu, Zheng
    Feng, Dao-Wang
    Jiang, Wen-Li
    [J]. 2013 16TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2013, : 212 - 218