Interactive visual data exploration with subjective feedback: an information-theoretic approach

Cited by: 3
Authors
Puolamäki, Kai [1]
Oikarinen, Emilia [1]
Kang, Bo [2]
Lijffijt, Jefrey [2]
De Bie, Tijl [2]
Affiliations
[1] University of Helsinki, Department of Computer Science, Helsinki, Finland
[2] Ghent University, Department of Electronics and Information Systems, IDLab, Ghent, Belgium
Keywords
Exploratory data analysis; Dimensionality reduction; Information theory; Subjective interestingness; Maximum entropy distribution; Nonlinear dimensionality reduction; Fit
DOI
10.1007/s10618-019-00655-x
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing projection methods for data visualization use predefined criteria to choose the representation of data. There is a lack of methods that (i) use information on what the user has learned from the data and (ii) show patterns that she does not know yet. We construct a theoretical model where identified patterns can be input as knowledge to the system. The knowledge syntax here is intuitive, such as "this set of points forms a cluster", and requires no knowledge of maths. This background knowledge is used to find a maximum entropy distribution of the data, after which the user is provided with data projections for which the data and the maximum entropy distribution differ the most, hence showing the user aspects of data that are maximally informative given the background knowledge. We study the computational performance of our model and present use cases on synthetic and real data. We find that the model allows the user to learn information efficiently from various data sources and works sufficiently fast in practice. In addition, we provide an open source EDA demonstrator system implementing our model with tailored interactive visualizations. We conclude that the information-theoretic approach to EDA where patterns observed by a user are formalized as constraints provides a principled, intuitive, and efficient basis for constructing an EDA system.
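The abstract describes the pipeline only at a high level (knowledge as constraints, a maximum entropy background distribution, projections where data and background differ most). The Python sketch below illustrates that loop under simplifying assumptions of its own and is not the authors' implementation. Assumed setup: the user's knowledge consists of disjoint clusters whose mean and full covariance are constrained, while every other row has only its per-attribute means and variances constrained. With first- and second-moment constraints on disjoint row groups, the maximum entropy distribution is a product of independent Gaussians, one per row. Each row is whitened by its background Gaussian, and directions are scored by the KL divergence between a zero-mean Gaussian with the whitened data's second moment λ along the direction and the unit-variance background, 0.5(λ - 1 - log λ). The function names `background_whiten` and `most_informative_directions`, the KL score, and the synthetic data are hypothetical choices made for illustration only.

```python
import numpy as np


def background_whiten(X, clusters):
    """Whiten each row of X by its background Gaussian (simplified maximum entropy model).

    Hypothetical setup: `clusters` is a list of *disjoint* row-index arrays whose
    mean and full covariance the user has confirmed; every other row only has its
    per-attribute mean and variance constrained. Under these disjoint moment
    constraints, cluster rows get a full-covariance Gaussian and the remaining
    rows an axis-aligned one.
    """
    X = np.asarray(X, dtype=float)
    Y = np.empty_like(X)
    in_cluster = np.zeros(X.shape[0], dtype=bool)
    for idx in clusters:
        C = X[idx]
        mu = C.mean(axis=0)
        D = C - mu
        S = D.T @ D / len(C)                 # biased (1/m) covariance, as constrained
        evals, evecs = np.linalg.eigh(S)     # assumes S is non-singular
        S_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
        Y[idx] = D @ S_inv_sqrt              # S is symmetric, so each row becomes S^{-1/2}(x - mu)
        in_cluster[idx] = True
    rest = ~in_cluster
    if rest.any():
        R = X[rest]
        Y[rest] = (R - R.mean(axis=0)) / R.std(axis=0)  # assumes no constant attribute
    return Y


def most_informative_directions(Y, k=2):
    """Rank directions by KL(N(0, lam) || N(0, 1)) = 0.5 * (lam - 1 - log lam).

    lam is the whitened data's second moment along a direction; the background
    predicts 1 in every direction. Returns the top-k unit directions and scores.
    """
    M = Y.T @ Y / Y.shape[0]                 # the background predicts the identity matrix
    lam, W = np.linalg.eigh(M)
    lam = np.clip(lam, 1e-12, None)          # guard against log(0)
    scores = 0.5 * (lam - 1.0 - np.log(lam))
    order = np.argsort(scores)[::-1][:k]
    return W[:, order], scores[order]


# Usage on hypothetical synthetic data: 200 unmarked rows whose first two
# attributes are strongly correlated, plus a 100-row cluster the user has marked.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
unmarked = np.hstack([z + 0.1 * rng.normal(size=(200, 1)),
                      z + 0.1 * rng.normal(size=(200, 1)),
                      rng.normal(size=(200, 1))])
cluster = rng.normal(loc=5.0, size=(100, 3))
X = np.vstack([unmarked, cluster])

Y = background_whiten(X, clusters=[np.arange(200, 300)])
W, scores = most_informative_directions(Y, k=2)
view = Y @ W                                 # 2-D projection to show the user
```

In this example the marked cluster is explained away completely (its whitened rows have exactly zero mean and identity second moment), so the top-ranked direction reflects the correlation between the first two attributes in the unmarked rows. Because the score is convex in λ with its minimum at λ = 1, the best single direction is always the eigenvector with either the largest or the smallest eigenvalue.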
Source
Data Mining and Knowledge Discovery, 2020, vol. 34, pp. 21-49 (29 pages)