Selecting Representative Instances from Datasets

Cited by: 0
Authors
Mirisaee, Seyed Hamid [1]
Douzal, Ahlame [1]
Termier, Alexandre [2]
Affiliations
[1] Univ Grenoble Alps, CNRS, Grenoble, France
[2] Univ Rennes 1, CNRS, INRIA, Rennes, France
Keywords
ALGORITHMS; SUBSET
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a new, alternative approach to the problem of finding a set of representative objects in large datasets. We first formulate the general Instance Selection Problem (ISP) and then study three of its variants, each of which selects instances from a very different region of the data: the inner frontier, the central area, and the outer frontier. We discuss solutions to these problems and study their complexity. To illustrate the effectiveness of the proposed techniques, we first use a small synthetic dataset for visualization purposes. We then apply the techniques to the Reuters dataset and show that combining the instances selected by the ISP techniques provides a good representation of the data and can serve as a complementary approach to state-of-the-art methods. Finally, we assess the quality of the selected objects through a topic-based analysis that shows how well the selected documents cover the topics in the Reuters dataset.
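The abstract does not spell out the ISP objectives, so the Python sketch below only illustrates the general idea of picking instances from different regions and then scoring topic coverage. It rests on assumptions that are not from the paper: Euclidean distance, the "central area" read as the k points nearest the centroid, the "outer frontier" read as the k points farthest from it, and coverage measured as the fraction of all topics carried by at least one selected document. It does not attempt the inner frontier, which the abstract leaves undefined. All function names and the toy data are hypothetical; this is not the authors' formulation.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def rank_by_centroid_distance(points: Sequence[Point]) -> List[int]:
    """Point indices sorted from nearest to farthest from the centroid (Euclidean)."""
    c = centroid(points)
    # Ties are broken by index so the ordering is deterministic.
    return sorted(range(len(points)), key=lambda i: (math.dist(points[i], c), i))


def select_central(points: Sequence[Point], k: int) -> List[int]:
    """Hypothetical 'central area' reading: the k points nearest the centroid."""
    return rank_by_centroid_distance(points)[:k]


def select_outer(points: Sequence[Point], k: int) -> List[int]:
    """Hypothetical 'outer frontier' reading: the k points farthest from the centroid."""
    return rank_by_centroid_distance(points)[::-1][:k]


def topic_coverage(selected: Sequence[int], topics: Dict[int, Set[str]]) -> float:
    """Assumed metric: fraction of all topics carried by at least one selected document."""
    all_topics = set().union(*topics.values())
    if not all_topics:
        return 0.0
    covered = set().union(*(topics[i] for i in selected))
    return len(covered) / len(all_topics)


# Toy 2-D "documents" with made-up topic labels (not Reuters data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]
topics = {0: {"grain"}, 1: {"trade"}, 2: {"grain", "trade"},
          3: {"earn"}, 4: {"earn"}, 5: {"crude"}}

central = select_central(points, 2)
outer = select_outer(points, 2)
print("central:", central, "coverage:", topic_coverage(central, topics))
print("outer:", outer, "coverage:", topic_coverage(outer, topics))
print("combined coverage:", topic_coverage(central + outer, topics))
```

On this toy data the central selection covers one of the four topics and the outer selection covers two, while their union covers three. That is only a miniature of the abstract's claim that instances drawn from different regions can complement each other, not a reproduction of its results.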
Pages: 291 - 300
Number of pages: 10
Related papers (50 in total)
  • [1] Method for selecting representative videos for change detection datasets
    Silva, Claudinei M.
    Rosa, Katharina A. I.
    Bugatti, Pedro H.
    Saito, Priscila T. M.
    Corrêa, Cléber G.
    Yokoyama, Roberto S.
    Sanches, Silvio R. R.
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (03): 3773 - 3791
  • [2] Selecting Representative Samples From Complex Biological Datasets Using K-Medoids Clustering
    Li, Lei
    Lan, Linda Yu-Ling
    Huang, Lei
    Ye, Congting
    Andrade, Jorge
    Wilson, Patrick C. C.
    [J]. FRONTIERS IN GENETICS, 2022, 13
  • [3] Identifying Mislabeled Instances in Classification Datasets
    Mueller, Nicolas M.
    Markert, Karla
    [J]. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
  • [4] Selecting training instances for supervised classification
    Roiger, R
    Cornell, L
    [J]. PROCEEDINGS ISAI/IFIS 1996 - MEXICO - USA COLLABORATION IN INTELLIGENT SYSTEMS TECHNOLOGIES, 1996: 150 - 155
  • [5] CATEGORIZATION NORMS FOR 50 REPRESENTATIVE INSTANCES
    LOFTUS, EF
    SCHEFF, RW
    [J]. JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 1971, 91 (02): 355 ff.
  • [6] Detecting representative trajectories from global AIS datasets
    Zygouras, Nikolas
    Spiliopoulos, Giannis
    Zissis, Dimitris
    [J]. 2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021: 2278 - 2285
  • [7] UNIVERSAL AND REPRESENTATIVE INSTANCES USING UNMARKED NULLS
    JAJODIA, S
    [J]. LECTURE NOTES IN COMPUTER SCIENCE, 1984, 181: 367 - 378
  • [8] Micky: A Cheaper Alternative for Selecting Cloud Instances
    Hsu, Chin-Jung
    Nair, Vivek
    Menzies, Tim
    Freeh, Vincent
    [J]. PROCEEDINGS 2018 IEEE 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD), 2018: 409 - 416
  • [9] Efficient intrusion detection using representative instances
    Guo, Chun
    Zhou, Ya-Jian
    Ping, Yuan
    Luo, Shou-Shan
    Lai, Yu-Ping
    Zhang, Zhong-Kun
    [J]. COMPUTERS & SECURITY, 2013, 39: 255 - 267