Unsupervised Discovery of Co-occurrence in Sparse High Dimensional Data

Times cited: 23
Authors
Chum, Ondrej [1]
Matas, Jiri [1]
Affiliation
[1] Czech Tech Univ, Fac Elec Eng, Dept Cybernet, CMP, CR-16635 Prague, Czech Republic
DOI
10.1109/CVPR.2010.5539997
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An efficient min-Hash based algorithm for the discovery of dependencies in sparse high-dimensional data is presented. The dependencies are represented by sets of features that co-occur with high probability, called co-ocsets. Sparse high-dimensional descriptors, such as bag of words, have proven very effective in image retrieval. To maintain high efficiency even for very large data collections, features are assumed to be independent. We show experimentally that co-ocsets are not rare, i.e. the independence assumption is often violated, and that they may ruin retrieval performance if present in the query image. Two methods for managing co-ocsets in such cases are proposed. Both methods significantly outperform the state of the art in image retrieval, and one is also significantly faster.
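The core idea of using min-Hash to find co-occurring features can be illustrated with a small sketch. It assumes that each feature (e.g. a visual word) is represented by the set of images that contain it. Two features that co-occur with high probability then have a high Jaccard similarity between these sets. For a random permutation of the images, the probability that two sets share the same min-Hash equals their Jaccard similarity. The helper `find_cooc_sets` and its parameters `k`, `s`, `threshold` and `seed` are hypothetical. The banding of min-Hashes into sketches, the exact Jaccard verification and the union-find grouping are illustrative choices, not necessarily the authors' exact procedure.

```python
import random
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_cooc_sets(postings, num_docs, k=2, s=20, threshold=0.5, seed=0):
    """Hypothetical helper: group features whose document sets overlap strongly.

    postings:  dict feature id -> set of document (image) ids containing it,
               with document ids in range(num_docs)
    k:         min-hash values concatenated into one sketch
    s:         number of independent sketches
    threshold: minimum exact Jaccard similarity to link two features
    """
    rng = random.Random(seed)
    # One random permutation of the document ids per min-hash function.
    perms = []
    for _ in range(k * s):
        perm = list(range(num_docs))
        rng.shuffle(perm)
        perms.append(perm)

    # Hash every feature into s buckets, one per sketch of k min-hashes.
    buckets = defaultdict(list)
    for f, docs in postings.items():
        if not docs:
            continue
        mins = [min(perm[d] for d in docs) for perm in perms]
        for j in range(s):
            buckets[(j, tuple(mins[j * k:(j + 1) * k]))].append(f)

    # Union-find over features; only verified pairs are merged.
    parent = {f: f for f in postings}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    checked = set()
    for members in buckets.values():
        for a, b in combinations(members, 2):
            pair = frozenset((a, b))
            if pair in checked:
                continue
            checked.add(pair)
            # Candidate pairs from min-Hash collisions are confirmed exactly.
            if jaccard(postings[a], postings[b]) >= threshold:
                parent[find(a)] = find(b)

    # Report only groups with at least two features (candidate co-ocsets).
    groups = defaultdict(set)
    for f in postings:
        groups[find(f)].add(f)
    return [g for g in groups.values() if len(g) > 1]


# Toy inverted file: feature id -> ids of images containing that feature.
postings = {
    0: {0, 1, 2, 3},
    1: {0, 1, 2, 3},  # always appears together with feature 0
    2: {0, 1, 2},     # Jaccard 0.75 with features 0 and 1
    3: {5, 6, 7},     # never co-occurs with the others
}
print(find_cooc_sets(postings, num_docs=8))
```

A single sketch of `k` min-Hashes collides with probability J^k for two features with Jaccard similarity J. The chance of sharing at least one of `s` sketches is therefore 1 - (1 - J^k)^s. Raising `k` suppresses weakly overlapping pairs, while raising `s` makes it more likely that strongly overlapping pairs are found. Only the colliding candidate pairs are ever compared, which is what keeps the search cheap on sparse, high-dimensional data.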
Pages: 3416–3423
Number of pages: 8