Supervised Evaluation of Top-k Itemset Mining Algorithms

Cited by: 1
Authors
Lucchese, Claudio [1]
Orlando, Salvatore [2]
Perego, Raffaele [1]
Affiliations
[1] CNR, ISTI, I-56100 Pisa, Italy
[2] Univ Ca Foscari, DAIS, Venice, Italy
Keywords
APPROXIMATE;
DOI
10.1007/978-3-319-22729-0_7
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
A major mining task for binary matrices is the extraction of approximate top-k patterns that concisely describe the input data. The top-k pattern discovery problem is commonly stated as an optimization problem, where the goal is to minimize a given cost function, e.g., one based on the accuracy of the data description. In this work, we review several greedy state-of-the-art algorithms, namely Asso, Hyper+, and PaNDa(+), and propose a methodology to compare the patterns extracted. In evaluating the set of mined patterns, we aim to go beyond the usual assessment methodology, which only measures the cost function being minimized. We therefore evaluate how well the extracted models/patterns unveil supervised knowledge about the data. To this end, we test the algorithms with diverse cost functions on several datasets from the UCI repository. As a contribution, we show that PaNDa(+) performs best in the majority of cases, since the classifiers built over its mined patterns, used as dataset features, are the most accurate.
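The supervised evaluation idea in the abstract (mined patterns used as dataset features for a classifier) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `min_overlap` parameter, the toy data, and the choice of a leave-one-out 1-nearest-neighbour classifier are all assumptions made for illustration, and the abstract does not say which classifiers were used.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def pattern_features(transactions: Sequence[Itemset],
                     patterns: Sequence[Itemset],
                     min_overlap: float = 1.0) -> List[List[int]]:
    """Build one binary feature per pattern (hypothetical encoding).

    A feature is 1 when the transaction contains at least `min_overlap`
    (a fraction in [0, 1]) of the pattern's items; 1.0 means exact containment.
    """
    return [
        [1 if len(t & p) >= min_overlap * len(p) else 0 for p in patterns]
        for t in transactions
    ]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def loo_accuracy_1nn(features: Sequence[Sequence[int]],
                     labels: Sequence[str]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Stand-in classifier chosen only to keep the sketch dependency-free.
    """
    correct = 0
    for i, row in enumerate(features):
        others = (k for k in range(len(features)) if k != i)
        nearest = min(others, key=lambda k: hamming(row, features[k]))
        correct += labels[nearest] == labels[i]
    return correct / len(features)


# Toy data (invented): four transactions, two class labels, two "mined" patterns.
transactions = [frozenset("ab"), frozenset("abc"), frozenset("de"), frozenset("dec")]
labels = ["x", "x", "y", "y"]
patterns = [frozenset("ab"), frozenset("de")]

features = pattern_features(transactions, patterns)
accuracy = loo_accuracy_1nn(features, labels)
```

Under this setup, pattern sets produced by different algorithms would be compared by the accuracy each yields on the same labelled data, rather than by the cost function each algorithm minimizes.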
Pages: 82-94
Page count: 13