Heuristically mining the top-k high-utility itemsets with cross-entropy optimization

Cited by: 0
Authors
Wei Song
Chuanlong Zheng
Chaomin Huang
Lu Liu
Affiliation
[1] School of Information Science and Technology, North China University of Technology
Source
Applied Intelligence | 2022 / Vol. 52, No. 15
Keywords
Top-k high-utility itemset; Cross-entropy; Critical utility value; Sample refinement; Smoothing mutation; Bit edit distance
DOI
Not available
Abstract
Mining high-utility itemsets (HUIs) is one of the most important research topics in data mining because HUI mining takes into account the non-binary quantities of items in transactions and a different profit value for each item. However, setting an appropriate minimum utility threshold by trial and error is a tedious process for users. Thus, mining the top-k HUIs, which requires no utility threshold, is becoming an alternative to mining all HUIs. In this paper, we propose two algorithms, the top-k high-utility itemset mining based on cross-entropy method (TKU-CE) and TKU-CE+, for mining the top-k HUIs heuristically. TKU-CE is based on the cross-entropy method and implements top-k HUI mining as a combinatorial optimization problem. Its main idea is to generate the top-k HUIs by gradually updating the probabilities of itemsets with high utility values. TKU-CE+ improves TKU-CE in three respects. First, unpromising items are filtered out using a critical utility value, which reduces the computational burden in the initial stage. Second, a sample refinement strategy is applied in each iteration, which reduces the computational burden in the iterative stage. Finally, smoothing mutation is proposed to randomly generate some new itemsets in addition to those from previous iterations. This improves the diversity of samples, so that more of the actual top-k HUIs can be discovered with fewer iterations. Compared with state-of-the-art algorithms, TKU-CE and TKU-CE+ are easy to implement and avoid the computational costs incurred by additional data structures and threshold-raising strategies. Extensive experimental results show that both algorithms are efficient, memory-saving, and scalable, and that they discover the largest number of actual top-k HUIs.
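The abstract describes TKU-CE only at a high level. The Python sketch below shows the general shape of a cross-entropy search for top-k HUIs under stated assumptions: each candidate itemset is encoded as a bit vector over items, each item gets a Bernoulli inclusion probability (initialized to 0.5), the highest-utility fraction `rho` of each iteration's samples (the elite set) drives a smoothed probability update, and a small random bit flip stands in for smoothing mutation. The names `itemset_utility`, `prune_items`, and `tku_ce_sketch`, every parameter value, and the pruning rule (comparing each item's transaction-weighted utility, TWU, with the k-th largest single-item utility) are hypothetical illustrations, not the authors' definitions of critical utility value, sample refinement, or smoothing mutation. The utility cache only avoids rescanning the database for itemsets already evaluated.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

# A transaction maps item -> purchase quantity; profit maps item -> unit profit.
Transaction = Dict[str, int]


def itemset_utility(itemset: FrozenSet[str], db: List[Transaction],
                    profit: Dict[str, int]) -> int:
    """Sum q(i, T) * profit(i) over the items of `itemset`, for every T containing all of them."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if itemset.issubset(t)
    )


def prune_items(db: List[Transaction], profit: Dict[str, int], k: int) -> List[str]:
    """Hypothetical stand-in for critical-utility filtering (not the paper's exact rule).

    The k-th largest single-item utility is a lower bound on the utility of every
    top-k HUI, and no itemset containing item i can exceed TWU(i), so items whose
    TWU falls below that bound cannot occur in any top-k HUI.
    """
    items = sorted({i for t in db for i in t})
    singles = sorted((itemset_utility(frozenset([i]), db, profit) for i in items),
                     reverse=True)
    bound = singles[k - 1] if len(singles) >= k else 0
    tu = [sum(t[j] * profit[j] for j in t) for t in db]  # transaction utilities
    twu = {i: sum(u for t, u in zip(db, tu) if i in t) for i in items}
    return [i for i in items if twu[i] >= bound]


def tku_ce_sketch(db: List[Transaction], profit: Dict[str, int], k: int,
                  n_samples: int = 50, rho: float = 0.2, alpha: float = 0.7,
                  mutation_rate: float = 0.05, max_iter: int = 30,
                  seed: int = 0) -> List[Tuple[FrozenSet[str], int]]:
    """Assumed cross-entropy loop: sample itemsets, keep an elite, update probabilities."""
    rng = random.Random(seed)
    items = prune_items(db, profit, k)
    probs = [0.5] * len(items)  # Bernoulli inclusion probability per candidate item
    cache: Dict[FrozenSet[str], int] = {}  # utilities of itemsets evaluated so far
    for _ in range(max_iter):
        samples = []
        for _ in range(n_samples):
            bits = [rng.random() < p for p in probs]
            # Assumed mutation: flip each bit with a small probability for diversity.
            bits = [b != (rng.random() < mutation_rate) for b in bits]
            itemset = frozenset(i for i, b in zip(items, bits) if b)
            if not itemset:
                continue
            if itemset not in cache:  # scan the database only once per itemset
                cache[itemset] = itemset_utility(itemset, db, profit)
            samples.append((itemset, cache[itemset]))
        if not samples:
            continue
        samples.sort(key=lambda s: s[1], reverse=True)
        elite = samples[:max(1, int(rho * len(samples)))]
        for j, item in enumerate(items):
            freq = sum(item in s for s, _ in elite) / len(elite)
            # Smoothed cross-entropy update toward the elite item frequencies.
            probs[j] = alpha * freq + (1 - alpha) * probs[j]
    # Heuristic answer: the k best itemsets actually evaluated, highest utility first.
    return sorted(cache.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    db = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3, "d": 1},
          {"a": 1, "b": 1, "c": 1}]
    profit = {"a": 5, "b": 2, "c": 1, "d": 1}
    for itemset, utility in tku_ce_sketch(db, profit, k=3):
        print(sorted(itemset), utility)
```

Like any heuristic, this sketch returns the best itemsets it happened to evaluate, so it can miss actual top-k HUIs. The pruning step, by contrast, is safe: an itemset's utility never exceeds the TWU of any item it contains.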
Pages: 17026-17041
Number of pages: 16