Heuristically mining the top-k high-utility itemsets with cross-entropy optimization

Cited by: 0

Authors:

Wei Song

Chuanlong Zheng

Chaomin Huang

Lu Liu

Affiliation:

[1] North China University of Technology, School of Information Science and Technology

Source:

Applied Intelligence | 2022 / Volume 52

Keywords:

Top-k high-utility itemset; Cross-entropy; Critical utility value; Sample refinement; Smoothing mutation; Bit edit distance

DOI:

Not available

Abstract:

Mining high-utility itemsets (HUIs) is one of the most important research topics in data mining because HUIs account for non-binary frequency values of items in transactions and a different profit value for each item. However, setting an appropriate minimum utility threshold by trial and error is tedious for users. Thus, mining the top-k HUIs, which requires no utility threshold, is becoming an alternative to discovering all the HUIs. In this paper, we propose two algorithms, top-k high-utility itemset mining based on the cross-entropy method (TKU-CE) and TKU-CE+, for mining the top-k HUIs heuristically. TKU-CE is based on the cross-entropy method and treats top-k HUI mining as a combinatorial optimization problem. Its main idea is to generate the top-k HUIs by gradually updating the probabilities of itemsets with high utility values. TKU-CE+ optimizes TKU-CE in three respects. First, unpromising items are filtered by the critical utility value to reduce the computational burden in the initial stage. Second, a sample refinement strategy is applied in each iteration to reduce the computational burden in the iterative stage. Finally, smoothing mutation is proposed to randomly generate some new itemsets in addition to those carried over from previous iterations. Consequently, the diversity of samples is improved, so that more of the actual top-k HUIs can be discovered in fewer iterations. Compared with state-of-the-art algorithms, TKU-CE and TKU-CE+ are easy to implement and avoid the computational costs incurred by additional data structures and threshold-raising strategies. Extensive experimental results show that both algorithms are efficient, memory-saving, and scalable, and that they discover most of the actual top-k HUIs.
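
The core idea in the abstract, gradually updating probabilities toward itemsets with high utility, can be illustrated with a generic cross-entropy loop. The following is a minimal sketch under stated assumptions and not the authors' TKU-CE implementation: the toy database, the profit table, and the names `utility`, `top_k_ce`, `elite_frac`, and `alpha` are all hypothetical, and the TKU-CE+ optimizations (critical utility value filtering, sample refinement, smoothing mutation) are not modeled.

```python
import random

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}  # hypothetical unit profits
ITEMS = sorted(PROFIT)


def utility(itemset):
    """Sum quantity * profit of the itemset over transactions containing all its items."""
    if not itemset:
        return 0
    total = 0
    for t in TRANSACTIONS:
        if all(i in t for i in itemset):
            total += sum(t[i] * PROFIT[i] for i in itemset)
    return total


def top_k_ce(k=3, samples=50, elite_frac=0.2, iterations=30, alpha=0.7, seed=0):
    """Generic cross-entropy search for high-utility itemsets (illustrative only)."""
    rng = random.Random(seed)
    prob = {i: 0.5 for i in ITEMS}  # per-item inclusion probability
    best = {}                       # every non-empty itemset seen -> its utility
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iterations):
        # Sample candidate itemsets from the current probability vector.
        population = [
            frozenset(i for i in ITEMS if rng.random() < prob[i])
            for _ in range(samples)
        ]
        scored = sorted(population, key=utility, reverse=True)
        for s in scored:
            if s:
                best[s] = utility(s)
        # Move each probability toward the item's frequency in the elite samples.
        elite = scored[:n_elite]
        for i in ITEMS:
            freq = sum(i in s for s in elite) / n_elite
            prob[i] = alpha * freq + (1 - alpha) * prob[i]
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return ranked[:k]


if __name__ == "__main__":
    for itemset, u in top_k_ce():
        print(sorted(itemset), u)
```

Because the search is heuristic, the returned itemsets are the best ones sampled and are not guaranteed to be the exact top-k; the fixed `seed` only makes a run repeatable.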

Pages: 17026-17041

Page count: 15

[1] Heuristically mining the top-k high-utility itemsets with cross-entropy optimization
Song, Wei
Zheng, Chuanlong
Huang, Chaomin
Liu, Lu
[J]. APPLIED INTELLIGENCE, 2022, 52 (15) : 17026 - 17041