Inverted Index Automata Frequent Itemset Mining for Large Dataset Frequent Itemset Mining

Cited by: 0
Authors
Dai, Xin [1]
Hamed, Haza Nuzly Abdull [1]
Su, Qichen [1]
Hao, Xue [1]
Affiliations
[1] Univ Teknol Malaysia UTM, Fac Comp, Johor Baharu 81310, Johor, Malaysia
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Automata; Data mining; Itemsets; Memory management; Computational efficiency; Complexity theory; Real-time systems; Heuristic algorithms; Indexing; Distributed databases; Frequent itemset mining; inverted index; finite automata; depth-first search; large-scale data analysis; ASSOCIATION RULES; ALGORITHM; IMPLEMENTATION;
DOI
10.1109/ACCESS.2024.3521285
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Frequent itemset mining (FIM) faces significant challenges as datasets grow in scale. Traditional algorithms such as Apriori, FP-Growth, and Eclat scale poorly and run inefficiently on modern datasets characterized by high dimensionality and high-density features. These algorithms demand substantial memory and multiple database scans, which limits their practicality for rapid data processing. To address these challenges, this study proposes the Inverted Index Automata Frequent Itemset Mining (IA-FIM) algorithm. IA-FIM combines the fast retrieval of an inverted index with the pattern recognition of finite automata to process large datasets efficiently. Unlike conventional FIM algorithms, IA-FIM uses an inverted index automaton to reduce the search space and memory footprint, eliminating repeated database scans and multiple tree constructions. The proposed algorithm employs a single-pass scan strategy, constructing a dynamic and adjustable inverted index that gives a streamlined, compact representation of the data. IA-FIM demonstrates superior performance on large sparse datasets, increasing processing speed on large datasets and meeting the demands of the big data era. It also improves the efficiency and practicality of FIM by reducing repeated scans and heavy memory dependence, making it more feasible for processing large datasets.
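The abstract's general idea (build an inverted index in one scan, then search depth-first without rescanning the database) can be illustrated with a minimal sketch. This is not the authors' IA-FIM algorithm: the automaton component is not described in enough detail here to reproduce, so the sketch uses plain intersection of transaction-id sets instead. The function names `build_inverted_index` and `mine_frequent_itemsets` and the sample transactions are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """Single pass over the data: map each item to the ids of transactions containing it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} using only the inverted index after the first scan."""
    index = build_inverted_index(transactions)
    # Keep only items that are frequent on their own; sort for a deterministic search order.
    frequent_items = sorted(i for i, tids in index.items() if len(tids) >= min_support)
    results = {}

    def dfs(prefix, prefix_tids, start):
        # Extend the current prefix with each later item (depth-first search).
        for pos in range(start, len(frequent_items)):
            item = frequent_items[pos]
            # Support of the extended itemset = size of the intersected id sets.
            tids = index[item] if prefix_tids is None else prefix_tids & index[item]
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                results[itemset] = len(tids)
                dfs(itemset, tids, pos + 1)

    dfs((), None, 0)
    return results


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
itemsets = mine_frequent_itemsets(transactions, min_support=2)
```

In this sketch the original transactions are read exactly once; every later support count comes from set intersections on the index, which is the property the abstract attributes to the single-pass strategy.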
Pages: 195111 - 195130
Page count: 20
Related papers
50 in total
  • [41] Adaptive Apriori Algorithm for Frequent Itemset Mining
    Patill, Shubhangi D.
    Deshmukh, Ratnadeep R.
    Kirange, D. K.
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON SYSTEM MODELING & ADVANCEMENT IN RESEARCH TRENDS (SMART-2016), 2016, : 7 - 13
  • [42] Parallel Frequent Itemset Mining on Streaming Data
    He, Yanshan
    Yue, Min
    2014 10TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2014, : 725 - 730
  • [43] Algorithms for frequent itemset mining: a literature review
    Chee, Chin-Hoong
    Jaafar, Jafreezal
    Aziz, Izzatdin Abdul
    Hasan, Mohd Hilmi
    Yeoh, William
    ARTIFICIAL INTELLIGENCE REVIEW, 2019, 52 (04) : 2603 - 2621
  • [44] Probabilistic Frequent Itemset Mining on a GPU Cluster
    Kozawa, Yusuke
    Amagasa, Toshiyuki
    Kitagawa, Hiroyuki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (04): : 779 - 789
  • [45] Frequent Itemset Mining with Local Differential Privacy
    Li, Junhui
    Gan, Wensheng
    Gui, Yijie
    Wu, Yongdong
    Yu, Philip S.
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1146 - 1155
  • [46] The Discussions of Maximal Frequent Itemset Mining Optimization
    Li, Haifeng
    2016 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE AND INTERNET TECHNOLOGY (CII 2016), 2016, : 96 - 100
  • [47] An efficient algorithm for fuzzy frequent itemset mining
    Wu, Tsu-Yang
    Lin, Jerry Chun-Wei
    Yun, Unil
    Chen, Chun-Hao
    Srivastava, Gautam
    Lv, Xianbiao
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (05) : 5787 - 5797
  • [48] Frequent Itemset Mining as Set Intersection Problem
    Stanisic, Predrag
    Tomovic, Savo
    2013 2ND MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2013,
  • [49] Pushing Convertible Constraints in Frequent Itemset Mining
    Jian Pei
    Jiawei Han
    Laks V.S. Lakshmanan
    Data Mining and Knowledge Discovery, 2004, 8 : 227 - 252
  • [50] A Generalized Parallel Algorithm for Frequent Itemset Mining
    Craus, Mitica
    Archip, Alexandru
    PROCEEDINGS OF THE 12TH WSEAS INTERNATIONAL CONFERENCE ON COMPUTERS , PTS 1-3: NEW ASPECTS OF COMPUTERS, 2008, : 520 - +