Inverted Index Automata Frequent Itemset Mining for Large Dataset Frequent Itemset Mining

Authors
Dai, Xin [1 ]
Hamed, Haza Nuzly Abdull [1 ]
Su, Qichen [1 ]
Hao, Xue [1 ]
Affiliation
[1] Univ Teknol Malaysia UTM, Fac Comp, Johor Baharu 81310, Johor, Malaysia
Source
IEEE Access | 2024 / Vol. 12
Keywords
Automata; Data mining; Itemsets; Memory management; Computational efficiency; Complexity theory; Real-time systems; Heuristic algorithms; Indexing; Distributed databases; Frequent itemset mining; inverted index; finite automata; depth-first search; large-scale data analysis; association rules; algorithm; implementation
DOI
10.1109/ACCESS.2024.3521285
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Frequent itemset mining (FIM) faces significant challenges as datasets grow in scale. Traditional algorithms such as Apriori, FP-Growth, and Eclat scale poorly and run inefficiently on modern datasets characterized by high dimensionality and high density. These algorithms demand substantial memory and multiple database scans, which limits their practicality for rapid data processing. To address these challenges, this study proposes the Inverted Index Automata Frequent Itemset Mining (IA-FIM) algorithm. IA-FIM combines the fast retrieval of an inverted index with the pattern recognition capability of finite automata, enabling efficient processing of extensive datasets. Unlike conventional FIM algorithms, IA-FIM uses an inverted index automaton to reduce the search space and memory footprint, eliminating repeated database scans and multiple tree constructions. The proposed algorithm employs a single-pass scan strategy, constructing a dynamic, adjustable inverted index that gives a compact representation of the data. IA-FIM demonstrates superior performance on large sparse datasets, increasing the processing speed for large datasets. By reducing repeated scans and heavy memory dependence, it also improves the efficiency and practicality of FIM, making mining more feasible on large datasets.
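The abstract does not give implementation details, so the following is only a minimal sketch of the two ingredients it names: an inverted index built in a single pass over the data, and a depth-first search over that index. It is not the authors' IA-FIM algorithm and it omits the automaton component entirely; in effect it is plain vertical (tid-set intersection) mining. The function names `build_inverted_index` and `mine_frequent_itemsets`, the absolute `min_support` threshold, and the toy transactions are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """Scan the transactions once, mapping each item to the ids of the
    transactions that contain it (hypothetical helper, not from the paper)."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} for every itemset that occurs
    in at least `min_support` transactions (absolute count, an assumption)."""
    index = build_inverted_index(transactions)
    # Keep only frequent single items; sorting makes the search order fixed.
    frequent_items = sorted(
        item for item, tids in index.items() if len(tids) >= min_support
    )
    results = {}

    def dfs(prefix, prefix_tids, start):
        # Extend the current prefix with each later item, depth first.
        for pos in range(start, len(frequent_items)):
            item = frequent_items[pos]
            # Support of the extended itemset is the size of the
            # intersection of the posting sets; no database rescan is needed.
            tids = index[item] if prefix_tids is None else prefix_tids & index[item]
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                results[itemset] = len(tids)
                dfs(itemset, tids, pos + 1)

    dfs((), None, 0)
    return results


# Toy data, invented for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]
frequent = mine_frequent_itemsets(transactions, min_support=2)
```

In this sketch the raw transactions are read exactly once, when the index is built; every later support count comes from set intersections on the index, which is the general idea behind avoiding repeated database scans.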
Pages: 195111-195130
Page count: 20