Inverted Index Automata Frequent Itemset Mining for Large Dataset Frequent Itemset Mining

Citations: 0
Authors
Dai, Xin [1]
Hamed, Haza Nuzly Abdull [1]
Su, Qichen [1]
Hao, Xue [1]
Affiliations
[1] Univ Teknol Malaysia UTM, Fac Comp, Johor Baharu 81310, Johor, Malaysia
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Automata; Data mining; Itemsets; Memory management; Computational efficiency; Complexity theory; Real-time systems; Heuristic algorithms; Indexing; Distributed databases; Frequent itemset mining; inverted index; finite automata; depth-first search; large-scale data analysis; ASSOCIATION RULES; ALGORITHM; IMPLEMENTATION;
DOI
10.1109/ACCESS.2024.3521285
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Frequent itemset mining (FIM) faces significant challenges with the expansion of large-scale datasets. Traditional algorithms such as Apriori, FP-Growth, and Eclat suffer from poor scalability and low efficiency when applied to modern datasets characterized by high dimensionality and high-density features. These algorithms demand substantial memory resources and multiple database scans, which diminishes their practicality for rapid data processing. To address these challenges, this study proposes the Inverted Index Automata Frequent Itemset Mining (IA-FIM) algorithm. IA-FIM integrates the swift retrieval of an inverted index with the robust pattern recognition of finite automata, enabling efficient processing of extensive datasets. Distinct from conventional FIM algorithms, IA-FIM uses an inverted index automaton to reduce the search space and memory footprint, eliminating repetitive database scans and multiple tree constructions. The proposed algorithm employs a single-pass scan strategy, constructing a dynamic and adjustable inverted index for a streamlined and compact representation of the data. IA-FIM demonstrates superior performance on large sparse datasets, improving processing speed and meeting the demands of the big data era. At the same time, it improves the efficiency and practicality of FIM by reducing repeated scans and large memory dependencies, making FIM more feasible for large datasets.
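The abstract's core ideas (a single-pass inverted index plus depth-first search) can be illustrated with a minimal Python sketch. This is not the authors' IA-FIM implementation: the finite-automaton component is not reproduced, and the function names `build_inverted_index` and `mine_frequent_itemsets`, the `min_support` parameter (an absolute count), and the toy data are all assumptions made for illustration. What it shows is closer to a plain Eclat-style search over an inverted index.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """One pass over the data: map each item to the ids of transactions containing it."""
    index = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            index[item].add(tid)
    return index


def mine_frequent_itemsets(transactions, min_support):
    """Depth-first search over itemsets; support is the size of the intersected id sets.

    Hypothetical helper for illustration only, not the published IA-FIM algorithm.
    """
    index = build_inverted_index(transactions)
    # Keep only frequent single items, in a fixed order so each itemset is visited once.
    frequent_items = sorted(
        item for item, tids in index.items() if len(tids) >= min_support
    )
    results = {}

    def extend(prefix, prefix_tids, candidates):
        for pos, item in enumerate(candidates):
            # The empty prefix matches every transaction, so start from the item's own ids.
            tids = prefix_tids & index[item] if prefix else index[item]
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                results[itemset] = len(tids)
                # Only later items are tried, so no itemset is generated twice.
                extend(itemset, tids, candidates[pos + 1:])

    extend((), set(), frequent_items)
    return results


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
    for itemset, support in sorted(mine_frequent_itemsets(toy, 2).items()):
        print(itemset, support)
```

The database is read once to build the index; every later support count comes from set intersections rather than from rescanning the transactions, which is the property the abstract attributes to the inverted-index approach.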
Pages: 195111-195130
Page count: 20
Related papers
50 in total
  • [31] A Review of Scalable Approaches for Frequent Itemset Mining
    Apiletti, Daniele
    Garza, Paolo
    Pulvirenti, Fabio
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS (ADBIS 2015), 2015, 539 : 243 - 247
  • [32] Parallel Analytical Model for Frequent Itemset Mining
    Poorva, K.
    Anushree, H. K.
    Mahesha, K., V
    Pavithra, T. R.
    Vinutha, D. C.
    Chandini, S. B.
    2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 517 - 519
  • [33] Frequent Itemset Mining Techniques - A Technical Review
    Chaure, Tushar M.
    Singh, Kavita R.
    2016 WORLD CONFERENCE ON FUTURISTIC TRENDS IN RESEARCH AND INNOVATION FOR SOCIAL WELFARE (STARTUP CONCLAVE), 2016,
  • [34] An Audit Environment for Outsourcing of Frequent Itemset Mining
    Wong, W. K.
    Cheung, David W.
    Hung, Edward
    Kao, Ben
    Mamoulis, Nikes
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (01): : 1162 - 1172
  • [35] Private Frequent Itemset Mining in the Local Setting
    Fu, Hang
    Yang, Wei
    Huang, Liusheng
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT II, 2021, 12938 : 338 - 350
  • [36] PrivBasis: Frequent Itemset Mining with Differential Privacy
    Li, Ninghui
    Qardaji, Wahbeh
    Su, Dong
    Cao, Jianneng
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (11): : 1340 - 1351
  • [37] Pushing convertible constraints in frequent itemset mining
    Pei, J
    Han, JW
    Lakshmanan, LVS
    DATA MINING AND KNOWLEDGE DISCOVERY, 2004, 8 (03) : 227 - 252
  • [38] Revised ECLAT Algorithm for Frequent Itemset Mining
    Suvalka, Bharati
    Khandelwal, Sarika
    Patel, Chintal
    INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS, VOL 2, INDIA 2016, 2016, 434 : 219 - 226
  • [39] Frequent Itemset Mining on Correlated Probabilistic Databases
    Kalaz, Yasemin Asan
    Raman, Rajeev
    DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA 2018), PT II, 2018, 11030 : 84 - 98
  • [40] A Highly Parallel Algorithm for Frequent Itemset Mining
    Mesa, Alejandro
    Feregrino-Uribe, Claudia
    Cumplido, Rene
    Hernandez-Palancar, Jose
    ADVANCES IN PATTERN RECOGNITION, 2010, 6256 : 291 - +