Weighted frequent itemset mining over uncertain databases

Cited by: 46
Authors
Lin, Jerry Chun-Wei [1]
Gan, Wensheng [1]
Fournier-Viger, Philippe [2]
Hong, Tzung-Pei [3, 4]
Tseng, Vincent S. [5]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Univ Moncton, Dept Comp Sci, Moncton, NB E1A 3E9, Canada
[3] Natl Univ Kaohsiung, Dept Comp Sci & Informat Engn, Kaohsiung, Taiwan
[4] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung 80424, Taiwan
[5] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
Keywords
Data mining; Uncertain databases; Weighted frequent itemsets; Two-phase; Upper-bound; Sequential patterns; Algorithm
DOI
10.1007/s10489-015-0703-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Frequent itemset mining (FIM) is a fundamental research topic that consists of discovering useful and meaningful relationships between items in transaction databases. However, FIM suffers from two important limitations. First, it assumes that all items have the same importance. Second, it ignores the fact that data collected in a real-life environment is often inaccurate, imprecise, or incomplete. To address these issues and mine more useful and meaningful knowledge, the problems of weighted itemset mining and uncertain itemset mining have been proposed separately: in the former, a user assigns weights to items to specify their relative importance; in the latter, existential probabilities represent uncertainty in transactions. However, no work has addressed both issues at the same time. In this paper, we address this research problem by designing a new type of pattern, the high expected weighted itemset (HEWI), and the HEWI-Uapriori algorithm to discover HEWIs efficiently. HEWI-Uapriori finds HEWIs using an Apriori-like two-phase approach. The algorithm introduces a property named high upper-bound expected weighted downward closure (HUBEWDC) to prune unpromising itemsets from the search space early. Substantial experiments on real-life and synthetic datasets are conducted to evaluate the performance of the proposed algorithm in terms of runtime, memory consumption, and number of patterns found. Results show that the proposed algorithm has excellent performance and scalability compared with traditional methods for weighted itemset mining and uncertain itemset mining.
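The abstract names the two-phase idea without giving formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that the expected weighted support of an itemset is its average item weight times its expected support (the sum, over transactions containing it, of the product of its items' existential probabilities), and that the upper bound for a transaction is its largest item weight times its largest probability. All function names, the toy database, and the weights are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> existential probability.
DB = [
    {"A": 0.9, "B": 0.6},
    {"A": 0.5, "C": 0.8},
    {"A": 1.0, "B": 0.4, "C": 0.7},
]
WEIGHTS = {"A": 0.2, "B": 0.9, "C": 0.5}  # hypothetical item weights in (0, 1]


def expected_support(itemset, db):
    """Sum over transactions containing the itemset of the product of item probabilities."""
    total = 0.0
    for t in db:
        if all(i in t for i in itemset):
            p = 1.0
            for i in itemset:
                p *= t[i]
            total += p
    return total


def expected_weighted_support(itemset, db, weights):
    """Assumed definition: average item weight times expected support."""
    avg_weight = sum(weights[i] for i in itemset) / len(itemset)
    return avg_weight * expected_support(itemset, db)


def upper_bound(itemset, db, weights):
    """Assumed bound: per containing transaction, max item weight times max probability.

    It never underestimates the expected weighted support and cannot grow when
    items are added, so it can be used for Apriori-style pruning.
    """
    return sum(
        max(weights[i] for i in t) * max(t.values())
        for t in db
        if all(i in t for i in itemset)
    )


def mine_hewis(db, weights, min_ratio):
    """Two-phase sketch: prune level-wise by upper bound, then verify exactly."""
    threshold = min_ratio * len(db)
    items = sorted({i for t in db for i in t})
    # Phase 1: keep candidates whose upper bound reaches the threshold.
    level = {frozenset([i]) for i in items
             if upper_bound([i], db, weights) >= threshold}
    candidates = []
    while level:
        candidates.extend(level)
        next_level = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != len(a) + 1:
                continue
            # Every subset one item smaller must have survived the previous level.
            if all(union - {x} in level for x in union) and \
                    upper_bound(union, db, weights) >= threshold:
                next_level.add(union)
        level = next_level
    # Phase 2: compute the exact expected weighted support of each candidate.
    result = {}
    for c in candidates:
        ews = expected_weighted_support(c, db, weights)
        if ews >= threshold:
            result[tuple(sorted(c))] = ews
    return result


hewis = mine_hewis(DB, WEIGHTS, min_ratio=0.17)
print(sorted(hewis))  # [('A', 'B'), ('B',), ('C',)]
```

In this toy run, {A, B} qualifies although {A} alone does not, because the heavy item B raises the average weight. This illustrates why the expected weighted support itself cannot be used for downward-closure pruning and why a separate upper bound is needed in the first phase.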
Pages: 232-250
Page count: 19
Related papers
  • [1] Weighted frequent itemset mining over uncertain databases
    Lin, Jerry Chun-Wei
    Gan, Wensheng
    Fournier-Viger, Philippe
    Hong, Tzung-Pei
    Tseng, Vincent S.
    Applied Intelligence, 2016, 44 : 232 - 250
  • [2] Efficient weighted probabilistic frequent itemset mining in uncertain databases
    Li, Zhiyang
    Chen, Fengjuan
    Wu, Junfeng
    Liu, Zhaobin
    Liu, Weijiang
    Expert Systems, 2021, 38 (05)
  • [3] Probabilistic Frequent Itemset Mining in Uncertain Databases
    Bernecker, Thomas
    Kriegel, Hans-Peter
    Renz, Matthias
    Verhein, Florian
    Zuefle, Andreas
    KDD-09: 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2009 : 119 - 127
  • [4] Probabilistic Frequent Itemset Mining Algorithm over Uncertain Databases with Sampling
    Li, Hai-Feng
    Zhang, Ning
    Zhang, Yue-Jin
    Wang, Yue
    Fuzzy Systems and Data Mining II, 2016, 293 : 159 - 166
  • [5] Probabilistic maximal frequent itemset mining methods over uncertain databases
    Li, Haifeng
    Hai, Mo
    Zhang, Ning
    Zhu, Jianming
    Wang, Yue
    Cao, Huaihu
    Intelligent Data Analysis, 2019, 23 (06) : 1219 - 1241
  • [6] Frequent Itemset Mining for a Combination of Certain and Uncertain Databases
    Wazir, Samar
    Ahmad, Tanvir
    Beg, M. M. Sufyan
    Recent Developments and the New Direction in Soft-Computing Foundations and Applications, 2018, 361 : 25 - 39
  • [7] Probabilistic Frequent Pattern Growth for Itemset Mining in Uncertain Databases
    Bernecker, Thomas
    Kriegel, Hans-Peter
    Renz, Matthias
    Verhein, Florian
    Zuefle, Andreas
    Scientific and Statistical Database Management, SSDBM 2012, 2012, 7338 : 38 - 55
  • [8] Mining weighted frequent sequences in uncertain databases
    Rahman, Md Mahmudur
    Ahmed, Chowdhury Farhan
    Leung, Carson Kai-Sang
    Information Sciences, 2019, 479 : 76 - 100
  • [9] HEWIN: High expected weighted itemset mining in uncertain databases
    Lin, Jerry Chun-Wei
    Gan, Wensheng
    Hong, Tzung-Pei
    Tseng, Vincent S.
    Proceedings of 2015 International Conference on Machine Learning and Cybernetics (ICMLC), Vol. 1, 2015 : 439 - 444
  • [10] Fuzzy Maximal Frequent Itemset Mining Over Quantitative Databases
    Li, Haifeng
    Wang, Yue
    Zhang, Ning
    Zhang, Yuejin
    Intelligent Information and Database Systems, ACIIDS 2017, Pt I, 2017, 10191 : 476 - 486