Weighted frequent itemset mining over uncertain databases

Cited by: 46
Authors
Lin, Jerry Chun-Wei [1]
Gan, Wensheng [1]
Fournier-Viger, Philippe [2]
Hong, Tzung-Pei [3, 4]
Tseng, Vincent S. [5]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Univ Moncton, Dept Comp Sci, Moncton, NB E1A 3E9, Canada
[3] Natl Univ Kaohsiung, Dept Comp Sci & Informat Engn, Kaohsiung, Taiwan
[4] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung 80424, Taiwan
[5] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
Keywords
Data mining; Uncertain databases; Weighted frequent itemsets; Two-phase; Upper-bound; Sequential patterns; Algorithm
DOI
10.1007/s10489-015-0703-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Frequent itemset mining (FIM) is a fundamental research topic that consists of discovering useful and meaningful relationships between items in transaction databases. However, FIM suffers from two important limitations. First, it assumes that all items have the same importance. Second, it ignores the fact that data collected in a real-life environment is often inaccurate, imprecise, or incomplete. To address these issues and mine more useful and meaningful knowledge, the problems of weighted itemset mining and uncertain itemset mining have each been proposed: in the former, a user assigns weights to items to specify their relative importance; in the latter, existential probabilities represent the uncertainty of items in transactions. However, no work has addressed both of these issues at the same time. In this paper, we address this important research problem by designing a new type of pattern named the high expected weighted itemset (HEWI) and the HEWI-Uapriori algorithm to efficiently discover HEWIs. HEWI-Uapriori finds HEWIs using an Apriori-like two-phase approach. The algorithm introduces a property named high upper-bound expected weighted downward closure (HUBEWDC) to prune unpromising itemsets from the search space early. Substantial experiments on real-life and synthetic datasets are conducted to evaluate the performance of the proposed algorithm in terms of runtime, memory consumption, and number of patterns found. Results show that the proposed algorithm has excellent performance and scalability compared with traditional methods for weighted itemset mining and uncertain itemset mining.
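The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formal definitions, so everything below is an assumption made for illustration: an itemset's weight is taken as the mean of its item weights, its expected support as the sum over transactions of the product of its items' existential probabilities, and the upper bound as the sum, over transactions containing the itemset, of (largest item weight in the transaction) times (largest probability in the transaction). The function names and the `min_ratio` parameter are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import prod


def itemset_weight(itemset, weights):
    # Assumed definition: mean of the item weights.
    return sum(weights[i] for i in itemset) / len(itemset)


def expected_support(itemset, db):
    # Assumed definition: sum over transactions of the product of the
    # existential probabilities of the itemset's items.
    return sum(prod(t[i] for i in itemset)
               for t in db if all(i in t for i in itemset))


def expected_weighted_support(itemset, db, weights):
    return itemset_weight(itemset, weights) * expected_support(itemset, db)


def upper_bound(itemset, db, weights):
    # Assumed over-estimate: for each transaction containing the itemset,
    # (max item weight in the transaction) * (max probability in it).
    # It never underestimates the expected weighted support and cannot
    # grow when the itemset grows, so it supports Apriori-style pruning.
    return sum(max(weights[i] for i in t) * max(t.values())
               for t in db if all(i in t for i in itemset))


def mine_hewi_sketch(db, weights, min_ratio):
    """db: list of {item: probability}; weights: {item: weight}."""
    threshold = min_ratio * len(db)

    # Phase 1: level-wise candidate generation pruned by the upper bound.
    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items
             if upper_bound([i], db, weights) >= threshold]
    candidates = list(level)
    k = 1
    while level:
        k += 1
        previous = set(level)
        joined = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in previous
                        for s in combinations(c, k - 1))
                 and upper_bound(c, db, weights) >= threshold]
        candidates.extend(level)

    # Phase 2: compute the exact value for each surviving candidate.
    result = {}
    for c in candidates:
        value = expected_weighted_support(c, db, weights)
        if value >= threshold:
            result[c] = value
    return result
```

Under these assumptions the exact expected weighted support is not itself anti-monotone (adding a heavy item can raise the mean weight), which is why the sketch prunes on the over-estimate in phase 1 and only checks the exact value in phase 2.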
Pages: 232-250
Page count: 19
Related papers
50 in total
  • [31] Mining frequent subgraphs over uncertain graph databases under probabilistic semantics
    Jianzhong Li
    Zhaonian Zou
    Hong Gao
    The VLDB Journal, 2012, 21 : 753 - 777
  • [32] Frequent Weighted Itemset Mining from Gene Expression Data
    Baralis, Elena
    Cagliero, Luca
    Cerquitelli, Tania
    Chiusano, Silvia
    Garza, Paolo
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2013,
  • [33] Weighted Frequent Itemset Mining Using Weighted Subtrees: WST-WFIM
    Nalousi, Saeed
    Farhang, Yousef
    Sangar, Amin Babazadeh
    IEEE CANADIAN JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2021, 44 (02): : 206 - 215
  • [34] Frequent Itemset Mining from Databases Including One Evidential Attribute
    Tobji, Mohamed Anis Bach
    Ben Yaghlane, Boutheina
    Mellouli, Khaled
    SCALABLE UNCERTAINTY MANAGEMENT, SUM 2008, 2008, 5291 : 19 - +
  • [35] Privacy-Preserving Frequent Itemset Mining in Outsourced Transaction Databases
    Chandrasekharan, Iyer
    Baruah, P. K.
    Mukkamala, Ravi
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2015, : 787 - 793
  • [36] New parallel algorithms for frequent itemset mining in very large databases
    Veloso, A
    Meira, W
    Parthasarathy, S
    15TH SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING, PROCEEDINGS, 2003, : 158 - 166
  • [37] Single Scan Polynomial Algorithms for Frequent Itemset Mining in Big Databases
    Djenouri, Youcef
    Djenouri, Djamel
    Lin, Jerry Chun-Wei
    Belhadi, Asma
    2019 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2019, : 1453 - 1460
  • [38] A Model of Mining Noise-tolerant Frequent Itemset in Transactional Databases
    Yu, Xiaomei
    Wang, Hong
    Zheng, Xiangwei
    Liu, Shuangshuang
    2015 INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS IEEE INCOS 2015, 2015, : 21 - 24
  • [39] Mining Weighted Frequent Patterns in Incremental Databases
    Ahmed, Chowdhury Farhan
    Tanbeer, Syed Khairuzzaman
    Jeong, Byeong-Soo
    Lee, Young-Koo
    PRICAI 2008: TRENDS IN ARTIFICIAL INTELLIGENCE, 2008, 5351 : 933 - 938
  • [40] Efficient Probabilistic Frequent Itemset Mining in Big Sparse Uncertain Data
    Xu, Jing
    Li, Ning
    Mao, Xiao-Jiao
    Yang, Yu-Bin
    PRICAI 2014: TRENDS IN ARTIFICIAL INTELLIGENCE, 2014, 8862 : 235 - 247