Sequential All Frequent Itemsets Detection: A Method to Detect All Frequent Sequential Itemsets Using LERP-Reduced Suffix Array Data Structure and ARPaD Algorithm
Cited by: 4
Authors:
Xylogiannopoulos, Konstantinos F.
Affiliation:
Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
Karampelas, Panagiotis
Affiliation:
Hellenic Air Force Acad, Dept Informat & Comp, Dekelia Air Base, Greece
Alhajj, Reda
Affiliation:
Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
Sequential frequent itemsets detection is one of the core problems in data mining. In this paper we propose a new methodology based on our previous work on detecting all repeated patterns in a string. By analyzing large datasets from the FIMI website with up to one million transactions, we were able to detect not only the most frequent sequential itemsets but every sequential itemset that occurs at least twice in the transaction database. For this purpose we use a novel data structure, the LERP Reduced Suffix Array (LERP-RSA), and the innovative ARPaD algorithm, which detects all repeated patterns in a string. The methodology relies on a pre-statistical analysis of the transactions, which allows smaller LERP-RSA data structures to be constructed efficiently for each transaction. Integrating and classifying all LERP-RSAs allows the ARPaD algorithm to be executed in parallel and to detect, very efficiently, every sequential itemset that occurs at least twice.
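The general idea of finding every sequential itemset that occurs at least twice can be illustrated with a small sketch. It is not the LERP-RSA data structure or the ARPaD algorithm, whose details the abstract does not give. It is a plain sorted-suffix approach and rests on two assumptions: that a "sequential itemset" is a contiguous run of items within a transaction, and that occurrences are counted as the number of transactions containing the run. The function name `repeated_sequential_itemsets` and the sample transactions are hypothetical.

```python
from itertools import groupby


def repeated_sequential_itemsets(transactions, min_count=2):
    """Return {itemset: count} for contiguous item runs that appear in
    at least `min_count` transactions.

    Illustrative only: a naive sorted-suffix scan, NOT LERP-RSA/ARPaD.
    """
    # Collect every suffix of every transaction, tagged with its transaction id.
    # Sorting puts suffixes that share a common prefix next to each other,
    # which is the basic property a suffix array relies on.
    suffixes = sorted(
        (tuple(items[start:]), tid)
        for tid, items in enumerate(transactions)
        for start in range(len(items))
    )

    result = {}
    max_len = max((len(t) for t in transactions), default=0)
    for k in range(1, max_len + 1):
        # Keep only suffixes long enough to have a prefix of length k.
        candidates = [(s[:k], tid) for s, tid in suffixes if len(s) >= k]
        # Equal length-k prefixes are adjacent in sorted order, so groupby
        # sees each distinct prefix exactly once.
        for prefix, group in groupby(candidates, key=lambda pair: pair[0]):
            tids = {tid for _, tid in group}
            if len(tids) >= min_count:
                result[prefix] = len(tids)
    return result


# Hypothetical toy database of three transactions with integer item ids.
transactions = [
    [1, 2, 3, 5],
    [2, 3, 5],
    [1, 2, 4],
]
found = repeated_sequential_itemsets(transactions)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

This sketch materializes all suffixes in memory and rescans them for each length, so it is only suitable for small inputs. The per-transaction construction and parallel execution described in the abstract address exactly that scaling problem.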