Department of Computer Science Faculty Scholarship and Creative Works

Iterative Sampling Based Frequent Itemset Mining for Big Data

Xian Wu, Shanghai Jiao Tong University
Wei Fan, Tencent
Jing Peng, Montclair State University
Kun Zhang, Xavier University of Louisiana
Yong Yu, Shanghai Jiao Tong University

Document Type

Article

Publication Date

12-1-2015

Abstract

Frequent pattern mining has attracted extensive research interest over the past two decades, including mining frequent itemsets from transactions, extracting frequent sequences from bio-arrays, and detecting common subgraphs in molecular structures. In the era of big data, the explosive growth in data volume brings new challenges to frequent pattern mining: (1) Space complexity: the input data, intermediate results, and output patterns can each be too large to fit into memory, which prevents many algorithms from executing; (2) Time complexity: many existing approaches rely on exhaustive search or complicated data structures to mine frequent patterns, which prove to be inapplicable to big data. To deal with these two challenges, we propose ISbFIM, an Iterative Sampling based Frequent Itemset Mining method. Rather than processing the entire data set at once, ISbFIM samples computationally manageable subsets and extracts frequent itemsets from these subsets. By repeating this process a sufficient number of times, we can guarantee both theoretically and empirically that the frequent itemsets can be enumerated without running into a combinatorial explosion. ISbFIM can be easily parallelized and applied to mine itemsets, sequences, or structures. We implement a Map-Reduce version of ISbFIM to demonstrate its scalability on big data.
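
The sample-then-mine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`mine_sample`, `iterative_sampling_fim`), the parameters `sample_size`, `iterations`, and `max_size`, the level-wise miner run on each sample, the reuse of the same support threshold inside each sample, and the final counting pass over the full data are all assumptions made for this example.

```python
import random
from collections import Counter
from itertools import combinations


def mine_sample(transactions, min_support, max_size):
    """Return itemsets whose relative support in `transactions` is >= min_support.

    Hypothetical helper: a plain level-wise search, chosen only for brevity.
    """
    n = len(transactions)
    frequent = set()
    # Level 1 candidates: every single item seen in the sample.
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= max_size:
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate is contained in the transaction
                    counts[c] += 1
        level = {c for c in candidates if counts[c] / n >= min_support}
        frequent |= level
        size += 1
        # Join frequent itemsets of this level into candidates one item larger.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == size}
    return frequent


def iterative_sampling_fim(transactions, min_support, sample_size, iterations,
                           max_size=3, seed=0):
    """Mine several random samples, pool the itemsets found, then recount them.

    Assumption: the last loop is an exact counting pass over the full data,
    so every itemset returned truly meets `min_support`.
    """
    rng = random.Random(seed)
    transactions = [frozenset(t) for t in transactions]
    sample_size = min(sample_size, len(transactions))

    pooled = set()
    for _ in range(iterations):
        # Each iteration touches only a manageable subset of the data.
        sample = rng.sample(transactions, sample_size)
        pooled |= mine_sample(sample, min_support, max_size)

    n = len(transactions)
    result = {}
    for itemset in pooled:
        support = sum(1 for t in transactions if itemset <= t) / n
        if support >= min_support:
            result[itemset] = support
    return result
```

Because each iteration depends only on its own sample, the loop body could be run as independent tasks and the pooled sets merged afterwards, which is in the spirit of the Map-Reduce version the abstract mentions. In this sketch, an itemset that no sample happens to surface is missed, so more iterations or larger samples trade run time for recall.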

DOI

10.1007/s13042-015-0345-6

Montclair State University Digital Commons Citation

Wu, Xian; Fan, Wei; Peng, Jing; Zhang, Kun; and Yu, Yong, "Iterative Sampling Based Frequent Itemset Mining for Big Data" (2015). Department of Computer Science Faculty Scholarship and Creative Works. 357.
https://digitalcommons.montclair.edu/compusci-facpubs/357

This document is currently not available here.
