Abstract

In data mining, association rule mining is one of the important descriptive techniques; it can be defined as the discovery of meaningful patterns from large collections of data. Mining frequent itemsets is a fundamental part of association rule mining. Many algorithms have been proposed over the last decades, including horizontal-layout-based, vertical-layout-based, and projected-layout-based techniques. However, most of these techniques suffer from repeated database scans, candidate generation (as in Apriori-style algorithms), high memory consumption, and other problems when mining frequent patterns. In the retail industry, many transactional databases contain the same set of transactions many times; exploiting this observation, this thesis presents an improved Apriori algorithm that delivers better performance than the classical Apriori algorithm.


Index terms: Hadoop, Map-Reduce, Apriori, Support, Confidence.

 

1. INTRODUCTION

Data mining is the core part of KDD (knowledge discovery in databases). Data mining normally involves four classes of task: classification, clustering, regression, and association rule learning. Data mining refers to discovering knowledge in enormous amounts of data. It is a precise discipline concerned with analyzing observational data sets with the objective of finding unsuspected relationships and producing a summary of the data in novel ways that the owner can understand and use.

Data mining as a field of study involves the integration of ideas from many domains rather than being a pure discipline. The four main disciplines [1] contributing to data mining include:

•    Statistics: it makes available tools for measuring the significance of the given data, estimating probabilities, and many other tasks (e.g., linear regression).

•    Machine learning: it provides algorithms for inducing knowledge from given data (e.g., SVM).

•    Data management and databases: since data mining deals with huge volumes of data, an efficient way of accessing and maintaining data is needed.

•    Artificial intelligence: it contributes to tasks involving knowledge encoding or search techniques (e.g., neural networks).

Figure 1: Architecture of a data mining system

It is fundamentally important to state that the key to understanding the data mining technology is the ability to distinguish between data mining operations, applications, and techniques [2], as shown in Figure 2.

Figure 2: Block diagram of a data mining system

2.
LITERATURE REVIEW

One of the most well-known and popular data mining techniques is association rule, or frequent itemset, mining. It was first introduced in [2][4] for market basket analysis. Because of its wide applicability, many revised algorithms have been introduced since then, and association rule mining is still a widely researched area. Many variations of the frequent-pattern-mining algorithm Apriori are discussed in this article.

The AIS algorithm [4] generates candidate itemsets on the fly during each pass of the database scan: large itemsets from the preceding pass are checked for presence in the current transaction, and new itemsets are created by extending existing ones. This algorithm turns out to be ineffective because it generates too many candidate itemsets, requires more space, needs too many passes over the whole database, and generates only rules with a single consequent item.

2.1 Association Rule Mining

 

The techniques for discovering association rules from data have conventionally focused on identifying relationships between items that describe some aspect of human behavior, usually trade behavior, for example determining items that customers buy together. All rules of this type describe a particular local pattern. The group of association rules can be easily interpreted and communicated.

 

The association rule X ⇒ Y has support s in D if the probability that a transaction in D contains both X and Y is s. Likewise, the rule has confidence c if the probability that a transaction containing X also contains Y is c.

 

The task of mining association rules is to find all the association rules whose support is larger than a minimum support threshold and whose confidence is larger than a minimum confidence threshold [1]. These rules are called strong association rules.
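As a concrete illustration of the two thresholds, the sketch below computes support and confidence over a tiny hand-made basket set (the toy data and helper names are illustrative assumptions, not part of the thesis):

```python
# Toy basket data (hypothetical): each transaction is a set of items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(x, y, db):
    """Confidence of the rule X => Y: support(X union Y) / support(X)."""
    return support(x | y, db) / support(x, db)

# {bread, butter} occurs in 2 of 4 transactions -> support 0.5
print(support({"bread", "butter"}, transactions))
# Of the 3 baskets with bread, 2 also contain butter -> confidence 2/3
print(confidence({"bread"}, {"butter"}, transactions))
```

A rule such as {bread} ⇒ {butter} would then be reported as strong only if both values clear their respective thresholds.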

3. Apriori Algorithm:

Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.

Figure 3: Flowchart of the existing system

First, the set of frequent 1-itemsets is found; this set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one full scan of the database. To find all the frequent itemsets, the algorithm adopts a recursive method. The main idea is as follows [6]:

Apriori Algorithm (Itemset)
{
    L1 = {large 1-itemsets};
    for (k = 2; Lk-1 ≠ ∅; k++) do
    {
        Ck = Apriori-gen(Lk-1);
        for each transaction t ∈ D do
        {
            Ct = subset(Ck, t);      // get the subsets of t that are candidates
            for each candidate c ∈ Ct do
                c.count++;
        }
        Lk = {c ∈ Ck | c.count ≥ minsup};
    }
    return ∪k Lk;
}
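The level-wise loop above can be sketched as a small, self-contained Python program (a simplified in-memory version with illustrative toy data; the Apriori-gen step is folded into a join-and-prune candidate generation):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return all frequent itemsets (as frozensets) with count >= minsup."""
    # L1: count single items in one pass over the database.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, c in counts.items() if c >= minsup}
    frequent = set(Lk)
    k = 2
    while Lk:
        # Candidate generation: join frequent (k-1)-itemsets, then prune
        # any candidate with an infrequent (k-1)-subset.
        items = set().union(*Lk)
        Ck = {frozenset(c) for c in combinations(sorted(items), k)
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # One full scan of the database to count the surviving candidates.
        counts = {c: 0 for c in Ck}
        for t in transactions:
            for c in Ck:
                if c <= t:
                    counts[c] += 1
        Lk = {c for c, n in counts.items() if n >= minsup}
        frequent |= Lk
        k += 1
    return frequent

# Hypothetical demo data: every pair is frequent at minsup=3, the triple is not.
demo = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted(map(sorted, apriori(demo, minsup=3))))
```

Note that each round of the while loop performs one full scan of `transactions`, which is exactly the repeated-scan cost the proposed system below aims to reduce.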

 

4. PROPOSED SYSTEM:

The proposed method handles large itemsets while reducing the number of database scans, so it takes less time than the classical Apriori algorithm. The Map-Reduce (Hadoop) based Apriori algorithm eliminates unnecessary database scans.

Pseudo Code of the Proposed Method

 

Algorithm Apriori_MapReduce_Partitioning(D, supp)
{
    // D    : input dataset
    // supp : minimum support

    no_of_transactions = calculate_transaction(D);
    no_of_items = calculate_item(D);

    for i = 1 to no_of_transactions do
    {
        for j = 1 to no_of_items do
        {
            if Dij == 1 then
            {
                countj++;
            }
        }
    }

    for j = 1 to no_of_items do
    {
        if (countj > supp)
        {
            add_item(j);
        }
    }

    frequent_items = Map_Reduce(D);    // calling the Map-Reduce algorithm
    return frequent_items;
}
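The map/reduce counting that the Map_Reduce call delegates to can be sketched as an in-memory simulation (the function names are hypothetical; on a real cluster the map, shuffle, and reduce phases would run as distributed Hadoop jobs):

```python
from collections import defaultdict

def map_phase(transaction):
    """Mapper: emit a (item, 1) pair for every item in one transaction."""
    return [(item, 1) for item in transaction]

def reduce_phase(item, values):
    """Reducer: sum all counts emitted for one item."""
    return (item, sum(values))

def mapreduce_frequent_items(transactions, minsup):
    # Shuffle: group the mapper output by key (the item).
    grouped = defaultdict(list)
    for t in transactions:
        for key, value in map_phase(t):
            grouped[key].append(value)
    # Reduce: total the counts per item, then filter by minimum support.
    totals = dict(reduce_phase(k, v) for k, v in grouped.items())
    return {item for item, n in totals.items() if n >= minsup}

# Hypothetical demo: "a" occurs 4 times, "b" twice, "c" once.
baskets = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a"}]
print(mapreduce_frequent_items(baskets, minsup=2))
```

Because each mapper only ever sees its own transactions and each reducer only its own key, the counting work partitions cleanly across machines, which is what lets the proposed method avoid repeated scans of the full database on a single node.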

 

Algorithm Map_Reduce(count, D)
{
    i = 1;
    while (i