Last updated: July 18, 2019
Topic: BusinessMining


Association rule mining has become one of the important descriptive tasks in data mining; it can be defined as discovering meaningful patterns from large collections of data. Mining frequent itemsets is a fundamental part of association rule mining. Many algorithms have been proposed over the last several decades, including horizontal-layout-based, vertical-layout-based, and projected-layout-based techniques. However, most of these techniques suffer from repeated database scans, candidate generation (as in the Apriori algorithms), high memory consumption, and other problems when mining frequent patterns. In the retail industry, many transactional databases contain the same set of transactions many times. Building on this observation, this thesis presents an improved Apriori algorithm that guarantees better performance than the classical Apriori algorithm.
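The observation that retail databases repeat the same transactions can be illustrated with a small sketch. It assumes, for illustration only and not necessarily as the thesis's exact method, that identical transactions are collapsed into one record with a multiplicity, so that each distinct transaction is checked once per candidate. The helper names `compress` and `support_count` are hypothetical.

```python
from collections import Counter


def compress(transactions):
    """Merge identical transactions into {itemset: number of occurrences}."""
    # Assumption (illustrative only): duplicates are stored once with a multiplicity.
    return Counter(frozenset(t) for t in transactions)


def support_count(itemset, compressed):
    """Support count computed over distinct transactions, weighted by multiplicity."""
    target = frozenset(itemset)
    return sum(mult for t, mult in compressed.items() if target <= t)


baskets = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk"}]
compressed = compress(baskets)
print(len(compressed))                               # 2 distinct transactions
print(support_count({"bread", "milk"}, compressed))  # 3
```

With four baskets but only two distinct ones, each support check scans two records instead of four; the saving grows with the number of repeated transactions.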



Index terms: Hadoop, MapReduce, Apriori, Support and Confidence.



Data mining is the main part of KDD (knowledge discovery in databases). Data mining normally involves four classes of task: classification, clustering, regression, and association rule learning. Data mining refers to discovering knowledge in enormous amounts of data. It is a precise discipline concerned with analyzing observational data sets with the objective of finding unsuspected relationships and summarizing the data in novel ways that the owner can understand and use.

Data mining as a field of study integrates ideas from many domains rather than being a pure discipline. The four main disciplines [1] contributing to data mining are:

Statistics: it provides tools for measuring the significance of the given data, estimating probabilities, and many other tasks (e.g. linear regression).

Machine learning: it provides algorithms for inducing knowledge from given data.

Data management and databases: since data mining deals with huge amounts of data, an efficient way of accessing and maintaining the data is needed.

Artificial intelligence: it contributes to tasks involving knowledge encoding or search techniques (e.g. neural networks).






Figure 1: Architecture of a data mining system

The key to understanding data mining technology is the ability to distinguish between data mining operations, applications, and techniques [2], as shown in Figure 2.

Figure 2: Block diagram of a data mining system


One of the best-known and most popular data mining techniques is association rule mining, also called frequent itemset mining, which was introduced [2, 4] for market basket analysis. Because of its wide applicability, many revised algorithms have been proposed since then, and association rule mining is still a widely researched area. This article discusses many variations of the Apriori frequent-pattern-mining algorithm.

The AIS algorithm, proposed in [4], generates candidate itemsets on the fly during each pass of the database scan. Large itemsets from the preceding pass are checked for presence in the current transaction, and new itemsets are created by extending these existing itemsets. The algorithm turns out to be inefficient because it generates too many candidate itemsets, which requires more space; at the same time, it requires too many passes over the whole database, and it generates only rules with a single consequent item.
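A simplified sketch of one AIS pass as described above: candidates are created on the fly by extending each large itemset that appears in the current transaction with large items of that transaction that sort after the itemset's last item. The function name `ais_pass` is hypothetical, and the sketch omits the original algorithm's estimate of which extensions are expected to be large.

```python
from collections import Counter


def ais_pass(transactions, frontier, large_items):
    """One simplified AIS pass: generate and count candidates on the fly.

    frontier: non-empty large itemsets from the previous pass, as sorted tuples.
    large_items: set of large single items allowed as extensions.
    Returns a Counter mapping each candidate (sorted tuple) to its count.
    """
    counts = Counter()
    for t in transactions:
        t = set(t)
        ordered = sorted(t)
        for itemset in frontier:
            if set(itemset) <= t:  # large itemset is present in this transaction
                for item in ordered:
                    # extend only with large items that sort after the last item
                    if item in large_items and item > itemset[-1]:
                        counts[itemset + (item,)] += 1
    return counts


db = [{"a", "b", "c", "d"}, {"a", "c"}, {"b", "c"}]
print(ais_pass(db, [("a",), ("b",), ("c",)], {"a", "b", "c"}))
```

Every transaction spawns its own extensions, which is why AIS produces many candidates that later turn out to be small.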

2.1 Association Rule Mining


Techniques for discovering association rules from data have conventionally focused on identifying relationships between items that reveal some feature of human behavior, usually buying behavior, such as determining which items customers buy together. Every rule of this type describes a particular local pattern. A group of association rules can be easily interpreted and communicated.


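An association rule $X \Rightarrow Y$ has support $s$ in $D$ if the probability that a transaction in $D$ contains both $X$ and $Y$ is $s$, that is, $\mathrm{support}(X \Rightarrow Y) = P(X \cup Y)$. The rule has confidence $c$ if $c$ is the probability that a transaction containing $X$ also contains $Y$, that is, $\mathrm{confidence}(X \Rightarrow Y) = P(Y \mid X) = \mathrm{support}(X \cup Y) / \mathrm{support}(X)$.

The task of mining association rules is to find all the association rules whose support is larger than a minimum support threshold and whose confidence is larger than a minimum confidence threshold [1]. These rules are called strong association rules.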
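The support and confidence definitions can be checked on a toy database. This is a minimal sketch; the function names are hypothetical, and it treats both thresholds as inclusive (at least the minimum), a common convention.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)


def confidence(x, y, transactions):
    """support(X u Y) / support(X), i.e. an estimate of P(Y | X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def strong_rules(itemset, transactions, min_sup, min_conf):
    """Rules X => Y with X u Y = itemset that meet both thresholds (inclusive)."""
    items = frozenset(itemset)
    if support(items, transactions) < min_sup:
        return []
    rules = []
    for r in range(1, len(items)):
        for lhs in combinations(sorted(items), r):
            x = frozenset(lhs)
            y = items - x
            c = confidence(x, y, transactions)
            if c >= min_conf:
                rules.append((set(x), set(y), c))
    return rules


T = [frozenset(t) for t in ({"bread", "milk"}, {"bread", "butter"},
                            {"bread", "milk", "butter"}, {"milk"})]
print(support({"bread", "milk"}, T))  # 0.5
print(strong_rules({"bread", "milk"}, T, 0.5, 0.6))
```

Here bread and milk appear together in 2 of 4 transactions (support 0.5), and each of bread ⇒ milk and milk ⇒ bread has confidence 0.5 / 0.75 ≈ 0.67, so both are strong at a 0.6 confidence threshold.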

3. Apriori Algorithm

Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.

Figure 3: Flowchart of the existing system

First, the set of frequent 1-itemsets is found; this set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one full scan of the database. To find all the frequent itemsets, the algorithm repeats this step level by level. The main idea is as follows [6]:

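Apriori Algorithm (Itemset)

    L1 = {large 1-itemsets};
    for (k = 2; Lk-1 ≠ ∅; k++) do
        Ck = Apriori-gen(Lk-1);    // new candidates
        for each transaction t ∈ D do
            Ct = subset(Ck, t);    // get the subsets of t that are candidates
            for each candidate c ∈ Ct do
                c.count++;
        Lk = {c ∈ Ck | c.count ≥ minsup};
    Answer = ∪k Lk;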
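The level-wise procedure above can be sketched in Python. This is a minimal illustration, not the cited implementation: `min_count` is assumed to be an absolute support count, and `apriori_gen` uses a simplified join (any two frequent (k-1)-itemsets whose union has k items) followed by the usual prune step, which yields the same candidate set as the classical prefix-based join.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Generate C_k from L_(k-1): join pairs whose union has k items, then prune."""
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                # keep only candidates whose (k-1)-subsets are all frequent
                candidates.add(union)
    return candidates


def apriori(transactions, min_count):
    """Level-wise search; returns {frozenset itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # first scan: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {c: n for c, n in counts.items() if n >= min_count}  # L1
    frequent = dict(current)
    k = 2
    while current:  # loop while L(k-1) is non-empty
        candidates = apriori_gen(current, k)  # C_k
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:  # one full database scan per level
            for c in candidates:
                if c <= t:  # c is one of the candidates contained in t
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_count}  # L_k
        frequent.update(current)
        k += 1
    return frequent


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(data, min_count=3)
print(sorted("".join(sorted(s)) for s in result))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

The inner loop over all transactions is the full database scan per level that the proposed method aims to reduce.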









The proposed method handles a large number of itemsets while reducing the number of database scans, so it takes less time than the Apriori algorithm. It is a MapReduce (Hadoop) based Apriori algorithm that avoids unnecessary database scans.
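The following single-process sketch imitates one MapReduce round of support counting to show the idea behind the approach: each mapper reads only its own partition of the database, and the reducer sums the partial counts. It does not use Hadoop or any real MapReduce API; `map_phase`, `shuffle`, `reduce_phase`, and `mapreduce_count` are hypothetical names, and holding the partitions in memory is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(partition, candidates):
    """Mapper: emit (candidate, 1) for each candidate contained in a transaction."""
    for t in partition:
        for c in candidates:
            if c <= t:
                yield c, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, min_count):
    """Reducer: sum the partial counts and keep candidates meeting min_count."""
    return {key: sum(vals) for key, vals in groups.items() if sum(vals) >= min_count}


def mapreduce_count(partitions, candidates, min_count):
    """Run every mapper over its own partition, then shuffle and reduce once."""
    mapped = chain.from_iterable(map_phase(p, candidates) for p in partitions)
    return reduce_phase(shuffle(mapped), min_count)


parts = [
    [frozenset({"a", "b"}), frozenset({"a", "c"})],  # split 1
    [frozenset({"a", "b", "c"}), frozenset({"b"})],  # split 2
]
cands = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"}), frozenset({"a", "b"})]
print(mapreduce_count(parts, cands, min_count=3))
```

In a real cluster the mappers would run in parallel on separate nodes, so no single process has to scan the whole database.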

Pseudo Code of Proposed Method

Apriori_MapReduce_Partitioning(D, supp)
    // D: input dataset
    // supp: minimum support
    no_transaction = calculate_transaction(D)
    no_item =
    for i = 1 to no_transaction do
        for j = 1 to no_item do
            if Dij == 1 then
                count[j] = count[j] + 1
    for j = 1 to no_item do
        if count[j] >= supp then
            add_item(j);
    frequent_items = Map_Reduce(count, D); // calling the MapReduce algorithm
    return frequent_items;
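The item-scanning loops at the start of Apriori_MapReduce_Partitioning, which test Dij == 1 for every transaction i and item j, can be sketched in Python. Assumptions: the loops count how many transactions contain each item, and add_item keeps the items whose count reaches supp; D is a 0/1 matrix given as a list of rows; supp is an absolute count rather than a fraction; and the function name `frequent_single_items` is hypothetical. Indices are 0-based instead of the pseudocode's 1-based.

```python
def frequent_single_items(D, supp):
    """Count column sums of a 0/1 transaction-item matrix and keep items meeting supp.

    D: list of rows, one per transaction; D[i][j] == 1 if transaction i contains item j.
    supp: minimum support, assumed here to be an absolute transaction count.
    Returns (count, frequent) where frequent lists the indices of frequent items.
    """
    no_transaction = len(D)
    no_item = len(D[0]) if D else 0
    count = [0] * no_item
    for i in range(no_transaction):
        for j in range(no_item):
            if D[i][j] == 1:
                count[j] += 1  # item j occurs in transaction i
    frequent = [j for j in range(no_item) if count[j] >= supp]  # assumed add_item step
    return count, frequent


D = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0]]
print(frequent_single_items(D, 3))  # ([3, 3, 2], [0, 1])
```

Only the frequent single items would then need to be passed on to the MapReduce stage, which keeps later candidate sets small.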



Map_Reduce(count, D)