Abstract

In data mining, association rule mining has become one of the important descriptive tasks. It can be defined as discovering meaningful patterns in large collections of data. Mining frequent itemsets is a fundamental part of association rule mining. Many algorithms have been proposed over the last few decades, including horizontal layout based, vertical layout based, and projected layout based techniques. Most of these techniques, however, suffer from repeated database scans, candidate generation (Apriori-style algorithms), high memory consumption, and other problems when mining frequent patterns. In the retail industry, many transactional databases contain the same set of transactions many times. Building on this observation, this thesis presents an improved Apriori algorithm that performs better than the classical Apriori algorithm.

Index terms: Hadoop, Map-Reduce, Apriori, Support and Confidence.

1. INTRODUCTION

Data mining is the core step of knowledge discovery in databases (KDD). It normally involves four classes of tasks: classification, clustering, regression, and association rule learning. Data mining refers to discovering knowledge in enormous amounts of data. It is a discipline concerned with analyzing observational data sets with the objective of finding unsuspected relationships and summarizing the data in novel ways that the owner can understand and use.

As a field of study, data mining integrates ideas from many domains rather than being a pure discipline. The four main disciplines [1] that contribute to data mining are:

• Statistics: it provides tools for measuring the significance of the given data, estimating probabilities, and many other tasks (e.g. linear regression).

• Machine learning: it provides algorithms for inducing knowledge from given data (e.g. SVM).

• Data management and databases: since data mining deals with huge volumes of data, an efficient way of accessing and maintaining data is needed.

• Artificial intelligence: it contributes to tasks involving knowledge encoding or search techniques (e.g. neural networks).

Figure 1: Architecture of a data mining system

The key to understanding data mining technology is the ability to differentiate between data mining operations, applications, and techniques [2], as shown in Figure 2.

Figure 2: Block diagram of a data mining system

2. LITERATURE REVIEW

One of the most well known and popular data mining techniques is association rule mining, or frequent item set mining, which was introduced in [2, 4] for market basket analysis. Because of its wide applicability, many revised algorithms have been introduced since then, and association rule mining is still a widely researched area. Many variations of the Apriori frequent pattern mining algorithm are discussed in this article.

The AIS algorithm in [4] generates candidate item sets on the fly during each pass of the database scan. Large item sets from the preceding pass are checked to see whether they are present in the current transaction, and new item sets are created by extending these existing item sets. This algorithm turns out to be inefficient because it generates too many candidate item sets. It requires more space, needs too many passes over the whole database, and generates only rules with a single consequent item.

2.1 Association Rule Mining

Techniques for discovering association rules from data have conventionally focused on identifying relationships between items that describe some aspect of human behavior, usually purchasing behavior, in order to determine which items customers buy together. All rules of this type describe a particular local pattern. A group of association rules can be easily interpreted and communicated.

The association rule X ⇒ Y has support s in D if the probability that a transaction in D contains both X and Y is s. It has confidence c if the probability that a transaction containing X also contains Y is c.

The task of mining association rules is to find all the association rules whose support is larger than a minimum support threshold and whose confidence is larger than a minimum confidence threshold [1]. These rules are called the strong association rules.
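As a small worked illustration of these definitions, the following Python sketch computes support and confidence for one rule over a toy transaction list. The transactions, item names, and function names are made up for illustration and do not come from the thesis. Confidence is taken in its usual sense: the fraction of transactions containing X that also contain Y.

```python
from fractions import Fraction

# Toy transaction database D (hypothetical data, for illustration only).
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(x, y, transactions):
    """support(X u Y) / support(X): how often Y appears when X does."""
    return support(x | y, transactions) / support(x, transactions)

def is_strong(x, y, transactions, min_sup, min_conf):
    """A rule is strong when it meets both thresholds.
    Assumption: thresholds are inclusive (>=); use > for a strict reading."""
    return (support(x | y, transactions) >= min_sup
            and confidence(x, y, transactions) >= min_conf)

X, Y = {"bread"}, {"milk"}
print(support(X | Y, D))      # 3/5
print(confidence(X, Y, D))    # 3/4
```

Here three of the five transactions contain both bread and milk, and four contain bread, so the rule bread ⇒ milk has support 3/5 and confidence 3/4.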

3. APRIORI ALGORITHM

Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.

Figure 3: Flowchart of the existing system

First, the set of frequent 1-itemsets is found. This set is denoted L1. L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. Finding each Lk requires one full scan of the database. To find all the frequent itemsets, the algorithm uses an iterative, level-wise method. The main idea is as follows [6]:

Apriori Algorithm (Itemset)
{
    L1 = {large 1-itemsets};
    for (k = 2; Lk-1 ≠ ∅; k++) do
    {
        Ck = Apriori-gen(Lk-1);          // generate candidate k-itemsets from Lk-1
        for each transaction t ∈ D do    // one full scan of the database
        {
            Ct = subset(Ck, t);          // get the subsets of t that are candidates
            for each candidate c ∈ Ct do
                c.count++;
        }
        Lk = {c ∈ Ck | c.count ≥ minsup}
    }
    Return = ∪k Lk;
}
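The level-wise search above can be sketched in runnable form as follows. This is a minimal Python illustration of classical Apriori under simple assumptions: transactions fit in memory, minsup is an absolute count, and the function names apriori_gen and apriori are chosen here to mirror the pseudocode. It is not the thesis implementation.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-item candidates, then prune any
    candidate that has an infrequent (k-1)-subset."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev_frequent
                for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

def apriori(transactions, minsup):
    """Return {itemset: count} for every itemset with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    # L1: count single items with one scan of the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {c: n for c, n in counts.items() if n >= minsup}
    frequent = dict(current)
    k = 2
    while current:                        # stop when L(k-1) is empty
        candidates = apriori_gen(set(current), k)
        counts = {c: 0 for c in candidates}
        for t in transactions:            # one full scan per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(current)
        k += 1
    return frequent

# Toy data (hypothetical): four transactions over items 1..5.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, minsup=2)
print(result[frozenset({2, 3, 5})])       # 2
```

On this toy data the only frequent 3-itemset is {2, 3, 5}. Candidates such as {1, 2, 3} are pruned before counting because their subset {1, 2} is not frequent, which is the step that spares Apriori from counting every possible combination.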

4. PROPOSED SYSTEM

The proposed method handles large item sets and reduces the number of database scans, so it takes less time than the classical Apriori algorithm. It is a MapReduce (Hadoop) based Apriori algorithm that avoids unnecessary database scans.

Pseudocode of the Proposed Method

Algorithm Apriori_MapReduce_Partitioning(D, supp)
{
    // D: input dataset (binary matrix, D[i][j] == 1 if transaction i contains item j)
    // supp: minimum support
    no_of_transactions = calculate_transaction(D);
    no_of_items = calculate_item(D);
    // count the occurrences of each item
    for i = 1 to no_of_transactions do
    {
        for j = 1 to no_of_items do
        {
            if D[i][j] == 1 then
            {
                count[j]++;
            }
        }
    }
    // keep the items whose count exceeds the minimum support
    for j = 1 to no_of_items do
    {
        if (count[j] > supp)
        {
            add_item(j);
        }
    }
    frequent_items = Map_Reduce(count, D); // calling Map Reduce algorithm
    return frequent_items;
}
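The item-counting step of the pseudocode above lends itself to a map and reduce formulation. The following Python sketch simulates that idea in a single process: each partition of D is counted independently (map), and the partial counts are summed (reduce). It does not use Hadoop, the names map_partition, reduce_counts, and frequent_items are hypothetical, and it does not reproduce the Map_Reduce routine of the proposed method.

```python
from collections import Counter
from functools import reduce

def map_partition(rows):
    """Map step: for each item column j, count the rows in this
    partition where D[i][j] == 1."""
    local = Counter()
    for row in rows:
        for j, present in enumerate(row):
            if present == 1:
                local[j] += 1
    return local

def reduce_counts(a, b):
    """Reduce step: merge two partial count tables by summing per item."""
    return a + b

def frequent_items(D, supp, n_partitions=2):
    """Items whose total count exceeds supp (strict >, as in the pseudocode).
    Items are 0-indexed here, unlike the 1-indexed pseudocode."""
    size = max(1, -(-len(D) // n_partitions))   # ceiling division
    partitions = [D[i:i + size] for i in range(0, len(D), size)]
    partials = [map_partition(p) for p in partitions]
    total = reduce(reduce_counts, partials, Counter())
    return sorted(j for j, c in total.items() if c > supp)

# Toy binary matrix (hypothetical): 4 transactions, 3 items.
D = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
print(frequent_items(D, supp=2))   # [0, 1]
```

Because each partition is counted without reference to the others, the map calls could run on separate workers, and the data is read only once for this step.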

Algorithm Map_Reduce(count, D)

{

i=1;

while(i