Hau_ITmark

New Member
Download Luận văn Mining association rules with adjustable interestingness

Download miễn phí Luận văn Mining association rules with adjustable interestingness





T ABLE OF CONTENTS
Acknowledgements .i
Abstract . ii
Table of contents . iii
List of tables and figures .iv
CHAPTER 1: Introduction . 1
1.1. What is data mining? . 1
1.2. Data mining versus query tools . 2
1.3. Mining association rules. 3
1.4. Outline of the thesis. 5
CHAPTER 2: Mining association rules with weighted items . 6
2.1. Introduction . 6
2.2. Problem definition . 7
CHAPTER 3: Mining association rules with adjustable interestingness .10
3.1. Interestingness and interesting itemsets .10
3.2. Interestingness constraints.11
3.3. Motivation behind interesting itemsets and adjustable interestingness .12
CHAPTER 4: Algorithm for mining association rules with adjustable
interestingness (MARAI) .14
4.1. Motivation .14
4.2. Preliminaries .15
4.3. Basic properties of itemset-tidset pairs .18
4.4. MARAI: Algorithm design and implementation .20
4.5. Experimental Evaluation .25
CHAPTER 5: Conclusion.28
References . a
Appendix . b



Để tải bản DOC Đầy Đủ xin Trả lời bài viết này, Mods sẽ gửi Link download cho bạn sớm nhất qua hòm tin nhắn.
Ai cần download tài liệu gì mà không tìm thấy ở đây, thì đăng yêu cầu down tại đây nhé:
Nhận download tài liệu miễn phí

Tóm tắt nội dung:


40 B, E, F
Frequent
pattern
Support
{A} 75%
{B} 50%
{C} 50%
{A, C} 50%
Figure 1. Example database and frequent itemsets
For rule CA ⇒ :
support = support({A} ∪ {C}) = 50%
confidence = support({A} ∪ {C}) / support({A}) = 66%
The problem of discovering all association rules can be decomposed into two sub-
problems [1]:
1. Find all acts of items (itemsets) that have transaction support above minimum
support. The support for an item is the number of transactions that contain the item-
set. Recall that an itemset is frequent or large if its support is more than a user-
specified minimum support (min_sup) value.
Min. support 50%
Min. confidence 50%
5
Example 1.2. From the above database, we obtain four frequent itemsets {A}, {B},
{C} and {A, C} with supports of 75%, 50%, 50% and 50% respectively.
2. Use the large itemsets to generate the desired rules. Here is a straightforward al-
gorithm for this task. For every large itemset l , find all non-empty subsets of l . For
every such subset a , output a rule of the form )( ala −⇒ if the ratio of support( l )
to support(a ) is at least minconf. We need to consider subsets of l to generate rules
with multiple consequents.
Example 1.3. From the frequent itemset {A, C} found in example 1.2, we can gen-
erate two rules whose confidences are greater than or equal to minconf value.
Itemset {A, C}
 

%100
%50
%50
})support({C
{C})}suuport({A
confidence
%66
%75
%50
})support({A
{C})}suuport({A
confidence
AC rule
CA rule
==

=
==

=


As the problem of generating rules from the itemsets in step 2 is straightforward
[1], we will not mention it over again in this thesis.
1.4. Outline of the thesis
The remainder of this thesis is as follows. In chapter 2, we state the definition of
mining association rules with weighted items. The main aim of this chapter is to
provide a background for weight based problems we base our approach on. In
chapter 3, we describe the main idea of the thesis. A new term, adjustable interest-
ingness, is also introduced here. After the extensive discussion of mining associa-
tion rules with adjustable interestingness in chapter 3, we devote chapter 4 to the
algorithm for it. Experiments on real databases are also described. Finally, we con-
clude the thesis with a summary and a discussion of future work.
6
CHAPTER 2
MINING ASSOCIATION RULES WITH
WEIGHTED ITEMS
In the last section, we discussed about mining association rule for unweighted case.
In the following, we introduce the conceptual framework of weight and apply it to
mining association rules. The concept of weight will be used in the coming chap-
ters.
2.1. Introduction
There have been two approaches to the association rule mining problem. The first
one is to mine the frequent itemsets regardless of their coefficients [1, 7]. The sec-
ond trend is to assign weights to the items to reflect their importance to the users.
Some previous works focused on mining frequent itemsets with weighted items [5]
and different supports [6].
The association rules, mentioned in previous chapter, are called the ‘unweighted’
association rules [6] as the items are treated uniformly.
Example 2.1. The following rule is the unweighted binary association rule from [1]:
(Bread = Yes) => (Ham = Yes)
with support = 60% & confidence = 80%
The above rule states that the probability of buying bread and ham in a set of trans-
action is 0.6, and the confidence states that probability that buying ham, given that
that customer buys bread, is 0.8.
7
The above rule is an unweighted case. However, it is better for the following cases
to consider the importance of the items or attributes.
For example, the rule
(Income = High) => (School level = High)
is, in human interpretation, probably more interesting than
(Price = High) => (Sales = Low)
even if the support of the latter rule is much more than that of the former.
By using the weights, the importance of the attributes or items can be reflected, and
we can mine the rules with interestingness. For example, we can add the weights to
the sales transactions, where the items are under promotion, or with more profits.
The unweighted association rules would be the same if the database did not change,
thus it cannot provide a flexible way for the users to adjust the priority of the rules.
Therefore, the mining association rules for weighted items was presented in [6] to
resolve this problem.
2.2. Problem definition
Similar to section 1.3, we consider a database with a set of transaction D , a set of
attributes or items I , and each transaction is assigned a transaction identifier TID .
Based on the definitions in section 1.3, the weights and weighted association rules
are defined [6]:
Definition 1. An item weight, w , where 10 ≤≤ w , defines the importance of the
item. 0 indicates the least important item, and 1 denotes the most important item.
For example, if the weight of the itemset X is 0.95, it tells us the itemset is impor-
tant in the set of transaction D . The weight of 0.1 indicates a less important set.
8
Definition 2. A weighted association rule (or association rule with weighted item)
has the form YX ⇒ , where IX ⊆ , IY ⊆ , φ=∩ YX , and the items in X and Y
are given by the weights.
Definition 3. The weighted support of the binary weighted rule YX ⇒ is the ad-
justing ratio of the support, or mathematically,

∪∈
=
)(
),()(),(
YXj
j YXsupportwYXwsupport
where the weights of the items },...,,{ 21 miii are },...,,{ 21 mwww respectively.
In order to find the interesting rules, two thresholds, minimum weighted support
(wminsup) and minimum confidence (minconf) must be specified.
Definition 4. An itemset X is called a large weighted itemset if the weighted sup-
port of the itemset X is greater than or equal to the weighted support threshold, or
mathematically,
wminsupXwsupport ≥)(
Definition 5. A weighted association rules YX ⇒ is called an interesting rule if
the confidence of itemset ( YX ∪ ) is greater than or equal to a minimum confi-
dence threshold, and ( YX ∪ ) is a large weighted itemset.
Product ID Item Average Profit Weight …
1 Eraser 100 0.1 …
2 Ball-pen 200 0.2 …
3 Notebook 300 0.3 …
4 Pencil 500 0.5 …
5 Pen 1000 1 …
Table 1. Database of a stationery store
9
TID Product ID TID Product ID
1 1 4 2 1 4 5
3 2 3 5 4 3 5
5 1 2 4 5 6 1 3 4
7 3 8 2 5
Table 2. Transactions of a stationery store
Example 2.2. Suppose in a stationery store, a database is shown in Table 1. Each
item includes information of name, profit and given weight. Table 2 gives the
transaction database. For each transaction, there will be a transaction identifier
(TID ) and the names of items. Suppose there are only 5 items and totally 8 transac-
tions in the transaction database.
Regardless of the weights given, if the value of minsup is set to 0.4, {1, 4} will be a
large itemset since its support is 50%. However, {1, 4, 5} is not a large itemset as it
appears only two times in the database.
But if we take weights of items into account, and the given value of wminsup is 0.4,
{1, 4} will not be a large weighted itemset since
(0.1 + 0.5) x
8
4
= 0.3 ≤ 0.4
On the contrary, {1, 4, 5} will be a large itemset since
(0.1 + 0.5 + 1) x
8
2
= 0.4 ≥ 0.4
By the same argument, {5}, {1, 2, 5} will be large weighted itemsets.
Although itemset {1, 4} has a greater support than that of {1, 2, 5}, it seem to be
true that the latter otherwise make a greater profit than the former can do. In this
case, we say that itemset {1, 2, 5} is more interesting than itemset {1, 4}, or the in-
terestingness of itemset {1, 2, 5} is greater than that of itemset {1, 4}.
10
CHAPTER 3
MINING ASSOCIATION RULES WITH ADJUST-
ABLE INTERESTINGNESS
In this chapter, we design a new concept, adjustable interestingness. Furthermore, a
novel approach in mining association rules based on adjustable interestingness is
introduced.
3.1. Interestingness and interesting itemsets
Based on the definitions of weighted itemsets in previous chapter, we extend the
definitions of interestingness and interesting itemsets.
Definition 1. The interestingness of an itemset X , denoted interest( X ), is the co-
efficient correlation between the number of transactions in which it occurs as a sub-
set and the total weight of its items, or methametically,


=
)
)()()(
Xj
j XsupportwXinterest
In order to find the interesting itemsets, the threshold, minimum interestingness
(min_int) must be specified.
Definition 2. An itemset X is called an interesting itemset if the interestingness...
 

Các chủ đề có liên quan khác

Top