class_07cx2

Download thesis: Parallel Mining of Fuzzy Association Rules


Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Notations & Abbreviations
Chapter 1. Introduction to Data Mining
1.1 Data Mining
1.1.1 Motivation: Why Data Mining?
1.1.2 Definition: What is Data Mining?
1.1.3 Main Steps in Knowledge Discovery in Databases (KDD)
1.2 Major Approaches and Techniques in Data Mining
1.2.1 Major Approaches and Techniques in Data Mining
1.2.2 What Kinds of Data Could Be Mined?
1.3 Application of Data Mining
1.3.1 Application of Data Mining
1.3.2 Classification of Data Mining Systems
1.4 Focused Issues in Data Mining
Chapter 2. Association Rules
2.1 Motivation: Why Association Rules?
2.2 Association Rules Mining: Problem Statement
2.3 Main Research Trends in Association Rules Mining
Chapter 3. Fuzzy Association Rules Mining
3.1 Quantitative Association Rules
3.1.1 Association Rules with Quantitative and Categorical Attributes
3.1.2 Methods of Data Discretization
3.2 Fuzzy Association Rules
3.2.1 Data Discretization based on Fuzzy Set
3.2.2 Fuzzy Association Rules
3.2.3 Algorithm for Fuzzy Association Rules Mining
3.2.4 Relation between Fuzzy Association Rules and Quantitative Association Rules
3.2.5 Experiments and Conclusions
Chapter 4. Parallel Mining of Fuzzy Association Rules
4.1 Several Previously Proposed Parallel Algorithms
4.1.1 Count Distribution Algorithm
4.1.2 Data Distribution Algorithm
4.1.3 Candidate Distribution Algorithm
4.1.4 Algorithm for Parallel Generation of Association Rules
4.1.5 Other Parallel Algorithms
4.2 A New Parallel Algorithm for Fuzzy Association Rules Mining
4.2.1 Our Approach
4.2.2 The New Algorithm
4.3 Experiments and Conclusions
Chapter 5. Conclusions
5.1 Achievements throughout the dissertation
5.2 Future research
References




Content summary:

t cases will set the value of A_Vi to False (No or 0). The attributes
Chest pain type and Resting electrocardiographics in Table 4 belong to this
case. After the transformation, the initial attribute Chest pain type is converted
into four binary columns Chest_pain_type_1, Chest_pain_type_2,
Chest_pain_type_3, and Chest_pain_type_4, as shown in the following table.
Chest pain type (1, 2, 3, 4) → columns after discretization:
Chest pain type | Chest_pain_type_1 | Chest_pain_type_2 | Chest_pain_type_3 | Chest_pain_type_4
4 | 0 | 0 | 0 | 1
1 | 1 | 0 | 0 | 0
3 | 0 | 0 | 1 | 0
2 | 0 | 1 | 0 | 0
Table 5 - Data discretization for categorical or quantitative attributes having finite values
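The mapping in Table 5 can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the thesis; the function name one_hot is a hypothetical helper introduced here.

```python
def one_hot(values, domain):
    """Map each categorical value to one 0/1 column per value in the domain."""
    return [[1 if v == d else 0 for d in domain] for v in values]

# The four "Chest pain type" records shown in Table 5.
chest_pain_type = [4, 1, 3, 2]
columns = one_hot(chest_pain_type, domain=[1, 2, 3, 4])

for value, row in zip(chest_pain_type, columns):
    print(value, row)
```

Each printed row contains exactly one 1, in the column that corresponds to the original value.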
• If A is a continuous quantitative attribute, or a categorical one whose value
domain {v1, v2, …, vp} is relatively large, A is mapped to q new binary columns
of the form <A: start1..end1>, <A: start2..end2>, …, <A: startq..endq>. The value
of a given record at column <A: starti..endi> is True (Yes or 1) if the original
value v of A in this record lies between starti and endi; <A: starti..endi>
receives False (No or 0) for all other values of v. The attributes Age, Serum
cholesterol, and Maximum heart rate in Table 4 belong to this form. Serum
cholesterol and Age could be discretized as shown in the two following tables:
Serum cholesterol | 150..249 | 250..349 | 350..449 | 450..549
544 | 0 | 0 | 0 | 1
206 | 1 | 0 | 0 | 0
286 | 0 | 1 | 0 | 0
322 | 0 | 1 | 0 | 0
Table 6 - Data discretization for "Serum cholesterol" attribute
Age | 1..29 | 30..59 | 60..120
74 | 0 | 0 | 1
29 | 1 | 0 | 0
30 | 0 | 1 | 0
59 | 0 | 1 | 0
60 | 0 | 0 | 1
Table 7 - Data discretization for "Age" attribute
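A minimal sketch of this interval-based discretization follows. The helper name discretize is hypothetical, and treating both interval bounds as inclusive is an assumption; the Serum cholesterol ranges are taken from Table 6, and the Age ranges [1..29], [30..59], [60..120] are the ones the text uses when discussing rule interpretation.

```python
def discretize(values, intervals):
    """Return one 0/1 column per interval (start, end), bounds inclusive."""
    return [[1 if start <= v <= end else 0 for start, end in intervals]
            for v in values]

cholesterol_intervals = [(150, 249), (250, 349), (350, 449), (450, 549)]
age_intervals = [(1, 29), (30, 59), (60, 120)]

# Sample values from the two tables above.
cholesterol_rows = discretize([544, 206, 286, 322], cholesterol_intervals)
age_rows = discretize([74, 29, 30, 59, 60], age_intervals)

for row in cholesterol_rows + age_rows:
    print(row)
```

Because the intervals do not overlap, every record falls into exactly one column, which is precisely what leads to the boundary effects discussed next.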
Unfortunately, the discretization methods above suffer from pitfalls such as the
“sharp boundary problem” [4] [9]. The figure below displays the support
distribution of an attribute A whose values range from 1 to 10. Suppose that we
divide A into two separate intervals, [1..5] and [6..10]. If the minsup value is
41%, the interval [6..10] does not gain sufficient support (40% < minsup = 41%),
even though there is large support near its left boundary: for example, [4..7]
has support 55% and [5..8] has support 45%. This partition therefore creates a
“sharp boundary” between 5 and 6, and mining algorithms cannot generate
confident rules involving the interval [6..10].
Figure 4 - "Sharp boundary problem" in data discretization
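The arithmetic behind this example can be checked with a short sketch. The per-value supports below are hypothetical: the figure is not reproduced here, so they were chosen only so that the three interval supports quoted in the text (40%, 55%, and 45%) come out. The helper interval_support is likewise an illustrative name.

```python
# Hypothetical support (in %) of each value 1..10 of attribute A.
support = {1: 5, 2: 10, 3: 10, 4: 20, 5: 15, 6: 15, 7: 5, 8: 10, 9: 5, 10: 5}

def interval_support(lo, hi):
    """Total support of the closed interval [lo..hi]."""
    return sum(support[v] for v in range(lo, hi + 1))

minsup = 41
for lo, hi in [(1, 5), (6, 10), (4, 7), (5, 8)]:
    s = interval_support(lo, hi)
    print(f"[{lo}..{hi}] support = {s}%, frequent: {s >= minsup}")
```

Under these assumed values, [6..10] misses minsup by one point, while the intervals [4..7] and [5..8] that straddle the boundary pass it.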
Another attribute partitioning method [38] divides the attribute domain into
overlapping regions, so that the boundaries of adjacent intervals overlap. As a
result, elements located near a boundary contribute to more than one interval,
and some intervals may become interesting that otherwise would not be. This is,
however, not reasonable: the total support of all intervals exceeds 100%, and we
unintentionally overemphasize the importance of values located near the
boundaries. This is neither natural nor consistent.
Furthermore, partitioning an attribute domain into separate ranges causes a
problem in rule interpretation. Table 7 shows that the two values 29 and 30
belong to different intervals although they indicate almost the same age level.
Likewise, suppose that the range [1..29] denotes young people, [30..59]
middle-aged people, and [60..120] old people; then the age of 59 implies a
middle-aged person whereas the age of 60 implies an old person. This is neither
intuitive nor natural when interpreting quantitative association rules.
Fuzzy association rules were proposed to overcome the above shortcomings [4]
[9]. This kind of rule not only mitigates the “sharp boundary problem” but also
allows us to express association rules in a more intuitive and friendly format.
For example, the quantitative rule “<…> AND <…> AND <…> => <…>” is now replaced
by the fuzzy rule “<…> AND <…> AND <…> => <Heart disease: Yes>”. Age_Old and
Cholesterol_High in the latter rule are fuzzy attributes.
3.2 Fuzzy Association Rules
3.2.1 Data Discretization based on Fuzzy Set
In fuzzy set theory [21] [47], an element can belong to a set with a
membership value in [0, 1]. This value is assigned by the membership function
associated with each fuzzy set. For an attribute x and its domain D_x (also known
as the universal set), the membership function m_{f_x} associated with the fuzzy
set f_x is the mapping:
$$ m_{f_x}(x) : D_x \rightarrow [0, 1] \qquad (3.1) $$
Fuzzy sets provide a smooth transition across boundaries and allow us to
express association rules in a more expressive form. We therefore use fuzzy sets
in data discretization to make the most of these benefits.
For the attribute Age with universal domain [0, 120], we attach three fuzzy
sets: Age_Young, Age_Middle-aged, and Age_Old. The graphic representations of
these fuzzy sets are shown in the following figure.
Figure 5 - Membership functions of "Age_Young", "Age_Middle-aged", and "Age_Old"
By using fuzzy sets, we get rid of the “sharp boundary problem”, because
membership changes gradually rather than abruptly. For example, the graph in
Figure 5 indicates that the ages of 59 and 60 have membership values in the fuzzy
set Age_Old of approximately 0.85 and 0.90 respectively. Similarly, the
membership values of the ages 30 and 29 in the fuzzy set Age_Young are 0.70 and
0.75. This transformation is much more intuitive and natural than the
discretization methods described earlier.
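As a rough illustration, membership functions of this kind can be written as piecewise-linear ramps. The breakpoints below (24 and 44 for Age_Young, 42 and 62 for Age_Old) are assumptions, picked only so that the functions reproduce the four membership values quoted above; the actual curves in Figure 5 may have a different shape.

```python
def age_young(age):
    """Membership in Age_Young: 1 up to age 24, falling linearly to 0 at 44.

    The breakpoints 24 and 44 are assumed, not taken from Figure 5.
    """
    if age <= 24:
        return 1.0
    if age >= 44:
        return 0.0
    return (44 - age) / 20

def age_old(age):
    """Membership in Age_Old: 0 up to age 42, rising linearly to 1 at 62.

    The breakpoints 42 and 62 are assumed, not taken from Figure 5.
    """
    if age <= 42:
        return 0.0
    if age >= 62:
        return 1.0
    return (age - 42) / 20

for age in (29, 30, 59, 60):
    print(age, round(age_young(age), 2), round(age_old(age), 2))
```

With a crisp partition, ages 59 and 60 land in different intervals; here their memberships in Age_Old differ by only 0.05.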
As another example, the original attribute Serum cholesterol is decomposed into
two new fuzzy attributes, Cholesterol_Low and Cholesterol_High. The following
figure portrays the membership functions of these fuzzy concepts.
Figure 6 - Membership functions of "Cholesterol_Low" and "Cholesterol_High"
If A is a categorical attribute with value domain {v1, v2, …, vk} and k is
relatively small, we fuzzify this attribute by attaching a new fuzzy attribute
A_Vi to each value vi. The membership function mA_Vi(x) equals 1 if x = vi and 0
otherwise. Strictly speaking, A_Vi is then an ordinary (crisp) set, because its
membership function value is either 0 or 1. If k is too large, we can fuzzify
this attribute by dividing its domain into intervals and attaching a new fuzzy
attribute to each partition. However, developers or users should consult domain
experts for the knowledge about the data needed to achieve an appropriate
division.
Data discretization using fuzzy sets could bring us the following benefits:
• Firstly, the smooth transition of membership functions helps us eliminate the
“sharp boundary problem”.
• Data discretization using fuzzy sets significantly reduces the number of new
attributes, because the number of fuzzy sets associated with each original
attribute is relatively small compared with the number of intervals needed for
quantitative association rules. For instance, if we apply ordinary
discretization methods to the attribute Serum cholesterol, we obtain five
sub-ranges (and thus five new attributes) from its original domain [100, 600],
whereas we create only two new attributes, Cholesterol_Low and
Cholesterol_High, by applying fuzzy sets. This advantage is essential because
it compacts the set of candidate itemsets and therefore shortens the total
mining time.
• Fuzzy association rules are more intuitive and natural than conventional ones.
• All values of records at the new attributes after fuzzification lie in [0, 1].
Each value expresses the degree to which a given element belongs to a fuzzy
set. As a result, this flexible coding gives us an exact way to measure the
contribution of each record to the overall support of an itemset.
• A further advantage, which will become clearer in the next section, is that
fuzzified databases still hold the “downward closure property” (all subsets of
a frequent itemset are also frequent, and any superset of a non-frequent
itemset is not frequent), provided that the T-norm operator is chosen wisely.
Thus, conventional algorithms such as Apriori also work well on fuzzified
databases with only slight modifications.
• Another benefit is that this data discretization method can be easily applied
to both relational and transactional databases.
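The downward closure remark can be made concrete with a small sketch. It assumes a common definition of fuzzy support, namely the average over all records of the T-norm of the record's membership values for the items in the itemset; this excerpt does not give the thesis's own formula, so treat that definition, the helper names, and the membership values in records as illustrative assumptions. The minimum and the algebraic product are two standard T-norms.

```python
from functools import reduce

def t_min(a, b):
    """Minimum T-norm."""
    return min(a, b)

def t_product(a, b):
    """Algebraic product T-norm."""
    return a * b

def fuzzy_support(records, itemset, t_norm):
    """Average over records of the T-norm of the memberships in itemset."""
    total = 0.0
    for record in records:
        total += reduce(t_norm, (record[item] for item in itemset))
    return total / len(records)

# Hypothetical fuzzified records (membership values in [0, 1]).
records = [
    {"Age_Old": 0.9, "Cholesterol_High": 0.4},
    {"Age_Old": 0.2, "Cholesterol_High": 0.8},
    {"Age_Old": 0.7, "Cholesterol_High": 0.7},
]

for t_norm in (t_min, t_product):
    pair = fuzzy_support(records, ["Age_Old", "Cholesterol_High"], t_norm)
    singles = [fuzzy_support(records, [item], t_norm)
               for item in ("Age_Old", "Cholesterol_High")]
    # Downward closure: an itemset never has more support than its subsets.
    print(t_norm.__name__, round(pair, 3), [round(s, 3) for s in singles])
```

Both T-norms satisfy T(a, b) ≤ min(a, b) on [0, 1], so adding an item can only lower a record's contribution; this is why Apriori-style pruning remains valid under these choices.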
3.2.2 Fuzzy Association Rules
Age | Serum cholesterol (mg/ml) | Fasting blood sugar (>120mg/ml) | Heart disease
60 | 206 | 0 (<120mg/ml) | 2 (yes)
54 | 239 | 0 | 2
54 | 286 | 0 | 2
52 | 255 | 0 | 2
68 | 274 | 1 (>120mg/ml) | 2
54 | 288 | 1 | 1 (no)
46 | 204 | 0 | 1
37 | 250 | 0 | 1
71 | 320 | 0 | 1
74 | 269 | 0 | 1
29 | 204 | 0 | 1
70 | 322 | 0 | 2
67 | 544 | 0 | 1
Table 8 - Diagnostic database of heart disease on 13 patients
Let I = {i1, i2, …, in} be a set of n attributes, denoting iu is...