association
concept
Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
Itemset
A collection of one or more items
k-itemset
An itemset that contains k items
Support count (σ)
Frequency of occurrence of an itemset
Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold
Association Rule
An implication expression of the form
X → Y, where X and Y are itemsets
Rule Evaluation Metrics
Support (s)
Fraction of transactions that contain both X and Y
Confidence (c)
Measures how often items in Y appear in transactions that contain X
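As a concrete illustration of the two metrics, here is a minimal Python sketch that computes s and c directly from their definitions; the toy transactions and the rule {Milk, Diaper} → {Beer} are made-up example data, not part of the original notes.

# Toy transaction database (hypothetical example data); each transaction is a set of items
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(X, Y, transactions):
    """How often items in Y appear in transactions that contain X: s(X ∪ Y) / s(X)."""
    return support(X | Y, transactions) / support(X, transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
print("s =", support(X | Y, transactions))    # 2/5 = 0.4
print("c =", confidence(X, Y, transactions))  # 0.4 / 0.6 ≈ 0.67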
association rule mining task
given a set of transactions T, the goal of association rule mining is to find all rules having
support >= minsup threshold
confidence >= minconf threshold
Brute-force approach
List all possible association rules
Compute the support and confidence for each rule
Prune rules that fail the minsup and minconf thresholds
Disadvantage: computationally prohibitive
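A minimal sketch of this brute-force approach, with illustrative function names and assuming minsup > 0: enumerate every possible rule X → Y over the item universe, compute support and confidence, and keep only rules that clear both thresholds. The exponential number of candidate rules (on the order of 3^d for d items) is exactly why this is computationally prohibitive.

from itertools import combinations

def brute_force_rules(transactions, minsup, minconf):
    """transactions: list of sets of items; returns all rules meeting minsup and minconf."""
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    rules = []
    # every itemset Z with |Z| >= 2 can be split into a rule X -> Z - X
    for k in range(2, len(items) + 1):
        for Z in combinations(items, k):
            Zset = set(Z)
            supZ = sum(1 for t in transactions if Zset <= t) / n
            if supZ < minsup:          # rules built from Z would fail minsup anyway
                continue
            for r in range(1, k):
                for X in combinations(Z, r):
                    Xset = set(X)
                    supX = sum(1 for t in transactions if Xset <= t) / n
                    if supZ / supX >= minconf:
                        rules.append((Xset, Zset - Xset, supZ, supZ / supX))
    return rules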
method
Basic Approach of Association Rule
Step 1: find all frequent itemsets
Step 2: generate strong association rules from the frequent itemsets
Brute-force approach
Each itemset in the lattice is a candidate frequent itemset
Count the support of each candidate by scanning the database
Complexity O(NMw), where N = number of transactions, M = 2^d candidate itemsets (d distinct items), w = maximum transaction width
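Worked magnitude check (illustrative): with d = 100 distinct items the lattice already holds about 2^100 ≈ 1.3 × 10^30 candidate itemsets, so enumerating and counting every candidate is infeasible even before multiplying by N and w.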
Apriori Algorithm
Apriori property
all non-empty subsets of a frequent itemset must also be frequent
Apriori principle holds due to the following property of the support measure
For any itemsets X and Y: X ⊆ Y ⇒ s(X) ≥ s(Y)
The support of an itemset never exceeds the support of its subsets
notation
frequent k-itemset (denoted Lk): satisfies the minimum support threshold
candidate k-itemset (denoted Ck): a possibly frequent k-itemset
Level-wise approach
(k-1)-itemsets are used to explore k-itemsets
prune Ck by subset test
generate Lk by scanning transaction DB
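A compact sketch of the level-wise loop described above: generate Ck by joining L(k-1) with itself, prune Ck with the subset (Apriori) test, then scan the DB to count supports and keep Lk. The join convention, variable names, and the absolute minsup count are illustrative assumptions rather than the exact formulation in these notes.

from collections import defaultdict
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    # L1: frequent 1-itemsets
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    Lk = {X: c for X, c in counts.items() if c >= minsup_count}
    frequent = dict(Lk)

    k = 2
    while Lk:
        # candidate generation: join L(k-1) with itself
        prev = list(Lk)
        Ck = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    Ck.add(union)
        # prune Ck: every (k-1)-subset must be frequent (Apriori property)
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # count supports of the surviving candidates by scanning the DB
        counts = defaultdict(int)
        for t in transactions:
            tset = frozenset(t)
            for c in Ck:
                if c <= tset:
                    counts[c] += 1
        Lk = {X: c for X, c in counts.items() if c >= minsup_count}
        frequent.update(Lk)
        k += 1
    return frequent

Strong rules (Step 2 above) can then be generated by splitting each itemset in the returned dictionary, reusing its stored support counts to compute confidences.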
How to Count Supports of candidates
Candidate itemsets are stored in a hash-tree
Leaf node of the hash-tree contains a list of itemsets and counts
Interior node contains a hash table
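A much-simplified sketch of the hash-tree structure described above: interior nodes hold a hash table keyed on a hashed item, leaves hold lists of candidate itemsets, and supports are counted by descending the tree once per transaction. The fixed leaf depth, the bucket count, and the separate support counter are simplifying assumptions; real hash trees split leaves on overflow and keep the counts inside the leaves.

from collections import Counter

NBUCKETS = 7      # fan-out of each interior hash table (assumed)
LEAF_DEPTH = 2    # leaves sit at a fixed depth in this sketch; real trees split on overflow

class Node:
    def __init__(self, depth):
        self.depth = depth
        self.is_leaf = depth == LEAF_DEPTH
        self.children = {}    # interior node: bucket -> Node
        self.itemsets = []    # leaf node: list of candidate tuples

def build_hash_tree(candidates):
    """Insert sorted candidate k-tuples (k >= LEAF_DEPTH) by hashing successive items."""
    root = Node(0)
    for cand in candidates:
        node = root
        while not node.is_leaf:
            bucket = hash(cand[node.depth]) % NBUCKETS
            node = node.children.setdefault(bucket, Node(node.depth + 1))
        node.itemsets.append(cand)
    return root

def count_supports(root, transactions):
    """Return a Counter mapping candidate tuple -> support count."""
    support = Counter()
    for t in transactions:
        t = tuple(sorted(t))
        matched = set()

        def descend(node, start):
            if node.is_leaf:
                tset = set(t)
                for cand in node.itemsets:
                    if set(cand) <= tset:
                        matched.add(cand)
                return
            # hash every remaining transaction item and follow the matching bucket
            for i in range(start, len(t)):
                child = node.children.get(hash(t[i]) % NBUCKETS)
                if child is not None:
                    descend(child, i + 1)

        descend(root, 0)
        support.update(matched)   # each contained candidate is counted once per transaction
    return support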
Challenges
Multiple scans of transaction database
Huge number of candidates
Tedious workload of support counting for candidates
improvement
reduce passes of transaction database scans
shrink number of candidates
facilitate support counting of candidates
DHP
Observations on the performance of association rule mining
initial candidate set generation is the key issue to improve
amount of transaction data that must be scanned
Major features of DHP
efficient generation of frequent itemsets
effective reduction of the transaction database size
option of reducing the number of database scans required
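A sketch of the hashing part of DHP (Direct Hashing and Pruning): during the first pass, every 2-subset of each transaction is hashed into a small bucket table while 1-itemsets are counted, and a pair is admitted to C2 only if both its items are frequent and its bucket count reaches minsup. The table size and hash function are illustrative assumptions; DHP's transaction-trimming step (the database-size reduction above) is omitted for brevity.

from collections import defaultdict
from itertools import combinations

NBUCKETS = 101   # size of the hash table for 2-subsets (assumed)

def dhp_first_pass(transactions, minsup_count):
    item_count = defaultdict(int)
    bucket_count = [0] * NBUCKETS
    for t in transactions:
        items = sorted(set(t))
        for i in items:
            item_count[i] += 1
        # hash every 2-subset of the transaction into a bucket
        for pair in combinations(items, 2):
            bucket_count[hash(pair) % NBUCKETS] += 1
    L1 = {i for i, c in item_count.items() if c >= minsup_count}
    # a bucket count is an upper bound on the support of every pair hashed into it,
    # so pairs in under-full buckets can be dropped from C2 before any counting
    C2 = {pair for pair in combinations(sorted(L1), 2)
          if bucket_count[hash(pair) % NBUCKETS] >= minsup_count}
    return L1, C2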
Partitioning
Observation
Partition size is chosen so that each partition fits in main memory
Any itemset that is potentially frequent in the whole DB must appear as a frequent itemset in at least one of the partitions
Transaction DB is divided into non-overlapping partitions
Two-phase scanning
First scan: generates a set of all potentially frequent itemsets
Each partition generates the local frequent itemsets
Second scan: actual support is measured
Collection of local frequent itemset = global candidate itemset
Global frequent itemsets are found by scanning the DB
Advantages:
Adapts to available main memory
Easily parallelized
Maximum number of database scans is two
Disadvantage
May have many candidates during second scan
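A minimal sketch of the two-phase scheme above: phase one mines each in-memory partition with the same relative minsup and unions the local frequent itemsets into the global candidate set; phase two measures their actual supports in one more full scan. The naive local miner, the bounded itemset size, and the partition count are simplifying assumptions.

from collections import Counter
from itertools import combinations

def local_frequent(partition, minsup_fraction, max_k=3):
    """Naive local miner for one in-memory partition (itemset size bounded for brevity)."""
    threshold = minsup_fraction * len(partition)
    found = set()
    for k in range(1, max_k + 1):
        counts = Counter()
        for t in partition:
            for c in combinations(sorted(set(t)), k):
                counts[frozenset(c)] += 1
        level = {X for X, n in counts.items() if n >= threshold}
        if not level:
            break
        found |= level
    return found

def partition_mine(transactions, minsup_fraction, n_partitions=4):
    n = len(transactions)
    size = (n + n_partitions - 1) // n_partitions
    # Phase 1: local frequent itemsets per partition form the global candidate set
    candidates = set()
    for p in range(0, n, size):
        candidates |= local_frequent(transactions[p:p + size], minsup_fraction)
    # Phase 2: one full scan measures the actual (global) support of every candidate
    counts = Counter()
    for t in transactions:
        tset = frozenset(t)
        for c in candidates:
            if c <= tset:
                counts[c] += 1
    return {X: c for X, c in counts.items() if c >= minsup_fraction * n}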
Sampling
Observation
Sample the database and apply Apriori to the sample.
Potentially Large Itemsets (PL): Large itemsets from sample
Negative Border (BD⁻):
Generalization of Apriori-Gen applied to itemsets of varying sizes
Minimal set of itemsets which are not in PL, but whose subsets are all in PL.
Advantages
Reduces number of database scans to one in the best case and two in the worst case
Scales better
Disadvantages
Potentially large number of candidates in second pass
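A sketch in the spirit of this sampling approach (Toivonen-style): mine the sample with a slightly lowered threshold to get PL, add the negative border BD⁻(PL), and verify both against the full database in a single scan; a second pass is only needed if some border itemset turns out to be frequent. The sample size, the lowered threshold, and the miner hook (any frequent-itemset finder taking transactions and a relative minsup) are illustrative assumptions.

import random
from collections import Counter

def negative_border(PL, items):
    """BD⁻(PL): minimal itemsets not in PL whose immediate subsets are all in PL."""
    border = set()
    for it in items:                      # singletons missing from PL belong to the border
        if frozenset([it]) not in PL:
            border.add(frozenset([it]))
    for X in PL:                          # grow each PL itemset by one item
        for it in items:
            if it in X:
                continue
            Y = X | {it}
            if Y not in PL and all(Y - {z} in PL for z in Y):
                border.add(Y)
    return border

def sample_mine(transactions, minsup_fraction, miner, sample_size=1000, lowered=0.9):
    """miner(transactions, minsup_fraction) should return the frequent itemsets as frozensets."""
    n = len(transactions)
    sample = random.sample(transactions, min(sample_size, n))
    # PL: itemsets frequent in the sample, mined with a slightly lowered threshold
    PL = set(miner(sample, lowered * minsup_fraction))
    items = {i for t in sample for i in t}
    candidates = PL | negative_border(PL, items)
    # single scan of the full DB verifies PL and the negative border
    counts = Counter()
    for t in transactions:
        tset = frozenset(t)
        for c in candidates:
            if c <= tset:
                counts[c] += 1
    frequent = {X for X in candidates if counts[X] >= minsup_fraction * n}
    needs_second_pass = any(X not in PL for X in frequent)  # a border itemset was frequent
    return frequent, needs_second_pass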