Exam Paper
1. Given a classification problem and a dataset in which each record has several attributes and a class label, a learning algorithm can be applied to the data in order to determine a classification model.
The model is then used to classify previously unseen data (data without a class label), that is, to predict the class label.
(a) Hunt’s algorithm is the general approach to learning a classification model in the form of a decision tree. Provide its pseudocode. What are the three main design choices in any ‘specific’ decision tree induction algorithm? Provide the definition of the GINI index for a single node and the GINI index for a binary split. (8 marks)
(b) How do you measure the performance of a Decision Tree? What are the generalisation error and the resubstitution error? (3 marks)
(c) A golf player keeps a record of the weather conditions on the days on which they went to play. Consider the set of records with four features (O, T, H, W) and a class (“play”) shown in Table Q1-1. What data type (nominal, ordinal, binary) are the attributes and the class?
Compare the two decision trees shown in Figure Q1-1 by computing the two estimates of the generalisation error based on the resubstitution error:
- the optimistic estimate and
- the pessimistic estimate with a penalty term of 0.9. (5 marks)
(d) What is the meaning of the penalty term in estimating the generalisation error? For which values of the penalty term would the decision tree in Figure Q1-1.a have a smaller pessimistic estimate of the generalisation error than the one in Figure Q1-1.b? (4 marks)
ID | Outlook (O) | Temperature (T) | Humidity (H) | Windy (W) | play |
1 | Overcast | Cool | High | Yes | No |
2 | Overcast | Cool | Low | No | Yes |
3 | Overcast | Cool | Low | Yes | No |
4 | Overcast | Cool | Normal | Yes | No |
5 | Overcast | Hot | High | No | No |
6 | Overcast | Hot | Normal | No | Yes |
7 | Overcast | Mild | Low | Yes | Yes |
8 | Rainy | Cool | High | No | No |
9 | Rainy | Hot | High | No | No |
10 | Rainy | Hot | High | Yes | No |
11 | Rainy | Mild | Normal | No | Yes |
12 | Rainy | Mild | Normal | Yes | No |
13 | Sunny | Cool | Normal | No | Yes |
14 | Sunny | Cool | Normal | Yes | No |
15 | Sunny | Mild | High | No | Yes |
16 | Sunny | Mild | High | Yes | No |
Table Q1-1. Golf data
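As an illustration of the GINI definitions asked for in part (a), the following minimal sketch computes the GINI index of the root node of Table Q1-1 and the weighted GINI index of a binary split on Windy. It assumes the usual definition GINI(t) = 1 - sum of squared class proportions, with a split weighted by partition size; the function names and the choice of Windy as the split attribute are illustrative assumptions, not part of the question.

```python
from collections import Counter

# (Windy, play) pairs for records 1..16, transcribed from Table Q1-1.
RECORDS = [
    ("Yes", "No"), ("No", "Yes"), ("Yes", "No"), ("Yes", "No"),
    ("No", "No"), ("No", "Yes"), ("Yes", "Yes"), ("No", "No"),
    ("No", "No"), ("Yes", "No"), ("No", "Yes"), ("Yes", "No"),
    ("No", "Yes"), ("Yes", "No"), ("No", "Yes"), ("Yes", "No"),
]


def gini(labels):
    """GINI index of one node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(partitions):
    """GINI index of a split: per-child GINI weighted by child size."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)


root = [play for _, play in RECORDS]
windy_yes = [play for windy, play in RECORDS if windy == "Yes"]
windy_no = [play for windy, play in RECORDS if windy == "No"]

root_gini = gini(root)                          # 6 Yes, 10 No
windy_gini = gini_split([windy_yes, windy_no])  # (1 Yes, 7 No) vs (5 Yes, 3 No)

print(f"GINI(root)        = {root_gini:.5f}")
print(f"GINI(split on W)  = {windy_gini:.5f}")
```

A split is worth making under this criterion when its weighted GINI is lower than that of the parent node, which is the case for Windy here.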
Figure Q1-1. Decision Trees
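The two estimates named in part (c) can be sketched as below. This assumes one common textbook formulation, in which the optimistic estimate equals the resubstitution (training) error rate and the pessimistic estimate adds a fixed penalty per leaf node before dividing by the number of training records; the formulation intended by the examiner may differ. The trees of Figure Q1-1 are not reproduced here, so the numbers in the example are hypothetical and are not the answer to the question.

```python
def optimistic_estimate(training_errors, n_records):
    """Optimistic estimate: the resubstitution error rate itself."""
    return training_errors / n_records


def pessimistic_estimate(training_errors, n_leaves, n_records, penalty):
    """Pessimistic estimate: training errors plus a penalty per leaf,
    divided by the number of training records (assumed formulation)."""
    return (training_errors + penalty * n_leaves) / n_records


# Hypothetical tree (NOT one of the trees in Figure Q1-1):
# 4 misclassified training records, 7 leaves, 24 training records.
example_optimistic = optimistic_estimate(4, 24)
example_pessimistic = pessimistic_estimate(4, 7, 24, penalty=0.5)

print(f"optimistic  = {example_optimistic:.4f}")
print(f"pessimistic = {example_pessimistic:.4f}")
```

Under this formulation a larger penalty favours the tree with fewer leaves, which is the trade-off that part (d) asks about.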
2. (a) Briefly discuss Cluster Analysis in general and, in particular, two types of clustering: partitional and hierarchical. (4 marks)
(b) Compare and contrast one algorithm for partitional clustering and one for hierarchical clustering in terms of advantages (at least two) and disadvantages (at least two), including their computational complexity. (6 marks)
(c) Consider the set of 6 data points in 2 dimensions (x, y) in the table in Figure Q2-1. Apply one iteration of the k-means algorithm (for k=2) to find the cluster allocation (C0 or C1) of each data point and the values of the centroids at the end of the iteration (it-1).
Which of the two alternative initialisations of the centroids (c0 and c1 at it-0) given below produced the best clustering according to the cost function (SSE) optimised by k-means?
Provide the results on the following page, together with a worked solution (formulas and your arithmetic calculations) for the values of the centroids at the end of the iteration (it-1), the cluster allocations, and the values of the cost function before and after the iteration (it-0 and it-1).
Figure Q2-1. The input data points
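One iteration of k-means, as described in part (c), can be sketched as follows. The data points and initial centroids of Figure Q2-1 are not reproduced here, so the values of POINTS and INIT_CENTROIDS below are hypothetical placeholders, and all function names are illustrative. The sketch assumes squared Euclidean distance, with the SSE at it-0 taken after the assignment step and the SSE at it-1 taken after the centroid update.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: each centroid becomes the mean of its members.
    An empty cluster keeps its previous centroid."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)
        else:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
    return new_centroids


def sse(points, labels, centroids):
    """Cost function: sum of squared distances to the assigned centroid."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))


# Hypothetical data, NOT the values from Figure Q2-1.
POINTS = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
INIT_CENTROIDS = [(1, 1), (2, 1)]  # c0 and c1 at it-0

labels = assign(POINTS, INIT_CENTROIDS)
sse_it0 = sse(POINTS, labels, INIT_CENTROIDS)
centroids_it1 = update(POINTS, labels, INIT_CENTROIDS)
sse_it1 = sse(POINTS, labels, centroids_it1)

print("allocation:", [f"C{label}" for label in labels])
print("centroids at it-1:", centroids_it1)
print("SSE at it-0:", sse_it0, "SSE at it-1:", sse_it1)
```

Running the same steps for each alternative initialisation and comparing the resulting SSE values is one way to decide which initialisation gives the better clustering.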
3. An Association Rule is an implication expression of the form X → Y, where X and Y are disjoint itemsets.
(a) How many possible non-empty itemsets can be generated from a list of 12 unique items? How many non-redundant association rules can be generated from them? (5 marks)
(b) Is Association Rule Mining a descriptive or predictive data mining approach? Explain ARM and the meaning of the data model it provides. (3 marks)
(c) What are the support and the confidence of an association rule? Describe these measures and provide their formulas. Given the transactions in Table Q3-1, what are the support and the confidence of the following four rules?
- {steak} → {wine}
- {bread, eggs} → {cheese}
- {cod, potatoes} → {peas}
- {peas, potatoes} → {cod} (12 marks)
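The counts asked for in part (a) can be checked with a short sketch. It assumes the standard counting argument: d items give 2^d - 1 non-empty itemsets, and the number of rules X → Y with X and Y non-empty and disjoint is 3^d - 2^(d+1) + 1. Whether "non-redundant" in the question means exactly this count is an assumption. The brute-force function is only there to confirm the closed form on a small d.

```python
from itertools import product


def count_itemsets(d):
    """Number of non-empty itemsets over d unique items."""
    return 2 ** d - 1


def count_rules(d):
    """Closed form for rules X -> Y with X, Y non-empty and disjoint."""
    return 3 ** d - 2 ** (d + 1) + 1


def count_rules_bruteforce(d):
    """Enumerate every way to place each item in neither side (0),
    the antecedent X (1) or the consequent Y (2), and keep the
    placements where both X and Y are non-empty."""
    return sum(
        1 for roles in product((0, 1, 2), repeat=d) if 1 in roles and 2 in roles
    )


print("itemsets for d=12:", count_itemsets(12))
print("rules for d=12:   ", count_rules(12))
print("closed form matches brute force for d=4:",
      count_rules(4) == count_rules_bruteforce(4))
```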
TID | itemset | | | |
1 | potatoes | onions | sausages | peas |
2 | cod | peas | potatoes | |
3 | bread | steak | crisps | wine |
4 | eggs | bread | oranges | |
5 | crisps | cola | sausages | beer |
6 | onions | potatoes | eggs | |
7 | cod | eggs | peas | wine |
8 | crisps | chocolate | cola | |
9 | crisps | beer | | |
10 | steak | wine | lettuce | cheese |
11 | cheese | eggs | bread | |
12 | onions | potatoes | cod | |
13 | chocolate | crisps | cola | beer |
14 | oranges | peas | lettuce | potatoes |
15 | bread | cheese | | |
16 | eggs | sausages | potatoes | |
17 | steak | cod | eggs | wine |
18 | crisps | chocolate | | |
19 | bread | eggs | cheese | |
20 | bread | sausages | wine | |
Table Q3-1. A list of transactions
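The support and confidence of the four rules in part (c) can be computed from Table Q3-1 with the sketch below. It assumes the usual definitions: support(X → Y) is the fraction of transactions containing every item of X and Y, and confidence(X → Y) is the number of transactions containing X and Y divided by the number containing X. The function and variable names are illustrative.

```python
# Transactions 1..20, transcribed from Table Q3-1.
TRANSACTIONS = [
    {"potatoes", "onions", "sausages", "peas"},
    {"cod", "peas", "potatoes"},
    {"bread", "steak", "crisps", "wine"},
    {"eggs", "bread", "oranges"},
    {"crisps", "cola", "sausages", "beer"},
    {"onions", "potatoes", "eggs"},
    {"cod", "eggs", "peas", "wine"},
    {"crisps", "chocolate", "cola"},
    {"crisps", "beer"},
    {"steak", "wine", "lettuce", "cheese"},
    {"cheese", "eggs", "bread"},
    {"onions", "potatoes", "cod"},
    {"chocolate", "crisps", "cola", "beer"},
    {"oranges", "peas", "lettuce", "potatoes"},
    {"bread", "cheese"},
    {"eggs", "sausages", "potatoes"},
    {"steak", "cod", "eggs", "wine"},
    {"crisps", "chocolate"},
    {"bread", "eggs", "cheese"},
    {"bread", "sausages", "wine"},
]


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def support(antecedent, consequent):
    """Fraction of all transactions containing both sides of the rule."""
    return support_count(antecedent | consequent) / len(TRANSACTIONS)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    return support_count(antecedent | consequent) / support_count(antecedent)


RULES = [
    ({"steak"}, {"wine"}),
    ({"bread", "eggs"}, {"cheese"}),
    ({"cod", "potatoes"}, {"peas"}),
    ({"peas", "potatoes"}, {"cod"}),
]

for x, y in RULES:
    print(f"{sorted(x)} -> {sorted(y)}: "
          f"support = {support(x, y):.2f}, confidence = {confidence(x, y):.2f}")
```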
(End of Question Paper)