Data Science Algorithms: Exam Help and Writing Service

2022-03-13 09:18, Sunday. Category: Algorithm Writing

Exam Paper

1. Given a classification problem and a dataset, where each record has several attributes and a class label, a learning algorithm can be applied to the data in order to determine a classification model.

The model is then used to classify previously unseen data (data without a class label), that is, to predict the class label.

(a) Hunt’s algorithm is the general approach to learning a classification model in the form of a decision tree. Provide its pseudocode. What are the three main design choices in any ‘specific’ decision tree induction algorithm? Provide the definition of the GINI index for a single node and the GINI index for a binary split. (8 marks)

(b) How do you measure the performance of a decision tree? What are the generalisation error and the resubstitution error? (3 marks)

(c) A golf player keeps a record of the weather conditions on the days on which they went to play. Consider the set of records with four features (O, T, H, W) and a class (“play”) shown in Table Q1-1. What data type (nominal, ordinal, binary) are the attributes and the class?

Compare the two decision trees shown in Figure Q1-1 by computing the two estimates of the generalisation error based on the resubstitution error:

  • the optimistic estimate, and
  • the pessimistic estimate with a penalty term of 0.9. (5 marks)

(d) What is the meaning of the penalty term in estimating the generalisation error? For which values of the penalty term would the decision tree in Figure Q1-1.a have a smaller pessimistic estimate of the generalisation error than the one in Figure Q1-1.b? (4 marks)

ID  Outlook (O)  Temperature (T)  Humidity (H)  Windy (W)  play
1   Overcast     Cool             High          Yes        No
2   Overcast     Cool             Low           No         Yes
3   Overcast     Cool             Low           Yes        No
4   Overcast     Cool             Normal        Yes        No
5   Overcast     Hot              High          No         No
6   Overcast     Hot              Normal        No         Yes
7   Overcast     Mild             Low           Yes        Yes
8   Rainy        Cool             High          No         No
9   Rainy        Hot              High          No         No
10  Rainy        Hot              High          Yes        No
11  Rainy        Mild             Normal        No         Yes
12  Rainy        Mild             Normal        Yes        No
13  Sunny        Cool             Normal        No         Yes
14  Sunny        Cool             Normal        Yes        No
15  Sunny        Mild             High          No         Yes
16  Sunny        Mild             High          Yes        No

Table Q1-1. Golf data
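
As an illustration of the GINI definitions asked for in part (a), the sketch below computes the index for a single node and for a binary split. It uses the "play" labels of Table Q1-1 grouped by Windy; the choice of Windy as the split attribute and the function names `gini` and `gini_split` are assumptions made for this example, not part of the exam paper.

```python
from collections import Counter


def gini(labels):
    """GINI index of a single node: 1 - sum_i p_i^2 over the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """GINI index of a binary split: child GINI values weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# "play" labels from Table Q1-1, grouped by the Windy attribute (illustrative split).
windy_yes = ["No", "No", "No", "Yes", "No", "No", "No", "No"]      # IDs 1,3,4,7,10,12,14,16
windy_no = ["Yes", "No", "Yes", "No", "No", "Yes", "Yes", "Yes"]   # IDs 2,5,6,8,9,11,13,15

root_gini = gini(windy_yes + windy_no)       # 6 Yes, 10 No -> 0.46875
split_gini = gini_split(windy_yes, windy_no)  # 0.5 * 0.21875 + 0.5 * 0.46875 = 0.34375

print(root_gini, split_gini)
```

A split is worth making under this criterion when the weighted value for the children is lower than the value for the parent node, as it is here.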

Figure Q1-1. Decision Trees
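
The two estimates in part (c) can be sketched as below, assuming the common textbook form in which the pessimistic estimate adds a fixed penalty per leaf node to the training errors. The trees of Figure Q1-1 are not reproduced here, so the error and leaf counts in the example are hypothetical placeholders, and the function names are assumptions for illustration.

```python
def optimistic_error(train_errors, n_records):
    """Optimistic estimate: the resubstitution (training) error rate itself."""
    return train_errors / n_records


def pessimistic_error(train_errors, n_leaves, n_records, penalty):
    """Pessimistic estimate: training errors plus a penalty for every leaf node."""
    return (train_errors + penalty * n_leaves) / n_records


def break_even_penalty(errors_a, leaves_a, errors_b, leaves_b):
    """Penalty at which two trees trained on the same records tie.

    Solves errors_a + p * leaves_a == errors_b + p * leaves_b for p.
    Returns None when both trees have the same number of leaves.
    """
    if leaves_a == leaves_b:
        return None
    return (errors_b - errors_a) / (leaves_a - leaves_b)


# Hypothetical trees (NOT the ones in Figure Q1-1): 24 training records,
# tree A has 4 errors and 7 leaves, tree B has 6 errors and 4 leaves.
est_a = pessimistic_error(4, 7, 24, penalty=0.9)
est_b = pessimistic_error(6, 4, 24, penalty=0.9)
tie = break_even_penalty(4, 7, 6, 4)

print(optimistic_error(4, 24), optimistic_error(6, 24), est_a, est_b, tie)
```

In this hypothetical case the larger tree A is preferred only for penalties below the break-even value; above it, the extra leaves cost more than the training errors they remove.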

2. (a) Briefly discuss Cluster Analysis in general and, in particular, two types of clustering: partitional and hierarchical. (4 marks)

(b) Compare and contrast one algorithm for partitional clustering and one for hierarchical clustering in terms of advantages (at least two) and disadvantages (at least two), including their computational complexity. (6 marks)

(c) Consider the set of 6 data points in 2 dimensions (x, y) in the table in Figure Q2-1. Apply one iteration of the k-means algorithm (for k=2) to find the cluster allocation (C0 or C1) of each data point and the values of the centroids at the end of the iteration (it-1).

Which of the two alternative initialisations of the centroids (c0 and c1 at it-0) given below produced the best clustering according to the cost function (SSE) optimised by k-means?

Provide the results on the following page, together with a worked solution (formulas and your arithmetic calculations) for the values of the centroids at the end of the iteration (it-1), the cluster allocations, and the values of the cost function before and after the iteration (it-0 and it-1).

Figure Q2-1. The input data points
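
One k-means iteration as described in part (c) can be sketched as follows. The points and initial centroids of Figure Q2-1 are not reproduced in this text, so the six points and the initialisation below are hypothetical. The helper names, and the convention of evaluating the SSE at it-1 with the allocations made during the iteration, are assumptions of this sketch.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def sse(points, centroids, labels):
    """k-means cost: sum of squared distances from each point to its centroid."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))


def update(points, labels, centroids):
    """Move each centroid to the mean of its members; keep it if the cluster is empty."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def one_iteration(points, init_centroids):
    """Allocate points, then update centroids; report the SSE before and after."""
    labels = assign(points, init_centroids)
    sse_before = sse(points, init_centroids, labels)   # cost at it-0
    centroids = update(points, labels, init_centroids)
    sse_after = sse(points, centroids, labels)         # cost at it-1
    return labels, centroids, sse_before, sse_after


# Hypothetical data (NOT the points of Figure Q2-1).
points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
labels, centroids, sse_before, sse_after = one_iteration(points, [(1, 1), (5, 5)])
print(labels, centroids, sse_before, sse_after)
```

Running `one_iteration` once per candidate initialisation and comparing the resulting `sse_after` values is one way to decide which initialisation gives the better clustering under the k-means cost.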

3. An Association Rule is an implication expression of the form X → Y, where X and Y are disjoint itemsets.
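
To make the definition concrete, the sketch below counts itemsets and rules for d unique items. It assumes the usual counting convention in which a rule X → Y needs both X and Y to be non-empty and disjoint; whether this matches the intended meaning of "non-redundant" in the question is an assumption. The function names are illustrative.

```python
from math import comb


def itemset_count(d):
    """Number of non-empty itemsets over d items: every subset except the empty one."""
    return 2 ** d - 1


def rule_count(d):
    """Number of rules X -> Y with X, Y non-empty and disjoint.

    Choose k items for X, then any non-empty subset of the remaining d - k for Y.
    """
    return sum(comb(d, k) * (2 ** (d - k) - 1) for k in range(1, d + 1))


def rule_count_closed_form(d):
    """Closed form of the same sum: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1


print(itemset_count(12), rule_count(12), rule_count_closed_form(12))
```

The closed form follows because each item is placed in X, in Y, or in neither (3^d ways), minus the placements with X empty or Y empty (2^d each), plus the one placement counted twice where both are empty.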

(a) How many possible non-empty itemsets can be generated from a list of 12 unique items? How many non-redundant association rules can be generated from them? (5 marks)

(b) Is Association Rule Mining (ARM) a descriptive or a predictive data mining approach? Explain ARM and the meaning of the data model it provides. (3 marks)

(c) What are the support and the confidence of an association rule? Describe these measures and provide their formulas. Given the transactions in Table Q3-1, what are the support and the confidence of the following four rules?

  1. {steak} → {wine}
  2. {bread, eggs} → {cheese}
  3. {cod, potatoes} → {peas}
  4. {peas, potatoes} → {cod} (12 marks)

TID  itemset
1    potatoes, onions, sausages, peas
2    cod, peas, potatoes
3    bread, steak, crisps, wine
4    eggs, bread, oranges
5    crisps, cola, sausages, beer
6    onions, potatoes, eggs
7    cod, eggs, peas, wine
8    crisps, chocolate, cola
9    crisps, beer
10   steak, wine, lettuce, cheese
11   cheese, eggs, bread
12   onions, potatoes, cod
13   chocolate, crisps, cola, beer
14   oranges, peas, lettuce, potatoes
15   bread, cheese
16   eggs, sausages, potatoes
17   steak, cod, eggs, wine
18   crisps, chocolate
19   bread, eggs, cheese
20   bread, sausages, wine

Table Q3-1. A list of transactions
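
The measures in part (c) can be checked mechanically against Table Q3-1. The sketch below assumes the standard definitions, with support of X → Y as the fraction of transactions containing X ∪ Y and confidence as that count divided by the number of transactions containing X. The helper names `count`, `support`, and `confidence` are illustrative.

```python
# Transactions of Table Q3-1, in TID order.
transactions = [
    {"potatoes", "onions", "sausages", "peas"},
    {"cod", "peas", "potatoes"},
    {"bread", "steak", "crisps", "wine"},
    {"eggs", "bread", "oranges"},
    {"crisps", "cola", "sausages", "beer"},
    {"onions", "potatoes", "eggs"},
    {"cod", "eggs", "peas", "wine"},
    {"crisps", "chocolate", "cola"},
    {"crisps", "beer"},
    {"steak", "wine", "lettuce", "cheese"},
    {"cheese", "eggs", "bread"},
    {"onions", "potatoes", "cod"},
    {"chocolate", "crisps", "cola", "beer"},
    {"oranges", "peas", "lettuce", "potatoes"},
    {"bread", "cheese"},
    {"eggs", "sausages", "potatoes"},
    {"steak", "cod", "eggs", "wine"},
    {"crisps", "chocolate"},
    {"bread", "eggs", "cheese"},
    {"bread", "sausages", "wine"},
]


def count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(antecedent, consequent):
    """Fraction of all transactions containing both sides of the rule."""
    return count(antecedent | consequent) / len(transactions)


def confidence(antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


rules = [
    ({"steak"}, {"wine"}),
    ({"bread", "eggs"}, {"cheese"}),
    ({"cod", "potatoes"}, {"peas"}),
    ({"peas", "potatoes"}, {"cod"}),
]
for x, y in rules:
    print(sorted(x), "->", sorted(y), support(x, y), confidence(x, y))
```

Rules 3 and 4 share the same itemset {cod, peas, potatoes} and therefore the same support; they differ only in confidence because their antecedents occur in different numbers of transactions.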

(End of Question Paper)
