Data Science and Algorithm Tools Assignment Writing

2022-04-15 09:57, Friday

Problem 2

Part (a)

Cluster analysis is a form of unsupervised classification that partitions a data set into meaningful subclasses, or clusters, based on a similarity measure and a clustering algorithm. Within each cluster, the objects should be similar to one another and show a natural grouping, i.e. high intra-class similarity. Across clusters, the objects should be dissimilar, i.e. low inter-class similarity. Cluster analysis is typically applied to tasks that aim to discover groupings or hidden patterns in data, such as pattern recognition and marketing. There are two typical types of clustering: partitional and hierarchical.

Partitional clustering assigns each data point to a centroid and thereby divides the data into partitions, which are refined iteratively. The centroid of each partition is usually estimated from the members of that partition, for example as their mean.
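The assign-then-re-estimate loop described above can be sketched in a few lines. This is a minimal illustration rather than a production implementation: the function name `kmeans`, the toy points, and the choice to seed the centroids with two of the data points are all assumptions made for the example.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-means loop; `centroids` holds the initial guesses."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                new_centroids.append(old)  # keep the centroid of an empty cluster
                continue
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

# Toy data (assumed for illustration): two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 10.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

On this toy input the first three points end up in one partition and the last three in the other. In practice the result depends on the initial centroids, so the seeding used here should be read as a convenience, not a recommendation.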

Hierarchical clustering generates a set of nested clusters organized as a hierarchical tree, similar to a taxonomy. It can be done agglomeratively or divisively. The agglomerative approach starts with each point as a cluster of its own, then repeatedly merges the two closest clusters until only one cluster is left. The divisive approach starts with one cluster containing all the data, then repeatedly splits clusters until each cluster contains a single data point.
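A naive sketch of the agglomerative (bottom-up) procedure is given here. The text does not say how the distance between two clusters is measured, so single linkage (the distance between their closest members) is assumed; the function name `agglomerative`, the `n_clusters` stopping parameter, and the toy points are likewise illustrative.

```python
import math

def agglomerative(points, n_clusters=1):
    """Naive bottom-up clustering; single linkage is an assumed choice."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []  # (cluster_a, cluster_b, distance) in merge order

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Compare every pair of current clusters and pick the closest one.
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters, merges

# Toy data (assumed): two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
clusters, merges = agglomerative(points, n_clusters=2)
print(clusters)
```

The `merges` list records the nested structure: each entry is one level of the tree. Stopping at `n_clusters=1` would reproduce the "until only one cluster is left" behaviour described above.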

Part (b)

In this part we compare K-means clustering (partitional) and agglomerative hierarchical clustering (hierarchical).

The advantages of K-means over the agglomerative method are:

It is very simple and intuitive to interpret.

It is computationally efficient, and it adapts to new data very easily: for each new data point, we can simply calculate its distance to all centroids and pick the closest one as its label.
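Labelling a new point by its nearest centroid takes only one distance computation per centroid. A minimal sketch, where the helper name `predict` and the centroid values are hypothetical stand-ins for the output of an earlier K-means fit:

```python
import math

def predict(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

# Hypothetical centroids, standing in for the result of an earlier fit.
centroids = [(1.0, 1.0), (8.0, 9.0)]
label = predict((7.0, 7.5), centroids)
print(label)
```

No existing point has to be revisited, which is why the cost of labelling grows with the number of centroids rather than with the size of the data set.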

Its disadvantages are:

We have to pick the parameter k manually. A k value that is too low or too high gives poor results, and k is relatively hard to tune.

It does not scale well to high-dimensional data. Distance-based metrics generally converge to similar values when the dimension is high, so an algorithm such as K-means that relies on a distance metric tends to break down.
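The claim about distances converging in high dimensions can be checked with a small experiment. The sketch below is an illustration under assumed settings (uniform random points in the unit cube, 200 points, a fixed seed); `relative_contrast` is a hypothetical helper that compares the farthest and nearest distances from a random query point.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

low = relative_contrast(dim=2)     # nearest and farthest points differ a lot
high = relative_contrast(dim=500)  # distances bunch together
print(low, high)
```

With these settings the contrast in 2 dimensions should come out far larger than in 500 dimensions: when every point is roughly equally far from every centroid, "pick the closest one" carries little information.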

The agglomerative method’s advantages are:

We don’t have to specify the parameter k in advance.

The dendrogram is hierarchical and can help us discover hidden patterns in the data.
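Both points can be illustrated by treating the dendrogram as a record of merges: once it is built, a flat clustering for any k is obtained by replaying only the first n - k merges. In this sketch the `history` list is a hypothetical merge record for five points, and `cut` is an illustrative helper, not a library function.

```python
def cut(history, n_points, k):
    """Replay the first n_points - k merges to obtain k flat clusters."""
    clusters = [[i] for i in range(n_points)]
    for a, b in history[: n_points - k]:
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical merge record for points 0-4, listed in merge order.
history = [([0], [1]), ([2], [3]), ([0, 1], [2, 3]), ([0, 1, 2, 3], [4])]

for k in (3, 2, 1):
    print(k, cut(history, n_points=5, k=k))
```

The same hierarchy yields every level of granularity, so k becomes a choice made while reading the tree rather than an input to the algorithm.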

The disadvantages are:

It is more expensive to compute than K-means, as we have to compare all possible pairs of data points (n(n-1)/2 pairs for n points).

It doesn’t adapt to new data easily: we have to check the connection between each new data point and all existing points to decide the clustering.

Part (c)

The filled worksheet is attached below:
