UNIVERSITY COLLEGE LONDON
Faculty of Engineering Sciences
Department of Computer Science
COMP0036: LSA – Assignment 2
Dr. Dariush Hosseini (dariush.hosseini@ucl.ac.uk)
Overview
- Assignment Release Date: 2nd October 2019
- Assignment Hand-in Date: 16th October 2019 at 11.55am
- Format: Problems
Guidelines
- You should answer all THREE questions.
- Note that not all questions carry equal marks.
- You should submit your final report as a pdf using Turnitin, accessed via the module's Moodle page.
- Within your report you should begin each question on a new page.
- You should preface your report with a single page containing, on two lines:
- The module code: ‘COMP0036’
- The assignment title: ‘LSA – Assignment 2’
- Your report should be neat and legible.
You are strongly advised to use LaTeX to format the report; however, this is not a hard requirement, and you may submit a report formatted with the aid of another system, or even a handwritten report, provided that you have converted it to a pdf (see above).
- Please attempt to express your answers as succinctly as possible.
- Please note that if your answer to a question or sub-question is illegible or incomprehensible to the marker then you will receive no marks for that question or sub-question.
- Please remember to detail your working, and state clearly any assumptions which you make.
- Failure to adhere to any of the guidelines may result in question-specific deduction of marks. If warranted, these deductions may be punitive.
1. Assume an unlabelled dataset, {x^(i) ∈ R^m : i = 1, …, n}, with sample mean, x̄ = (1/n) ∑_{i=1}^{n} x^(i), and an orthonormal basis set, {u^[j] ∈ R^m : j = 1, …, d}, where d < m.
We have investigated the ‘Projected Variance Maximisation’ approach to PCA, in which we are interested in finding the d-dimensional subspace spanned by {u^[j] : j = 1, …, d} for which the sum of the sample variances of the data projected onto this subspace is maximised.
This leads to a formulation of the PCA problem as:

\max_{\{u^{[j]}\}_{j=1}^{d}} \; \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{d} \left( u^{[j]\top} \left( x^{(i)} - \bar{x} \right) \right)^{2} \quad \text{subject to: } u^{[j]\top} u^{[k]} = \delta_{jk} \qquad (1)
However there are several other approaches to PCA. One is the ‘Reconstruction Error Minimisation’ approach.
Here we are interested in finding the d-dimensional subspace spanned by {u^[j] : j = 1, …, d} which minimises the reconstruction error, i.e.:

\min_{\{u^{[j]}\}_{j=1}^{d}} \; \frac{1}{n} \sum_{i=1}^{n} \left\| \left( x^{(i)} - \bar{x} \right) - \sum_{j=1}^{d} \left( u^{[j]\top} \left( x^{(i)} - \bar{x} \right) \right) u^{[j]} \right\|^{2} \qquad (2)
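Note: as an informal aside (not part of the assessed question), the equivalence can be sanity-checked numerically: for any orthonormal basis, the projected variance and the reconstruction error sum to the total variance. A minimal Python sketch, assuming NumPy is available; all variable names are illustrative:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # n = 200 points in R^m, m = 5
Xc = X - X.mean(axis=0)                     # centre the data

S = (Xc.T @ Xc) / len(Xc)                   # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
U = eigvecs[:, -2:]                         # top d = 2 eigenvectors (orthonormal)

proj_var = np.sum((Xc @ U) ** 2) / len(Xc)              # objective of (1)
recon_err = np.sum((Xc - Xc @ U @ U.T) ** 2) / len(Xc)  # objective of (2)

# For any orthonormal U: projected variance + reconstruction error
# = total variance, so maximising (1) is equivalent to minimising (2).
assert np.isclose(proj_var + recon_err, np.trace(S))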
(a) [6 marks]
Show that problems (1) and (2) are equivalent.
(b) [4 marks]
In each analysis we were careful to centre our input data by effectively subtracting off the mean. Why is it important to centre the data in this way?
(c) [6 marks]
Show that we can re-write the objective of problem (1) as follows, and provide an expression for the matrix S:

\sum_{j=1}^{d} u^{[j]\top} S \, u^{[j]} \qquad (3)
(d) [4 marks]
If we were to replace S with the sample correlation matrix and then proceed to perform PCA with this objective, under what circumstances would this form of PCA differ from the covariance matrix version?
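Note: a hedged illustration relevant to (d), since PCA on the sample correlation matrix amounts to PCA on standardised data. Python sketch, assuming NumPy; names are illustrative:

import numpy as np

def pca_directions(X):
    """Eigenvectors of the sample covariance of X, descending eigenvalue order."""
    Xc = X - X.mean(axis=0)
    S = (Xc.T @ Xc) / len(Xc)
    _, vecs = np.linalg.eigh(S)             # ascending order
    return vecs[:, ::-1]                    # flip to descending

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) * np.array([1.0, 10.0, 0.1])  # very different scales

cov_dirs = pca_directions(X)                                      # covariance PCA
corr_dirs = pca_directions((X - X.mean(axis=0)) / X.std(axis=0))  # correlation PCA

# With unequal per-feature variances the leading directions differ;
# rescale the features to a common variance and the two versions agree.
print(cov_dirs[:, 0])
print(corr_dirs[:, 0])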
2. Assume a set of unlabelled points, {x^(i) ∈ R^d : i = 1, …, n}, which we wish to separate into k clusters.
We wish to use a ‘Spherical’ Gaussian Mixture Model (GMM) for this clustering task. Here we assume that each point has an unknown, latent, cluster assignment associated with it, z ∈ {1, …, k}, which is the outcome of a multinomial random variable, Z:

p(Z = j) = \pi^{[j]}, \quad \text{where } \pi^{[j]} \geq 0 \text{ and } \sum_{j=1}^{k} \pi^{[j]} = 1
Furthermore, contingent on the cluster assignment, z, we assume that each point, x, is the outcome of a Gaussian random variable, X:

X \mid Z = j \; \sim \; \mathcal{N}\!\left( \mu^{[j]}, \Sigma^{[j]} \right)
Where:
µ^[j] ∈ R^d is the mean associated with cluster j, and;
Σ[j] = sI is the covariance associated with each cluster j. Here s is a constant, and I denotes the d × d identity matrix.
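Note: for reference, combining these assumptions gives the marginal density of a point under this spherical model in the standard mixture form:

p(x) = \sum_{j=1}^{k} \pi^{[j]} \, \mathcal{N}\!\left( x \mid \mu^{[j]}, sI \right) = \sum_{j=1}^{k} \frac{\pi^{[j]}}{(2 \pi s)^{d/2}} \exp\!\left( - \frac{1}{2s} \left\| x - \mu^{[j]} \right\|^{2} \right)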
(a) [6 marks]
Give an expression for the log likelihood of the dataset, {x^(i)}_{i=1}^{n}, in terms of s and {µ^[j], π^[j]}_{j=1}^{k}.
(b) [5 marks]
We wish to maximise this expression subject to the usual constraints using the EM algorithm.
Give an expression for the responsibilities, γ_i^[j], in this case.
(c) [8 marks]
Explain why, as s → 0, the solution we generate from the spherical GMM EM clustering algorithm will tend towards the solution we would generate from the k-means clustering algorithm.
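Note: a hedged numerical illustration of this limit (not a proof). The responsibilities form a softmax over scaled squared distances, which hardens to an argmax as s shrinks. Python sketch, assuming NumPy; names are illustrative:

import numpy as np

def responsibilities(x, mus, pis, s):
    """E-step responsibilities for a spherical GMM with covariance s * I."""
    # log pi[j] - ||x - mu[j]||^2 / (2s), up to a constant common to all j
    logits = np.log(pis) - np.sum((mus - x) ** 2, axis=1) / (2 * s)
    logits -= logits.max()                  # stabilise the softmax numerically
    w = np.exp(logits)
    return w / w.sum()

x = np.array([0.9, 0.0])
mus = np.array([[0.0, 0.0], [1.0, 0.0]])    # two cluster means
pis = np.array([0.5, 0.5])

for s in (1.0, 0.1, 0.01):
    print(s, responsibilities(x, mus, pis, s))
# As s -> 0 the responsibility of the nearest mean tends to 1, so EM's
# soft E-step approaches the hard assignment step of k-means.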
(d) [3 marks]
Describe and explain the form of the boundary which discriminates between clusters in this case.
(e) [3 marks]
Now consider the more usual GMM EM clustering algorithm, in which we assume that the covariances, Σ^[j] ≻ 0, are no longer constrained to take the isotropic form which we assumed earlier and are, in general, distinct ∀j.
Describe and explain the form of the boundary which discriminates between clusters in this case in general.
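Note: a standard Gaussian identity that may help with (d) and (e): the boundary between clusters j and l is the locus on which the responsibilities are equal, i.e. where the following log-ratio vanishes:

\log \frac{\gamma_i^{[j]}}{\gamma_i^{[l]}} = \log \frac{\pi^{[j]}}{\pi^{[l]}} - \frac{1}{2} \log \frac{\left| \Sigma^{[j]} \right|}{\left| \Sigma^{[l]} \right|} - \frac{1}{2} \left( x^{(i)} - \mu^{[j]} \right)^{\top} \Sigma^{[j]\,-1} \left( x^{(i)} - \mu^{[j]} \right) + \frac{1}{2} \left( x^{(i)} - \mu^{[l]} \right)^{\top} \Sigma^{[l]\,-1} \left( x^{(i)} - \mu^{[l]} \right)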
3. (a) [4 marks]
Explain the importance of the Representer Theorem.
(b) [3 marks]
Consider the ℓ1-regularised logistic regression optimisation problem:

\min_{w} \; - \sum_{i=1}^{n} \left[ y^{(i)} \log \sigma\!\left( w^{\top} x^{(i)} \right) + \left( 1 - y^{(i)} \right) \log\!\left( 1 - \sigma\!\left( w^{\top} x^{(i)} \right) \right) \right] + \lambda \left\| w \right\|_{1}

where σ(a) = 1 / (1 + e^{−a}) denotes the logistic sigmoid.
Here:
{(x^(i), y^(i))}_{i=1}^{n} represents a set of training data, where x ∈ R^m are input attributes, while y ∈ {0, 1} is the output label;
w ∈ R^m is the weight vector of the linear discriminant which we seek, and;
λ > 0 is some constant.
Is this problem susceptible to the kernel trick? Explain.
(c) [4 marks]
Consider the following clustering problem:

\min_{\{\mu^{[j]}\}, \{\rho_i^{[j]}\}} \; \sum_{i=1}^{n} \sum_{j=1}^{k} \rho_i^{[j]} \left\| x^{(i)} - \mu^{[j]} \right\|^{2} \quad \text{subject to: } \sum_{j=1}^{k} \rho_i^{[j]} = 1 \;\; \forall i
Here:
{x^(i)}_{i=1}^{n} represents a set of unlabelled points, where x ∈ R^m are input attributes;
{µ^[j] ∈ R^m}_{j=1}^{k} represent the k cluster centroids, and;
{ρ_i^[j] ∈ {0, 1}}_{i=1,j=1}^{n,k} represent a set of assignment variables.
Let us seek to perform this optimisation using the k-means algorithm.
Is the M-step of this procedure susceptible to the kernel trick? Explain.
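Note: for reference, the M-step in question holds the assignments {ρ_i^[j]} fixed and updates each centroid to the mean of its currently assigned points (the standard k-means update):

\mu^{[j]} \leftarrow \frac{\sum_{i=1}^{n} \rho_i^{[j]} \, x^{(i)}}{\sum_{i=1}^{n} \rho_i^{[j]}}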
(d) [4 marks]
Assuming 2-dimensional input attributes, x = [x_1, x_2]^T, what kernel function, κ, is associated with the following feature map, φ (express your answer in terms of vector dot products):
(e) [5 marks]
Assuming 1-dimensional input attributes, x, what feature map, φ : x ↦ φ(x), is associated with the following kernel, κ:

κ(x^{(i)}, x^{(j)}) = \exp\!\left( -\sigma^{2} \left( x^{(i)} - x^{(j)} \right)^{2} \right)
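Note: as an informal aside, a kernel like this can be evaluated directly without constructing φ explicitly. A minimal Python sketch of the Gram matrix, assuming NumPy; names are illustrative:

import numpy as np

def rbf_gram(x, sigma):
    """Gram matrix K[i, j] = exp(-sigma^2 * (x_i - x_j)^2) for 1-d inputs."""
    diff = x[:, None] - x[None, :]
    return np.exp(-sigma**2 * diff**2)

x = np.linspace(-1.0, 1.0, 5)
K = rbf_gram(x, sigma=2.0)

# A valid kernel must yield a symmetric positive semi-definite Gram matrix.
assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() > -1e-10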