Machine Learning Theory (CSC 482A/581A)

Problem Set 1 Due on Friday, February 2nd, 5pm

Instructions:

• You must write up your solutions individually.

• You may have high-level discussions with up to 2 other students registered in the course. If you discuss problems with others, include at the top of your submission: their names, V#’s, and the problems discussed.

• Your solutions should be clear and concise (and legible!). You are strongly encouraged to type up your solutions in LaTeX. For any problems where you only have a partial solution, be clear about any parts of your solution for which you have low confidence.

• If submitted through conneX, the due date is Friday, February 2nd, 7pm. This is a hard deadline. If submitted in writing, submit at the beginning of class (12:30pm) Friday February 2nd. You are thus incentivized to submit online (using LaTeX) as you get more time!

Questions:

1. Let X = R2 and take C to be the class of concentric circles C = {cr : r  0}, where for each nonnegative real number r  0, we have cr(x) = 1[kxk2  r]. Prove that C is PAC learnable.

 In particular, show a PAC learning algorithm which, given a training sample of size n ≥ log 1δ , ε finds with probability at least 1 − δ a hypothesis fˆ ∈ C for which R(fˆ) ≤ ε.

2. Devise an eﬃcient mistake bound learner for the concept class k-term DNF over X = {0, 1}d. The runtime and mistake bound of your algorithm both should be polynomial in d; you may treat k as a constant.

3. Let X = {0, 1}d and consider PAC learning a finite concept class C. Assume that the inputs are drawn i.i.d. from an unknown distribution P over X , and the labels are generated via the rule Y = c(X) for some c  C.

Let’s call this problem the “clean” problem; so, in the clean problem, the training sample consists of random examples of the form (X, Y ) for X P and Y = c(X).

Next, consider the following “corrupted” problem: Each time we request a random example (X, Y ), with probability α(X)  [0, 1] the value of the label Y is flipped. Call the resulting label Ye. Thus,

(

Y with probability α(X)

Ye =

Y with probability 1  α(X)

In the corrupted problem, the examples are of the form (X, Ye), and so the labels are noisy.

(a) Using c and α, derive an expression for the Bayes classifier for the corrupted problem.

(b) For the remaining questions, assume that α(x) = 14 for all x  X . What is the Bayes classifier for the corrupted problem?

(c) What is the Bayes risk for the corrupted problem?

(d) Let cε  C be a hypothesis for which Pr(cε(X) 6= c(X)) = ε > 0. What is the risk (expected zero-one loss) of cε for the corrupted problem?

(e) Design an algorithm for PAC learning C given access only to corrupted labeled examples (X1, Ye1), . . . , (Xn, Yen). That is, your algorithm should, with probability at least 1 − δ, output a concept fˆ  C for which EX P [fˆ(X) 6= c(X)]  ε. Your algorithm should be statistically eﬃcient (you should mention the sample size n required, and n should be polynomial in 1ε and 1δ ), but it need not be computationally eﬃcient. Please explain why your algorithm is correct.