Assignment 3
Question 1
(a)
Taking the derivative with respect to $z$, we have:
Now assume $r^{T}V - zV^{T}V > 0$; then:
Completing the square, we have:
If $r^{T}V - zV^{T}V < 0$, we have:
So the solution can be written with the soft-thresholding function:
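For reference, a sketch of the resulting update, assuming the coordinate-wise objective is the elastic-net form that matches the update coded in part (b), $f(z) = \frac{1}{2n}\lVert r - zV\rVert^{2} + \lambda\alpha|z| + \frac{\lambda(1-\alpha)}{2}z^{2}$. For $z \neq 0$, setting the (sub)derivative to zero gives

$$
\frac{1}{n}\left(zV^{T}V - r^{T}V\right) + \lambda(1-\alpha)z + \lambda\alpha\,\mathrm{sign}(z) = 0
\quad\Longrightarrow\quad
\hat z = \frac{S\!\left(r^{T}V,\; n\lambda\alpha\right)}{V^{T}V + n\lambda(1-\alpha)},
$$

where $S(u, t) = \mathrm{sign}(u)\,(|u| - t)_{+}$ is the soft-thresholding function.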
(b)
The coordinate descent algorithm is implemented below:
library(MASS)
set.seed(7)

# Simulate correlated predictors and a sparse linear response
n = 200
p = 20
V = matrix(0.2, p, p)
diag(V) = 1
X = as.matrix(mvrnorm(n, mu = rep(0, p), Sigma = V))
Y = X[, 1] - 0.55 * X[, 2] + 0.25 * X[, 3] + rnorm(n)
X = scale(X) * sqrt(n / (n - 1))
Y = scale(Y) * sqrt(n / (n - 1))

lambda = 0.05
alpha = 0.7

# Elastic-net coordinate update via soft-thresholding
soft_thresholding = function(r, V, n, lambda, alpha){
  LAMBDA = n * lambda * alpha / (sum(V * V) + n * lambda * (1 - alpha))
  XX = sum(r * V) / (sum(V * V) + n * lambda * (1 - alpha))
  one = (abs(XX) >= LAMBDA) + 0
  sign(XX) * (abs(XX) - LAMBDA) * one
}

P_coefficients = rep(0, p)
last_P_coefficients = rep(1, p)

# Cycle over coordinates until the coefficients stop changing
while(sum(abs(last_P_coefficients - P_coefficients)) > 0.001){
  last_P_coefficients = P_coefficients
  for (i in 1:p) {
    v = X[, i]
    r = Y - X[, -i] %*% P_coefficients[-i]  # partial residual excluding coordinate i
    P_coefficients[i] = soft_thresholding(r, v, n, lambda, alpha)
  }
}
cat("The coefficients solved by Coordinate Descent Algorithm\n")
## The coefficients solved by Coordinate Descent Algorithm
print(P_coefficients)
## [1] 0.588641037 -0.306612179 0.121252303 -0.009122607 0.000000000
## [6] 0.000000000 -0.009433374 0.000000000 0.038724823 0.000000000
## [11] -0.015128767 0.051720729 0.000000000 0.045485343 0.000000000
## [16] -0.035347828 0.000000000 0.000000000 0.089265079 0.000000000
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 4.1-2
beta2 = glmnet(X, Y, alpha = alpha, lambda = lambda)
cat("The coefficients solved by glmnet\n")
## The coefficients solved by glmnet
print(beta2$beta)
## 20 x 1 sparse Matrix of class "dgCMatrix"
##              s0
## V1   0.588641054
## V2  -0.306673740
## V3   0.121166808
## V4  -0.009175566
## V5   .
## V6   .
## V7  -0.009466051
## V8   .
## V9   0.038717671
## V10  .
## V11 -0.015149746
## V12  0.051750157
## V13  .
## V14  0.045517614
## V15  .
## V16 -0.035304500
## V17  .
## V18  .
## V19  0.089305834
## V20  .
cat("Mean squared error\n")
## Mean squared error
print(1/p*sum((P_coefficients-beta2$beta)^2))
## [1] 1.045434e-09
Question 2
(a)
Such a $\tilde g$ exists because, by the definition of a natural cubic spline, we can define a cubic curve on each interval $[x_i, x_{i+1}]$ for $i = 1$ to $n-1$. Inside each interval, $\tilde g$ is a cubic function of the form $\tilde g(x) = a_i + b_i x + c_i x^2 + d_i x^3$. We then require $\tilde g(x_i) = g(x_i)$ for each $i = 1$ to $n$, and, for the natural cubic spline to be smooth, we also require the first and second derivatives of $\tilde g$ at each interior knot $x_i$ to agree across the two adjacent subintervals. Together with the natural boundary conditions, this gives 4 equations and 4 unknown coefficients for each subinterval, so we can solve the resulting system of equations to construct such a natural cubic spline.
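As a quick numerical illustration of this existence argument, here is a minimal sketch using base R's splinefun() with method = "natural", which solves exactly this kind of interpolation system; the knots and values below are made up purely for illustration.

# Hypothetical knots x_i and target values g(x_i), chosen only for illustration
x  = c(0.5, 1.3, 2.1, 3.0, 4.2)
gx = c(0.2, 0.9, 0.4, 1.1, 0.7)

# splinefun() with method = "natural" returns the interpolating natural cubic
# spline as an R function
g_tilde = splinefun(x, gx, method = "natural")

g_tilde(x)    # reproduces g(x_i) at every knot
g_tilde(2.5)  # cubic between knots, linear beyond the boundary knots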
(b)
Following the hint, note that $\tilde g(x)$ is linear on the intervals $(a, x_1)$ and $(x_n, b)$, so $\tilde g''(x) = 0$ at $x = a$ and $x = b$. Moreover, since $\tilde g$ is a natural cubic spline, $\tilde g'''(x)$ is constant on each subinterval $[x_j, x_{j+1}]$ of $a = x_0 \le x \le x_{n+1} = b$.
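Assuming, as in the usual formulation, that this part asks us to show $\int_a^b \tilde g''(x)\,h''(x)\,dx = 0$ where $h(x) = g(x) - \tilde g(x)$ (so $h(x_i) = 0$ at every knot), a sketch of the integration-by-parts argument is:

$$
\int_a^b \tilde g''(x)\,h''(x)\,dx
= \Big[\tilde g''(x)\,h'(x)\Big]_a^b - \int_a^b \tilde g'''(x)\,h'(x)\,dx
= 0 - \sum_{j=1}^{n-1} c_j\,\bigl(h(x_{j+1}) - h(x_j)\bigr) = 0,
$$

where the boundary term vanishes because $\tilde g''(a) = \tilde g''(b) = 0$, $c_j$ is the constant value of $\tilde g'''$ on $[x_j, x_{j+1}]$ (and $\tilde g''' = 0$ on $(a, x_1)$ and $(x_n, b)$, where $\tilde g$ is linear), and $h(x_j) = 0$ at every knot since $g$ and $\tilde g$ agree there.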
(c)
Note that $g(x) = \tilde g(x) + h(x)$. So:
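A sketch of the expansion implied here, using the result from part (b):

$$
\int_a^b g''(x)^{2}\,dx
= \int_a^b \tilde g''(x)^{2}\,dx + 2\int_a^b \tilde g''(x)\,h''(x)\,dx + \int_a^b h''(x)^{2}\,dx
= \int_a^b \tilde g''(x)^{2}\,dx + \int_a^b h''(x)^{2}\,dx
\;\ge\; \int_a^b \tilde g''(x)^{2}\,dx,
$$

so the natural cubic spline $\tilde g$ has a roughness penalty no larger than that of any other function $g$ interpolating the same points, with equality only when $h'' \equiv 0$.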
Question 3
(a)
The 10-fold cross-validation is implemented below.
library(MASS)
library(splines)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:MASS':
## 
##     select
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data("Boston") all_dfs <- seq(1, 20) cv_group = function(k, data_size){ set.seed(0) cvlist = list() n = rep(1:k, ceiling(data_size/k))[1:data_size] temp = sample(n, data_size) x = 1:k dataseq = 1:data_size cvlist = lapply(x, function(x) dataseq[temp==x]) return(cvlist) } cvlist = cv_group(10, nrow(Boston)) average_error = c() for(df in all_dfs){ error = 0 for(i in 1:10){ train = Boston[-cvlist[[i]],] test = Boston[cvlist[[i]],] model = lm(nox~ns(dis, df=df), data=train) prediction = predict(model, test) # mean square error error = error + sum((prediction - test$nox)^2) } average_error = c(average_error, error/nrow(Boston)) } df = data.frame(df = all_dfs, mse = average_error) plot(df$df, df$mse, type="l")
df
## df mse
## 1 1 0.005517658
## 2 2 0.003949362
## 3 3 0.003899021
## 4 4 0.003825912
## 5 5 0.003797626
## 6 6 0.003795578
## 7 7 0.003799139
## 8 8 0.003692330
## 9 9 0.003698045
## 10 10 0.003680476
## 11 11 0.003691626
## 12 12 0.003703575
## 13 13 0.003705500
## 14 14 0.003716244
## 15 15 0.003716586
## 16 16 0.003723415
## 17 17 0.003733690
## 18 18 0.003756013
## 19 19 0.003775849
## 20 20 0.003771833
We can see that the MSE is lowest at df = 10. Now we will fit the model again with df = 10 and plot its fit.
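Equivalently, the minimizing value can be picked programmatically from the objects computed above (a small sketch; for this seed it returns 10):

# Degrees of freedom with the lowest cross-validated MSE
best_df = all_dfs[which.min(average_error)]
best_df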
library(ggplot2)

# Refit on the full data with the df chosen by cross-validation
model.best = lm(nox ~ ns(dis, df = 10), data = Boston)
predictions = predict(model.best, Boston)
Boston$prediction = predictions

ggplot(Boston) +
  geom_line(aes(x = dis, y = nox, color = "NOX")) +
  geom_line(aes(x = dis, y = prediction, color = "Predicted")) +
  labs(title = "Boston Housing NOX Prediction")
(b)
Using smooth.spline() to fit nox as a function of dis:
# Smoothing spline; with cv = FALSE the smoothing parameter is chosen by GCV
model.spline = smooth.spline(Boston$dis, Boston$nox, cv = FALSE)

# predict.smooth.spline expects the x values and returns a list with $x and $y
predictions2 = predict(model.spline, Boston$dis)$y
Boston$prediction2 = predictions2

ggplot(Boston) +
  geom_line(aes(x = dis, y = nox, color = "NOX")) +
  geom_line(aes(x = dis, y = prediction2, color = "Spline Predicted")) +
  labs(title = "Boston Housing NOX Prediction")
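To see how flexible the GCV-selected fit is, the effective degrees of freedom stored on the fitted object can also be reported (a small follow-up sketch; the exact value depends on the GCV selection):

# Effective degrees of freedom chosen by generalized cross-validation
model.spline$df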