BIA-656 – Assignment 4
Problem 1
This problem involves the OJ data set which is part of the ISLR package.
a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results.
c) What are the training and test error rates?
d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
e) Compute the training and test error rates using this new value for cost.
f) Repeat parts b) through e) using a support vector machine with a radial kernel. Use the default value for gamma.
g) Repeat parts b) through e) using a support vector machine with a polynomial kernel. Set degree=2.
h) Overall, which approach seems to give the best results on this data?
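The bookkeeping in parts a), c) and e) is the same whichever classifier is fitted: draw 800 row indices at random, keep the rest for testing, and report the share of misclassified observations. The following is a minimal Python sketch of that logic only. The helper names are hypothetical, the row count of 1070 is an assumption about the OJ data, and the toy labels stand in for real predictions. The model fitting and tuning steps are not shown.

```python
import random


def train_test_split_indices(n_total, n_train, seed=1):
    """Return (train_idx, test_idx): a random sample of row indices and the remainder."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx = sorted(rng.sample(range(n_total), n_train))
    in_train = set(train_idx)
    test_idx = [i for i in range(n_total) if i not in in_train]
    return train_idx, test_idx


def error_rate(predicted, actual):
    """Fraction of observations whose predicted label differs from the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


# Assumption: the data set has 1070 rows; 800 go to training, the rest to testing.
train_idx, test_idx = train_test_split_indices(n_total=1070, n_train=800)

# Toy labels for illustration only, not real model output.
toy_predicted = ["CH", "MM", "CH", "CH"]
toy_actual = ["CH", "CH", "CH", "MM"]
toy_error = error_rate(toy_predicted, toy_actual)
```

The same `error_rate` helper would be applied once to the training predictions and once to the test predictions of each fitted model.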
Problem 2
Use a program to fit a single hidden layer neural network (ten hidden units) via back-propagation and weight decay.
a) Apply it to 100 observations from the model
$$Y = \sigma(a_1^T X) + (a_2^T X)^2 + 0.30 \cdot Z,$$
where $\sigma$ is the sigmoid function, $Z$ is standard normal, $X^T = (X_1, X_2)$, each $X_j$ being independent standard normal, and $a_1 = (3, 3)$, $a_2 = (3, -3)$. Generate a test sample of size 1000, and plot the training and test error curves as a function of the number of training epochs, for different values of the weight decay parameter. Discuss the overfitting behavior in each case.
b) Vary the number of hidden units in the network, from 1 up to 10, and determine the minimum number needed to perform well for this task.
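Before any network is fitted, the training and test samples have to be simulated. The sketch below covers only that step, and it assumes the model has the form Y = sigmoid(a1'X) + (a2'X)^2 + 0.30 Z, which is how the corresponding textbook exercise in The Elements of Statistical Learning is stated; treat that form, the function names, and the seeds as assumptions. The network itself, back-propagation, and weight decay are not shown.

```python
import math
import random

A1 = (3.0, 3.0)   # direction vector a1 from the problem statement
A2 = (3.0, -3.0)  # direction vector a2 from the problem statement


def sigmoid(v):
    """Logistic sigmoid, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def simulate(n, seed, noise_sd=0.30):
    """Draw n observations (x, y) from the assumed model.

    x = (x1, x2) with independent standard normal components,
    y = sigmoid(a1 . x) + (a2 . x)^2 + noise_sd * z, with z standard normal.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        y = sigmoid(A1[0] * x1 + A1[1] * x2) + (A2[0] * x1 + A2[1] * x2) ** 2 + noise_sd * z
        xs.append((x1, x2))
        ys.append(y)
    return xs, ys


# 100 training observations and an independent test sample of size 1000.
train_x, train_y = simulate(100, seed=1)
test_x, test_y = simulate(1000, seed=2)
```

Using different seeds for the two calls keeps the test sample independent of the training sample, which is what makes the test error curve a fair check for overfitting.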
Problem 3
The Bureau of Transportation Statistics maintains data on all aspects of air travel, including flight delays at departure and arrival (http://www.bts.gov). LaGuardia Airport (LGA) is one of three major airports that serve the New York City metropolitan area. United Airlines (UA) and American Airlines (AA) are two major airlines that schedule services at LGA. The zip file FlightDelays.zip contains information on all departures of these two airlines from LGA during November and December 2017. Each row of the data set is an observation and each column represents a variable.
a) Perform some exploratory data analysis on flight delay lengths for UA and AA.
b) Bootstrap the mean of flight delay lengths for each airline separately and describe the distribution.
c) Bootstrap the ratio of means. Provide plots of the bootstrap distribution and describe the distribution.
d) Find the 95% bootstrap percentile interval for the ratio of means. Interpret the interval.
e) What is the bootstrap estimate of the bias? What fraction of the bootstrap standard error does it represent?
f) For inference, we usually assume that the observations are independent. Is this condition met in this case?
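Parts c) through e) can be sketched in a few lines: resample each airline's delays independently with replacement, take the ratio of the resampled means, read the 2.5th and 97.5th percentiles off the resulting distribution, and compare its mean with the observed ratio. The Python sketch below runs on made-up delay values, not on FlightDelays.zip; the variable names, the seed, and the choice of 2000 resamples are assumptions for illustration.

```python
import random
import statistics


def bootstrap_ratio_of_means(a, b, n_boot=2000, seed=1):
    """Resample a and b independently with replacement; return mean(a*) / mean(b*) per resample."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        mean_a = statistics.fmean(rng.choices(a, k=len(a)))
        mean_b = statistics.fmean(rng.choices(b, k=len(b)))
        ratios.append(mean_a / mean_b)
    return ratios


# Made-up delay lengths in minutes, NOT taken from FlightDelays.zip.
# They are kept strictly positive so a resampled mean can never be zero;
# real delay data can contain negative values and would need that case handled.
ua_delays = [12.0, 3.0, 45.0, 7.0, 20.0, 1.0, 60.0, 15.0, 9.0, 30.0]
aa_delays = [8.0, 2.0, 25.0, 5.0, 14.0, 4.0, 35.0, 10.0, 6.0, 18.0]

observed_ratio = statistics.fmean(ua_delays) / statistics.fmean(aa_delays)
boot_ratios = bootstrap_ratio_of_means(ua_delays, aa_delays)

# 95% percentile interval: with n=40 the first and last cut points
# are the 2.5th and 97.5th percentiles of the bootstrap ratios.
cuts = statistics.quantiles(boot_ratios, n=40, method="inclusive")
ci_low, ci_high = cuts[0], cuts[-1]

# Bootstrap standard error, bias estimate, and bias as a fraction of the standard error.
boot_se = statistics.stdev(boot_ratios)
boot_bias = statistics.fmean(boot_ratios) - observed_ratio
bias_fraction = boot_bias / boot_se

print(f"observed ratio: {observed_ratio:.3f}")
print(f"95% percentile interval: ({ci_low:.3f}, {ci_high:.3f})")
print(f"bias: {boot_bias:.4f}, SE: {boot_se:.4f}, bias/SE: {bias_fraction:.3f}")
```

Resampling the two airlines separately mirrors treating them as two independent samples; whether that independence is reasonable for flights on the same days is exactly what part f) asks about.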