BIA-656 – Assignment 4
This problem involves the OJ data set, which is part of the ISLR package.
- Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
- Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results.
- What are the training and test error rates?
- Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
- Compute the training and test error rates using this new value for cost.
- Repeat parts b) through e) using a support vector machine with a radial kernel. Use the default value for gamma.
- Repeat parts b) through e) using a support vector machine with a polynomial kernel. Set degree=2.
- Overall, which approach seems to give the best results on this data?
Use a program to fit a single hidden layer neural network (ten hidden units) via back-propagation and weight decay.
- Apply it to 100 observations from the model
$$Y = \sigma(a_1^T X) + (a_2^T X)^2 + 0.30 \cdot Z,$$
where $\sigma$ is the sigmoid function, $Z$ is standard normal, $X^T = (X_1, X_2)$, each $X_j$ being independent standard normal, and $a_1 = (3, 3)$, $a_2 = (3, -3)$. Generate a test sample of size 1000, and plot the training and test error curves as a function of the number of training epochs, for different values of the weight decay parameter. Discuss the overfitting behavior in each case.
- Vary the number of hidden units in the network, from 1 up to 10, and determine the minimum number needed to perform well for this task.
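As a possible starting point for the simulation exercise above, here is a minimal sketch in plain Python (standard library only). It assumes the model is Y = sigmoid(a1'X) + (a2'X)^2 + 0.30 Z with a1 = (3, 3) and a2 = (3, -3), trains by per-observation gradient descent on squared error with an L2 weight-decay penalty, and records training and test mean squared error after every epoch. The function names, learning rate, epoch count, and decay values are illustrative choices, not part of the assignment.

```python
import math
import random

def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def simulate(n, rng):
    """Draw n pairs (x, y) from the assumed model (an assumption, see lead-in)."""
    data = []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        y = (sigmoid(3 * x[0] + 3 * x[1])        # sigmoid(a1'X), a1 = (3, 3)
             + (3 * x[0] - 3 * x[1]) ** 2        # (a2'X)^2,      a2 = (3, -3)
             + 0.30 * rng.gauss(0, 1))           # noise term 0.30 * Z
        data.append((x, y))
    return data

def init_net(n_hidden, rng):
    # W[m] = [bias, weight on x1, weight on x2]; b = [output bias, output weights...]
    W = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return W, b

def predict(net, x):
    W, b = net
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in W]
    return b[0] + sum(bm * hm for bm, hm in zip(b[1:], h)), h

def mse(net, data):
    return sum((y - predict(net, x)[0]) ** 2 for x, y in data) / len(data)

def train_epoch(net, data, lr, decay):
    """One pass of back-propagation; decay shrinks non-bias weights toward zero."""
    W, b = net
    for x, y in data:
        f, h = predict(net, x)
        err = f - y                               # d/df of 0.5 * (f - y)^2
        for m, w in enumerate(W):                 # hidden layer, using current b
            delta = err * b[m + 1] * h[m] * (1 - h[m])
            w[0] -= lr * delta
            w[1] -= lr * (delta * x[0] + decay * w[1])
            w[2] -= lr * (delta * x[1] + decay * w[2])
        b[0] -= lr * err                          # output layer
        for m in range(len(W)):
            b[m + 1] -= lr * (err * h[m] + decay * b[m + 1])

rng = random.Random(0)
train, test = simulate(100, rng), simulate(1000, rng)

curves = {}                                       # decay -> [(train MSE, test MSE), ...]
for decay in (0.0, 0.001, 0.01):
    net = init_net(10, random.Random(1))          # same starting weights for each decay
    history = [(mse(net, train), mse(net, test))] # epoch 0: untrained network
    for _ in range(30):
        train_epoch(net, train, lr=0.001, decay=decay)
        history.append((mse(net, train), mse(net, test)))
    curves[decay] = history
```

Each entry of `curves` holds the pair of error curves for one decay value, ready to be plotted against the epoch index with any plotting tool. Changing the first argument of `init_net` from 10 to values between 1 and 10 gives the comparison over hidden-unit counts.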
The Bureau of Transportation Statistics maintains data on all aspects of air travel, including flight delays at departure and arrival (http://www.bts.gov). LaGuardia Airport (LGA) is one of three major airports that serve the New York City metropolitan area. United Airlines (UA) and American Airlines (AA) are two major airlines that schedule services at LGA. The zip file FlightDelays.zip contains information on all departures of these two airlines from LGA during November and December 2017. Each row of the data set is an observation and each column represents a variable.
- Perform some exploratory data analysis on flight delay lengths for UA and AA.
- Bootstrap the mean of flight delay lengths for each airline separately and describe the distribution.
- Bootstrap the ratio of means. Provide plots of the bootstrap distribution and describe the distribution.
- Find the 95% bootstrap percentile interval for the ratio of means. Interpret the interval.
- What is the bootstrap estimate of the bias? What fraction of the bootstrap standard error does it represent?
- For inference, we usually assume that the observations are independent. Is this condition met in this case?
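The bootstrap steps listed above can be sketched as follows, using only the Python standard library. The delay values here are synthetic, right-skewed stand-ins generated for illustration; they are not the FlightDelays data, and the sample sizes, scales, and function name are made up. Each airline is resampled separately with replacement, the ratio of the resampled means is recorded, and the percentile interval, bias, and standard error are read off the resulting bootstrap distribution.

```python
import random
import statistics

def bootstrap_ratio(x, y, n_boot=2000, seed=0):
    """Bootstrap distribution of mean(x) / mean(y), resampling each group separately."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        xs = rng.choices(x, k=len(x))             # resample with replacement
        ys = rng.choices(y, k=len(y))
        ratios.append(statistics.fmean(xs) / statistics.fmean(ys))
    return ratios

# Synthetic stand-ins for the two airlines' delay lengths in minutes.
# Hypothetical values only: NOT taken from FlightDelays.zip.
data_rng = random.Random(42)
ua_delay = [data_rng.expovariate(1 / 16.0) for _ in range(300)]
aa_delay = [data_rng.expovariate(1 / 10.0) for _ in range(400)]

observed = statistics.fmean(ua_delay) / statistics.fmean(aa_delay)
ratios = bootstrap_ratio(ua_delay, aa_delay)

# 95% percentile interval: the 2.5% and 97.5% quantiles of the bootstrap ratios.
cuts = statistics.quantiles(ratios, n=40)         # cut points every 2.5%
lower, upper = cuts[0], cuts[-1]

boot_se = statistics.stdev(ratios)                # bootstrap standard error
bias = statistics.fmean(ratios) - observed        # bootstrap estimate of bias

print(f"observed ratio of means: {observed:.3f}")
print(f"95% percentile interval: ({lower:.3f}, {upper:.3f})")
print(f"bias: {bias:.4f}, SE: {boot_se:.4f}, bias/SE: {bias / boot_se:.3f}")
```

The same function applied to a single airline with a constant denominator reduces to bootstrapping one mean. Note that resampling rows independently treats the observations as independent, which is exactly the condition the last question asks about.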