ISE 529 Predictive Analytics
Midterm Exam
Submit on October 16, 2020 by 1 p.m.
1.(10 pts.)
In Lecture 5 Introductory Example 2, a pivot table showed that non-USA cars are on average more expensive than USA cars by around $2000. However, after fitting the model
m1 = smf.ols(formula='Price ~ MPG_city + Origin', data=df).fit()
it was found that non-USA cars were on average more expensive than USA cars by around $5264. This looks like a contradiction. Which dollar amount is correct? Why?
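The sketch below illustrates, on hypothetical noise-free toy data (not the Lecture 5 data set), how a pivot-table gap and a regression coefficient can both be correct while answering different questions. It assumes, purely for illustration, that non-USA cars tend to have higher city MPG and that price falls as MPG rises. The pivot table compares raw group means; the regression coefficient compares cars with the same MPG_city. The OLS fit uses numpy's least squares in place of smf.ols, with USA as the baseline level of the origin dummy.

```python
import numpy as np

# Hypothetical, noise-free toy data (NOT the Lecture 5 data set):
# non-USA cars get better city MPG, and price falls as MPG rises.
mpg_usa = np.array([20.0, 22.0, 24.0, 26.0])
mpg_non = np.array([26.0, 28.0, 30.0, 32.0])
mpg = np.concatenate([mpg_usa, mpg_non])
non_usa = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
price = 40000 - 500 * mpg + 5000 * non_usa

# Marginal comparison: what a pivot table of mean price by origin reports
marginal_gap = price[non_usa == 1].mean() - price[non_usa == 0].mean()

# Conditional comparison: OLS of price on MPG and an origin dummy,
# i.e. the origin effect holding MPG fixed (USA is the baseline level)
X = np.column_stack([np.ones_like(mpg), mpg, non_usa])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
conditional_gap = coef[2]

print(round(marginal_gap), round(conditional_gap))  # prints: 2000 5000
```

In this toy setup the raw gap is smaller because the higher MPG of non-USA cars pulls their average price down, masking part of the origin effect that the regression isolates.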
2.(20 pts.)
The csv file contains monthly demand from 2000 to 2011. You are asked to predict demand for 2012. Use the library statsmodels.formula.api to build linear regression models that predict demand, using one-hot encoding and
a) year (numerical) and month (categorical) as predictors.
b) year (numerical), month (categorical), and their interaction as predictors.
For both models, plot the demand together with the predictions for all years. For (b), also add the 95% CIs for the 2012 predictions to the plot.
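A minimal sketch of the two design matrices, assuming a hypothetical monthly series (not the exam's csv file) in which each month has its own level and its own yearly trend. Month is one-hot encoded with January as the dropped baseline, and year is centred at 2000, which changes the coefficients but not the fitted values. It uses numpy least squares as a stand-in for statsmodels; with a DataFrame holding hypothetical columns demand, year and month, the corresponding formulas would be 'demand ~ year + C(month)' for (a) and 'demand ~ year * C(month)' for (b).

```python
import numpy as np

# Hypothetical monthly series, 2000-2011 (NOT the exam's csv file):
# each month has its own level and its own yearly trend.
years = np.repeat(np.arange(2000, 2012), 12)
months = np.tile(np.arange(1, 13), 12)
level = 100.0 + 10.0 * np.arange(12)          # month-specific intercepts
trend = 2.0 + 0.5 * np.arange(12)             # month-specific slopes
t = years - 2000                              # centred year (same fitted values)
demand = level[months - 1] + trend[months - 1] * t

def design(t, months, interaction):
    """Numeric year plus one-hot month dummies (January is the baseline)."""
    dummies = np.eye(12)[months - 1][:, 1:]   # 11 dummy columns
    cols = [np.ones(len(t)), t.astype(float), dummies]
    if interaction:
        cols.append(dummies * t[:, None])     # month-specific slope shifts
    return np.column_stack(cols)

# Predict all twelve months of 2012 (t = 12) with both models
t_2012 = np.full(12, 12)
m_2012 = np.arange(1, 13)
preds = {}
for interaction in (False, True):
    X = design(t, months, interaction)
    beta, *_ = np.linalg.lstsq(X, demand, rcond=None)
    preds[interaction] = design(t_2012, m_2012, interaction) @ beta

truth_2012 = level + trend * 12
print(np.round(preds[False], 1))
print(np.round(preds[True], 1))
```

Only the interaction model can give each month its own trend, so on this toy series it recovers the 2012 values exactly while the additive model does not. With an actual statsmodels fit, get_prediction(new_data).summary_frame(alpha=0.05) returns the mean_ci_lower and mean_ci_upper columns needed for the 95% CIs.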
3.(20 pts.)
In Homework 4 you built regression models to predict house prices on the csv data set. You reduced the dataset to houses with two to five bedrooms, style 1 to 7, and not close to a highway. You then removed the column highway, leaving a reduced data set with 485 rows.
Now use 10-fold cross-validation, KFold(n_splits=10, random_state=2, shuffle=True), to find the MSPE of the best AIC model and of the best BIC model (shuffle=True shuffles the rows before splitting them into folds). Report the square root of each MSPE. Which model, best AIC or best BIC, predicts house prices better?
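The procedure can be sketched as below, assuming hypothetical design matrices in place of the best AIC and best BIC models and synthetic data in place of the housing set. Rows are shuffled once, split into 10 folds, and each fold is predicted by an OLS model fitted on the other nine; squared errors are pooled over all held-out rows. Note that numpy's permutation is a stand-in: it will not reproduce the folds of sklearn.model_selection.KFold(n_splits=10, random_state=2, shuffle=True) that the question specifies.

```python
import numpy as np

def kfold_mspe(X, y, n_splits=10, seed=2):
    """Pooled mean squared prediction error of OLS under shuffled k-fold CV."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))             # shuffle rows before splitting
    folds = np.array_split(idx, n_splits)
    sq_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        resid = y[test_idx] - X[test_idx] @ beta
        sq_errors.append(resid ** 2)
    return np.concatenate(sq_errors).mean()

# Hypothetical data: y depends on Z0 and Z1; Z2 is irrelevant
rng = np.random.default_rng(0)
n = 485
Z = rng.normal(size=(n, 3))
y = 1 + 2 * Z[:, 0] + 5 * Z[:, 1] + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), Z])          # stand-in for a larger model
X_small = np.column_stack([np.ones(n), Z[:, 0]])   # stand-in omitting Z1

rmspe_full = np.sqrt(kfold_mspe(X_full, y))
rmspe_small = np.sqrt(kfold_mspe(X_small, y))
print(rmspe_full, rmspe_small)
```

The square root puts the error back in the units of the response (dollars for house prices), so the model with the smaller root MSPE is the better predictor under this criterion.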
4.(50 pts.)
Download the file ml-1m.zip from org/datasets/movielens/1m/. Extract the files and read the README file. The files can be opened with
pd.read_csv('users.dat', sep='::', engine='python'). There are 3 files: movie ratings, movie data (genres and year), and users (id, gender, age, occupation, and zip code). Column title in the file movies.dat holds the movie title and year. Split that column into two new columns using
movies['name'] = movies['title'].str.slice(start=0, stop=-7)
movies['year'] = movies['title'].str[-5:-1]
movies['year'] = movies['year'].astype(int)
del movies['title']
For some of the questions, you may want to merge the three files into one using
pd.merge(pd.merge(ratings, users), movies).
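A runnable sketch of the loading, title split, and merge steps, using tiny hypothetical in-memory rows instead of the real .dat files. It assumes, per the README's field descriptions, that the files have no header row, so header=None and explicit names are passed; the column names (user_id, movie_id, title, and so on) are an assumption chosen so that movies['title'] exists as the text expects.

```python
import io
import pandas as pd

# Tiny in-memory stand-ins for ratings.dat, users.dat and movies.dat
# (hypothetical rows; the real files come from ml-1m.zip).
ratings_txt = "1::10::5::978300760\n2::10::3::978302109\n2::20::4::978301968\n"
users_txt = "1::F::25::3::90210\n2::M::35::7::10001\n"
movies_txt = "10::Movie A (1995)::Comedy\n20::Movie B (1999)::Drama\n"

def read_dat(text, names):
    # header=None: assumes the .dat files carry no header row
    return pd.read_csv(io.StringIO(text), sep='::', engine='python',
                       header=None, names=names)

ratings = read_dat(ratings_txt, ['user_id', 'movie_id', 'rating', 'timestamp'])
users = read_dat(users_txt, ['user_id', 'gender', 'age', 'occupation', 'zip'])
movies = read_dat(movies_txt, ['movie_id', 'title', 'genres'])

# Same title split as in the text: strip the trailing " (YYYY)"
movies['name'] = movies['title'].str.slice(start=0, stop=-7)
movies['year'] = movies['title'].str[-5:-1].astype(int)
del movies['title']

# Inner joins on the shared key columns (user_id, then movie_id)
data = pd.merge(pd.merge(ratings, users), movies)
print(data[['user_id', 'gender', 'name', 'year', 'rating']])
```

pd.merge with no on= argument joins on all columns the two frames share, which is why consistent key names across the three files matter.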
The resulting dataframe is of interest in developing recommendation systems. With that purpose in mind, answer the following questions:
a) Report the name and year of the five movies with the largest number of ratings.
b) Find the names of the top-rated movies by females, among movies released from 1995 to 2000.
c) For movies with at least 250 ratings, find the average rating. Show the names of the 5 movies with the largest average rating.
d) Consider user disagreement to be the standard deviation of each movie's ratings. Find the five movies with the largest rating disagreement.
e) Consider gender disagreement to be the absolute difference between the average rating by males and the average rating by females. Find the names of the 5 movies with the largest gender disagreement.
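The pandas groupby patterns behind questions (a) to (e) can be sketched on a hypothetical merged frame with only the columns they need (name, year, gender, rating). The minimum-ratings threshold is lowered from 250 to 3 to suit the toy data, and "top-rated" in (b) is taken, as an assumption, to mean highest mean rating.

```python
import pandas as pd

# Hypothetical merged ratings (a subset of the merged columns);
# real answers require the ml-1m data.
data = pd.DataFrame({
    'name':   ['Movie A'] * 4 + ['Movie B'] * 3 + ['Movie C'] * 2,
    'year':   [1995] * 4 + [1999] * 3 + [2003] * 2,
    'gender': ['F', 'F', 'M', 'M', 'F', 'M', 'M', 'F', 'M'],
    'rating': [5, 5, 1, 1, 4, 4, 4, 2, 3],
})
top_n = 5
min_ratings = 3   # the exam uses 250; lowered for the toy data

# (a) movies with the most ratings
most_rated = data.groupby(['name', 'year']).size().nlargest(top_n)

# (b) mean rating by female users for movies released 1995-2000
fem = data[(data['gender'] == 'F') & data['year'].between(1995, 2000)]
fem_top = fem.groupby('name')['rating'].mean().sort_values(ascending=False)

# (c) highest average rating among movies with enough ratings
stats = data.groupby('name')['rating'].agg(['size', 'mean'])
best_avg = stats[stats['size'] >= min_ratings]['mean'].nlargest(top_n)

# (d) user disagreement: standard deviation of each movie's ratings
disagree = data.groupby('name')['rating'].std().nlargest(top_n)

# (e) gender disagreement: |mean(M) - mean(F)| per movie
by_gender = data.pivot_table(values='rating', index='name',
                             columns='gender', aggfunc='mean')
gender_gap = (by_gender['M'] - by_gender['F']).abs().nlargest(top_n)

print(most_rated, fem_top, best_avg, disagree, gender_gap, sep='\n\n')
```

Filtering by count before ranking in (c) matters because a movie with one perfect rating would otherwise top the list.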
Only one submission is allowed.