# Predictive Analytics Exam-Taking Service, ISE 529 and Regression Model Ghostwriting

2022-05-09 11:30, Monday. Category: Exam Help. Views: 340

## ISE 529 Predictive Analytics Exam 2

1. Visit the Titanic page (https://www.kaggle.com/c/titanic) on the Kaggle site. Read the Overview (the Description and Evaluation items), then use the Data tab to download the CSV files.

The objective is to predict whether a passenger survived, based on the feature data.

Visit the Titanic Data Science Solutions page:

https://www.kaggle.com/startupsci/titanic-data-science-solutions

Spend some time reading and running the Jupyter Notebook provided on that page.

### a) (10 pts.)

Use the train set to answer true or false to each of the following statements:

• More than 75% of passengers did not travel with parents or children
• 30 to 33% of passengers had siblings and/or spouse aboard
• Less than 1% of passengers paid a fare as high as 500 dollars
• Less than 1% of passengers are 65+ years old
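
One way to check these statements is to compute each share directly with pandas. The following is a minimal sketch, not a reference solution: the function name `check_statements` and the result keys are hypothetical, "as high as 500" is read as `Fare >= 500`, and passengers with a missing Age are counted in the denominator as not 65+. The small frame is made up purely to show the call.

```python
import pandas as pd


def check_statements(train: pd.DataFrame) -> dict:
    """Return True/False for each of the four statements (hypothetical helper)."""
    no_parch = (train["Parch"] == 0).mean()    # share with no parents/children aboard
    with_sibsp = (train["SibSp"] > 0).mean()   # share with a sibling or spouse aboard
    fare_500 = (train["Fare"] >= 500).mean()   # assumption: "as high as 500" means >= 500
    age_65 = (train["Age"] >= 65).mean()       # assumption: missing ages count as not 65+
    return {
        "no_parch_over_75pct": bool(no_parch > 0.75),
        "sibsp_30_to_33pct": bool(0.30 <= with_sibsp <= 0.33),
        "fare_500_under_1pct": bool(fare_500 < 0.01),
        "age_65_plus_under_1pct": bool(age_65 < 0.01),
    }


# Made-up rows, only to demonstrate the call. For the exam one would pass
# pd.read_csv("train.csv") instead (the file location is an assumption).
toy = pd.DataFrame({
    "Parch": [0, 0, 0, 0, 1],
    "SibSp": [1, 0, 0, 0, 0],
    "Fare": [10.0, 20.0, 30.0, 40.0, 512.0],
    "Age": [20.0, 30.0, None, 40.0, 70.0],
})
toy_answers = check_statements(toy)
```

`train.describe()` with custom `percentiles` would give similar information; the explicit shares above just make each threshold visible.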

### b) (20 pts.)

Drop unused columns and fill missing (NA) values as follows:

• Drop the columns PassengerId, Name, Ticket, and Cabin.
• Fill NA values in Embarked with the most common category.
• Fill NA values in Fare with the median value of that column.
• Fill NA values in Age with the median Age within each Pclass × Sex (gender) combination.
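
The steps above can be sketched with pandas as follows. This is an illustration under assumptions: the helper name `fill_missing` is hypothetical, the column names follow the Kaggle CSV files (`PassengerId`, `Embarked`, `Sex`), and each frame is filled from its own statistics, which the exam does not specify. The toy frame is invented for the demonstration.

```python
import pandas as pd

DROP_COLS = ["PassengerId", "Name", "Ticket", "Cabin"]


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns and fill NA values (hypothetical helper)."""
    out = df.drop(columns=DROP_COLS, errors="ignore").copy()
    # Most common port of embarkation
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Median fare over the whole column
    out["Fare"] = out["Fare"].fillna(out["Fare"].median())
    # Median age within each (Pclass, Sex) group, aligned row by row
    group_median = out.groupby(["Pclass", "Sex"])["Age"].transform("median")
    out["Age"] = out["Age"].fillna(group_median)
    return out


# Made-up rows, only to demonstrate the call.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Name": ["a", "b", "c", "d", "e", "f"],
    "Sex": ["male", "male", "male", "female", "female", "female"],
    "Age": [20.0, 40.0, None, 10.0, None, 30.0],
    "Ticket": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "Fare": [10.0, None, 30.0, 5.0, 5.0, 5.0],
    "Cabin": [None] * 6,
    "Embarked": ["S", "S", None, "C", "S", "Q"],
})
filled = fill_missing(toy)
```

Because the Kaggle submission needs passenger identifiers, it may be worth saving the test file's `PassengerId` column before calling a helper like this one.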

### c) (20 pts.)

Perform the following data cleaning and feature engineering steps for both the train and test files.

• Split column Age into 5 intervals with bin edges (0, 16, 32, 48, 64, 100); the column becomes categorical.
• Split column Fare into 4 intervals with bin edges (0, 7.9, 14.5, 31, 600); the column becomes categorical.
• Create column Size as the sum of columns SibSp and Parch, then drop SibSp and Parch, keeping only the new column.
• Create binary column Alone with value 1 if the passenger travels alone and 0 otherwise.

Use get_dummies to convert the categorical columns to binary columns.
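
A possible pandas sketch of these steps is shown below. Several details are assumptions rather than part of the exam: the helper name `engineer`, the interval labels, the use of `include_lowest=True` so that a value of exactly 0 falls in the first bin, and the reading of "alone" as `Size == 0`. The toy rows are invented.

```python
import pandas as pd

AGE_BINS = [0, 16, 32, 48, 64, 100]
AGE_LABELS = ["0-16", "16-32", "32-48", "48-64", "64-100"]   # hypothetical labels
FARE_BINS = [0, 7.9, 14.5, 31, 600]
FARE_LABELS = ["0-7.9", "7.9-14.5", "14.5-31", "31-600"]     # hypothetical labels


def engineer(df: pd.DataFrame) -> pd.DataFrame:
    """Bin Age and Fare, build Size and Alone, then one-hot encode."""
    out = df.copy()
    out["Age"] = pd.cut(out["Age"], bins=AGE_BINS, labels=AGE_LABELS,
                        include_lowest=True)
    out["Fare"] = pd.cut(out["Fare"], bins=FARE_BINS, labels=FARE_LABELS,
                         include_lowest=True)
    out["Size"] = out["SibSp"] + out["Parch"]
    out["Alone"] = (out["Size"] == 0).astype(int)   # assumption: alone means Size == 0
    out = out.drop(columns=["SibSp", "Parch"])
    # get_dummies encodes the object and categorical columns and leaves numeric ones
    return pd.get_dummies(out)


# Made-up rows, only to demonstrate the call.
toy = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [10.0, 20.0, 70.0],
    "Fare": [0.0, 10.0, 100.0],
    "SibSp": [0, 1, 0],
    "Parch": [0, 2, 0],
})
features = engineer(toy)
```

When the train and test files are encoded separately, their dummy columns can differ if a category is absent from one file; reindexing the test frame to the train frame's columns with `fill_value=0` is one way to align them.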

### d) (40 pts.)

The test.csv file from Kaggle does not include the Survived column, so split the train set into new train and test subsets. Use the train subset to fit the following models:

• KNN
• Support Vector Classifier
• Logistic Regression
• Random Forest

Use GridSearchCV to find the best hyperparameter values. Report the test accuracy rate (using the test subset). Try to improve the test accuracy rate, for example by including polynomial and/or interaction terms, by using other machine learning or statistical methods, or by any other means.
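
The split-and-search workflow can be sketched with scikit-learn as below. Everything specific here is an assumption chosen for illustration: the helper names, the hyperparameter grids, the 25% test fraction, the stratified split, and the scaling step in front of KNN, SVC, and logistic regression. The demonstration runs on synthetic data from `make_classification`, not on the Titanic features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def candidate_searches(cv=5):
    """One GridSearchCV per model; the grids are illustrative, not tuned."""
    return {
        "knn": GridSearchCV(
            make_pipeline(StandardScaler(), KNeighborsClassifier()),
            {"kneighborsclassifier__n_neighbors": [3, 5, 9, 15]}, cv=cv),
        "svc": GridSearchCV(
            make_pipeline(StandardScaler(), SVC()),
            {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}, cv=cv),
        "logreg": GridSearchCV(
            make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
            {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=cv),
        "forest": GridSearchCV(
            RandomForestClassifier(random_state=0),
            {"n_estimators": [100, 200], "max_depth": [3, None]}, cv=cv),
    }


def fit_and_score(X, y, test_size=0.25, random_state=0):
    """Split once, tune each model on the train subset, score on the held-out subset."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state)
    results = {}
    for name, search in candidate_searches().items():
        search.fit(X_tr, y_tr)
        # score() uses the refit best estimator; the default metric is accuracy
        results[name] = (search.best_params_, search.score(X_te, y_te))
    return results


# Synthetic stand-in data, only to show that the workflow runs end to end.
X_demo, y_demo = make_classification(n_samples=120, n_features=6, random_state=0)
results = fit_and_score(X_demo, y_demo)
```

To try interaction terms, one option is to add `PolynomialFeatures(degree=2, interaction_only=True)` from `sklearn.preprocessing` as an extra pipeline step before the classifier; whether that helps on this data would need to be checked on the held-out subset.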

### e) (10 pts.)

Submit your best predictions to Kaggle. Report your Kaggle ID name, the date submitted, and the score provided by Kaggle.
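
The Titanic competition expects a CSV file with a `PassengerId` column and a `Survived` column. A minimal sketch for building that file is given below; the helper name `make_submission`, the default file name, and the sample identifiers are assumptions for illustration.

```python
import pandas as pd


def make_submission(passenger_ids, predictions, path="submission.csv"):
    """Build the two-column submission frame; write it to disk when path is given."""
    sub = pd.DataFrame({
        "PassengerId": list(passenger_ids),
        "Survived": [int(p) for p in predictions],   # Kaggle expects 0/1 integers
    })
    if path is not None:
        sub.to_csv(path, index=False)
    return sub


# Made-up identifiers and predictions; path=None skips writing a file.
demo_sub = make_submission([892, 893, 894], [0, 1, 0], path=None)
```

In practice the identifiers would come from the `PassengerId` column of test.csv, saved before that column is dropped, and the predictions from the best model refit on the full train set.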