ISE 529 Predictive Analytics Exam 2

1. Visit the Titanic page on the Kaggle site (https://www.kaggle.com/c/titanic). Read the Overview, in particular the Description and Evaluation items, then use the Data tab to download the csv files.

The objective is to predict whether a passenger survived, based on the feature data.

Visit the Titanic Data Science Solutions page:

https://www.kaggle.com/startupsci/titanic-data-science-solutions

Spend some time reading and running the Jupyter Notebook provided on that page.

a) (10 pts.)

Use the train set to answer true or false for each of the following statements:

More than 75% of passengers did not travel with parents or children
30 to 33% of passengers had siblings and/or spouse aboard
Less than 1% of passengers paid a fare as high as 500 dollars
Less than 1% of passengers are 65+ years old
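
Statements of this kind can be checked by taking the mean of a boolean column, since the mean of True/False values is the fraction that are True. The snippet below is a minimal sketch on a made-up five-row frame (the `toy` data and the variable names are assumptions, not the real train set); with the real file, the frame would come from `pd.read_csv("train.csv")` instead.

```python
import pandas as pd

# Tiny made-up frame standing in for train.csv (values are illustrative only).
toy = pd.DataFrame({
    "Parch": [0, 0, 1, 0, 2],
    "SibSp": [1, 0, 0, 0, 3],
    "Fare": [7.25, 71.28, 8.05, 512.33, 13.0],
    "Age": [22.0, 38.0, None, 66.0, 4.0],
})

# The mean of a boolean Series is the fraction of True values.
share_no_parch = (toy["Parch"] == 0).mean()
share_with_sibsp = (toy["SibSp"] > 0).mean()
share_high_fare = (toy["Fare"] >= 500).mean()
# A missing Age compares as False, so it is counted as "not 65+" here.
share_65_plus = (toy["Age"] >= 65).mean()

for label, share in [
    ("no parents/children", share_no_parch),
    ("siblings/spouse aboard", share_with_sibsp),
    ("fare >= 500", share_high_fare),
    ("age 65+", share_65_plus),
]:
    print(f"{label}: {share:.1%}")
```

Note that missing ages stay in the denominator in this sketch; whether to exclude them instead is a judgment call that should be stated in the report.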

b) (20 pts.)

Drop columns and fill NA values as follows:

Drop columns PassengerId, Name, Ticket, Cabin.
Fill NA values in Embarked with the most common category
Fill NA values in Fare with the median value in that column
Fill NA values in Age with the median value within each Pclass x Sex (gender) combination
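
The steps above can be sketched with `drop`, `fillna`, and a grouped `transform`. This is a minimal illustration on a made-up six-row frame (the `toy` data is an assumption); the column names follow the Kaggle Titanic files.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same column names as the Kaggle files.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Name": list("abcdef"),
    "Ticket": list("uvwxyz"),
    "Cabin": [None, "C85", None, None, "E46", None],
    "Pclass": [3, 1, 3, 1, 3, 1],
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Age": [20.0, 30.0, np.nan, 40.0, 30.0, np.nan],
    "Fare": [7.0, 70.0, np.nan, 50.0, 9.0, 60.0],
    "Embarked": ["S", "C", "S", None, "S", "Q"],
})

clean = toy.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

# Most common category: mode() ignores missing values and returns a Series.
clean["Embarked"] = clean["Embarked"].fillna(clean["Embarked"].mode()[0])

# Column-wide median for Fare.
clean["Fare"] = clean["Fare"].fillna(clean["Fare"].median())

# Median Age within each (Pclass, Sex) group, aligned row by row via transform.
group_median = clean.groupby(["Pclass", "Sex"])["Age"].transform("median")
clean["Age"] = clean["Age"].fillna(group_median)

print(clean)
```

Since PassengerId is dropped here, it may be worth saving that column from test.csv beforehand, because the Kaggle submission file needs it.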

c) (20 pts.)

Perform the following Data cleaning and Feature Engineering steps for both the train and test files.

Split column Age into 5 intervals with bin edges (0, 16, 32, 48, 64, 100); the column is now categorical.
Split column Fare into 4 intervals with bin edges (0, 7.9, 14.5, 31, 600); the column is now categorical.
Create column Size by adding the values of columns SibSp and Parch, then drop those two columns and keep Size.
Create binary column Alone with value 1 if the passenger travels alone and 0 otherwise.

Use get_dummies to convert the categorical columns to binary columns.
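
A minimal sketch of these feature-engineering steps on made-up rows follows. The `toy` frame is an assumption, as is reading "travels alone" as Size == 0 (no siblings, spouse, parents, or children aboard). `include_lowest=True` is a choice made here so that a value equal to the first bin edge, such as a fare of exactly 0, falls in the first interval instead of becoming missing.

```python
import pandas as pd

# Made-up rows; NA values are assumed to have been filled already.
toy = pd.DataFrame({
    "Age": [4.0, 22.0, 35.0, 50.0, 70.0],
    "Fare": [7.25, 8.05, 26.0, 80.0, 512.33],
    "SibSp": [1, 0, 1, 0, 0],
    "Parch": [2, 0, 0, 0, 1],
    "Sex": ["male", "female", "female", "male", "male"],
})

age_edges = [0, 16, 32, 48, 64, 100]   # 6 edges give 5 intervals
fare_edges = [0, 7.9, 14.5, 31, 600]   # 5 edges give 4 intervals

# pd.cut returns a categorical column of intervals.
toy["Age"] = pd.cut(toy["Age"], bins=age_edges, include_lowest=True)
toy["Fare"] = pd.cut(toy["Fare"], bins=fare_edges, include_lowest=True)

# Size = SibSp + Parch, then drop the two source columns.
toy["Size"] = toy["SibSp"] + toy["Parch"]
toy = toy.drop(columns=["SibSp", "Parch"])

# Alone = 1 when nobody else from the family is aboard (assumed: Size == 0).
toy["Alone"] = (toy["Size"] == 0).astype(int)

# One binary column per category of each listed column.
encoded = pd.get_dummies(toy, columns=["Age", "Fare", "Sex"])
print(encoded.columns.tolist())
```

Using the same fixed bin edges on both files keeps the dummy columns of train and test aligned.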

d) (40 pts.)

The test.csv file from Kaggle does not include Survived, so split the train set into new train and test subsets. Use the train subset to fit the following models:

KNN
Support Vector Classifier
Logistic Regression
Random Forest
Gradient Boosting

Use GridSearchCV to find the best hyperparameter values. Report the test accuracy rate (using the test subset). Try to improve the test accuracy rate (include polynomial and/or interaction terms, or use another machine learning/statistical method or any other means).
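
One possible shape for this workflow is sketched below. It runs on synthetic data from `make_classification` as a stand-in for the engineered Titanic features (an assumption), and the hyperparameter grids are deliberately small and illustrative, not recommended values. Scaling the inputs for KNN and the support vector classifier is a choice made in this sketch, not something the assignment requires.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the engineered features and the Survived column.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each entry: (estimator, parameter grid). Pipeline parameters are addressed
# as "<lowercased step name>__<parameter>".
candidates = {
    "KNN": (
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [3, 5, 9]},
    ),
    "Support Vector Classifier": (
        make_pipeline(StandardScaler(), SVC(random_state=0)),
        {"svc__C": [0.1, 1, 10]},
    ),
    "Logistic Regression": (
        LogisticRegression(max_iter=1000, random_state=0),
        {"C": [0.1, 1, 10]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
    "Gradient Boosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validation on the train subset only.
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # score() uses the refitted best estimator on the held-out test subset.
    results[name] = (search.best_params_, search.score(X_test, y_test))

for name, (params, acc) in results.items():
    print(f"{name}: test accuracy {acc:.3f} with {params}")
```

The test subset is touched only once per model, after the grid search has finished, so the reported accuracy is not used for tuning.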

e) (10 pts.)

Submit your best predictions to Kaggle. Report your Kaggle ID name, the date submitted, and the Score provided by Kaggle.
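
The Kaggle Titanic competition expects a csv file with the two columns PassengerId and Survived. A minimal sketch of building such a file follows; the IDs and predictions are hypothetical placeholders, and the output is written to an in-memory buffer here so the example is self-contained (a real run would pass a file name to `to_csv`).

```python
import io

import pandas as pd

# Hypothetical values: IDs saved from test.csv before the column was dropped,
# and 0/1 predictions from the chosen model.
test_ids = pd.Series([892, 893, 894], name="PassengerId")
predictions = [0, 1, 0]

submission = pd.DataFrame({"PassengerId": test_ids, "Survived": predictions})

# index=False keeps the row index out of the file.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
print(buffer.getvalue())
```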

Make sure your report includes your name and your Section (Tuesday or Friday).

The report should be clean and well formatted (do not truncate tables, plots, or Python commands; no screen captures). Use random_state = 0 wherever it is needed.