

Posted: 2021-01-20 13:58 (Wednesday)


Stat 5330: Assignment 2


This assignment is due at 9pm on Tuesday, Oct 23rd, 2018.

A paper copy of the well-written report should be submitted to my mailbox (Xiwei Tang), which can be found on the first floor of Halsey Hall, at the corner where you make a left turn after entering the front door. An electronic copy of your code should be submitted on Collab. In the paper report, the results should be clearly stated with some reasonable explanations (including necessary plots). Please also include your code at the end of the paper report. Please DO NOT just copy and paste the raw outputs obtained from your software. Also, please write down both your full name and your computing ID on the first page.


1. Analyze the email spam dataset using different classification methods.

The dataset consists of two parts: a training dataset with 3065 obs and 58 variables, and a testing dataset with 1536 obs and 58 variables. In each dataset, the first 57 columns store the predictors, and the last column stores the binary response variable (spam=1, not spam=0). The datasets are attached as .txt files. You might use the following code to read the data into R.

    setwd("path of the folder where you put the data files")
    train = read.table("traindata.txt")
    test = read.table("testdata.txt")
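Once the files are loaded, the predictors and the response can be separated by column position. The following is a minimal sketch, assuming read.table's default column names V1 to V58. The small simulated table only stands in for the real file, and the names x.train and y.train are illustrative, not part of the assignment.

```r
# Stand-in for the real training file: 3 rows, 58 whitespace-separated columns.
# (Illustrative data only; the real file has 3065 rows.)
set.seed(1)
rows <- replicate(3, paste(c(round(runif(57), 2), rbinom(1, 1, 0.5)), collapse = " "))
train <- read.table(text = rows)

x.train <- train[, 1:57]   # predictors V1-V57
y.train <- train[, 58]     # binary response: spam = 1, not spam = 0

dim(train)                 # number of rows and columns actually read
table(y.train)             # class counts
```

Checking dim() right after reading is a quick way to confirm that the file was parsed into 58 columns as described.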

(a) Perform LDA and QDA with the predictors V55-V57 (columns 55-57), and apply your trained models on the testing set. Report the classification accuracy (num of obs that are correctly classified / total num of obs in the testing set), sensitivity rate (num of spam obs which are correctly classified as spam in the testing set / num of total spam obs in the testing set), and specificity rate (num of non-spam obs which are correctly classified as non-spam in the testing set / num of total non-spam obs in the testing set).
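The three rates defined in part (a) can be written as one small function. This is only a sketch: the helper name classification.rates and the toy label vectors are hypothetical, and it assumes both inputs are 0/1 vectors with 1 meaning spam.

```r
# Hypothetical helper implementing the three definitions in part (a).
# y.true and y.pred are 0/1 vectors of equal length (1 = spam).
classification.rates <- function(y.true, y.pred) {
  c(accuracy    = mean(y.pred == y.true),
    sensitivity = sum(y.pred == 1 & y.true == 1) / sum(y.true == 1),
    specificity = sum(y.pred == 0 & y.true == 0) / sum(y.true == 0))
}

# Toy check with made-up labels (not the assignment data)
y.true <- c(1, 1, 1, 0, 0, 0, 0, 0)
y.pred <- c(1, 1, 0, 0, 0, 0, 1, 0)
classification.rates(y.true, y.pred)
```

On the toy vectors, 6 of 8 labels match (accuracy 0.75), 2 of 3 spam obs are caught (sensitivity 2/3), and 4 of 5 non-spam obs are kept (specificity 0.8).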

(b) Perform LDA and QDA with all predictors V1-V57 (columns 1-57), and apply your trained models on the testing set. Report the corresponding classification accuracy, sensitivity rate, and specificity rate, respectively. Compare your results with those obtained in part (a).
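For parts (a) and (b), one possible route is lda() and qda() from the MASS package. The sketch below runs on simulated stand-in data with three predictors, not on the spam files; the function make.data and the simulation settings are assumptions made purely for illustration.

```r
library(MASS)  # provides lda() and qda()

# Simulated stand-in data (NOT the spam data): 3 predictors, binary response
set.seed(1)
make.data <- function(n) {
  y <- rbinom(n, 1, 0.4)
  x <- matrix(rnorm(n * 3, mean = rep(y, 3) * 1.5), ncol = 3)
  data.frame(V55 = x[, 1], V56 = x[, 2], V57 = x[, 3], V58 = factor(y))
}
train <- make.data(300)
test  <- make.data(150)

lda.fit <- lda(V58 ~ V55 + V56 + V57, data = train)
qda.fit <- qda(V58 ~ V55 + V56 + V57, data = train)

# predict() returns a list; $class holds the predicted labels
lda.pred <- predict(lda.fit, newdata = test)$class
qda.pred <- predict(qda.fit, newdata = test)$class

table(predicted = lda.pred, actual = test$V58)  # confusion matrix
mean(lda.pred == test$V58)                      # LDA test accuracy
mean(qda.pred == test$V58)                      # QDA test accuracy
```

With a data frame that holds only the predictors and the response, the formula `V58 ~ .` would use every remaining column as a predictor, which matches the setup in part (b).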

(c) Fit a logistic regression model and an SVM model with all predictors V1-V57 (columns 1-57), and apply your trained models on the testing set. Please report the corresponding classification accuracies, sensitivity rates, and specificity rates. Compare your results with those obtained in part (b). Which method provides the "best" classification results? Why?
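The logistic regression half of part (c) can be sketched with glm() from base R. The data below are simulated and only stand in for the spam files, and the 0.5 cutoff for turning probabilities into labels is an assumption, not something the assignment specifies. An SVM needs an add-on package (for example e1071, whose svm() function also accepts a formula), so it is left out to keep the sketch free of extra dependencies.

```r
# Simulated stand-in data (NOT the spam data): 2 predictors, binary response
set.seed(2)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 2 * x1 - x2))
dat   <- data.frame(V1 = x1, V2 = x2, V58 = y)
train <- dat[1:300, ]
test  <- dat[301:400, ]

# "V58 ~ ." uses every other column as a predictor
logit.fit <- glm(V58 ~ ., data = train, family = binomial)

# type = "response" gives the estimated P(spam); 0.5 is an assumed cutoff
p.hat      <- predict(logit.fit, newdata = test, type = "response")
logit.pred <- as.integer(p.hat > 0.5)

mean(logit.pred == test$V58)  # test accuracy
```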

(d) Suppose we just toss a coin (a quarter) to make a prediction on the testing set; that is, for each obs in the testing set, we randomly toss a coin and assign the label 1 to this obs if we get a "head" and 0 if we get a "tail". Calculate the estimated classification accuracy, sensitivity rate and specificity rate. You might generate the prediction on the testing set by using the following code to generate random binary (Bernoulli) outcomes:

    # n is the sample size of the testing set, and let prob=0.5
    y.test.prediction = rbinom(n, 1, prob)
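A minimal sketch of the coin-toss procedure in part (d) follows. The vector y.test is a made-up set of labels standing in for the real response column, and its 40% spam share is an arbitrary assumption.

```r
set.seed(3)
# Made-up test labels standing in for the real response column (1 = spam)
y.test <- rbinom(1536, 1, 0.4)
n      <- length(y.test)
prob   <- 0.5

# One coin toss per observation: 1 = "head", 0 = "tail"
y.test.prediction <- rbinom(n, 1, prob)

accuracy    <- mean(y.test.prediction == y.test)
sensitivity <- mean(y.test.prediction[y.test == 1] == 1)
specificity <- mean(y.test.prediction[y.test == 0] == 0)
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

Because the predictions are random, the three values change from run to run unless the seed is fixed.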

(e) Following the procedure in part (d), we try a sequence of values of "prob": 0, 0.2, 0.4, 0.5, 0.6, 0.8, 1. Please plot three figures showing the values of "prob" (x-axis) versus the prediction accuracy, sensitivity and specificity, respectively. Based on the exploration, can you infer what the best "prob" value is in this procedure? Why?
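The sweep in part (e) can be sketched as a loop over the candidate values, assuming the intended sequence is 0, 0.2, 0.4, 0.5, 0.6, 0.8, 1. As before, y.test is a made-up label vector rather than the real test set, and the object name rates is illustrative.

```r
set.seed(4)
y.test <- rbinom(1536, 1, 0.4)   # made-up labels, not the real test set
n      <- length(y.test)
probs  <- c(0, 0.2, 0.4, 0.5, 0.6, 0.8, 1)

# One column of rates per value of prob
rates <- sapply(probs, function(prob) {
  pred <- rbinom(n, 1, prob)
  c(accuracy    = mean(pred == y.test),
    sensitivity = mean(pred[y.test == 1] == 1),
    specificity = mean(pred[y.test == 0] == 0))
})
colnames(rates) <- probs         # 3 x 7 matrix: one row per measure

# Three side-by-side panels, one per measure
par(mfrow = c(1, 3))
for (m in rownames(rates)) {
  plot(probs, rates[m, ], type = "b", ylim = c(0, 1), xlab = "prob", ylab = m)
}
```

Keeping the results in a single matrix makes it easy to print the numbers next to the plots in the report.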
