Big Data Analysis A: Take-Home Test II
We consider the use of a logistic regression model to predict the probability of default using income and balance on the Default.csv data set (see attachment).
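For reference, the multiple logistic regression model described above can be written as follows. The coefficient symbols $\beta_0, \beta_1, \beta_2$ are notation introduced here, not taken from the assignment.

```latex
p(X) = \Pr(\text{default} = \text{Yes} \mid X)
     = \frac{e^{\beta_0 + \beta_1\,\text{income} + \beta_2\,\text{balance}}}
            {1 + e^{\beta_0 + \beta_1\,\text{income} + \beta_2\,\text{balance}}},
\qquad
\log\frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{balance}
```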
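1: The file samgov.csv (see attachment) contains public letters from a government online public-inquiry platform: 697 complaints sent to local governments by members of the public in different regions.
Public letters on online inquiry platforms are typical text data and lend themselves to text mining. Drawing on the in-class material on "text mining and sentiment analysis", complete the following text analysis tasks (50%):
a) Use R or Python to perform word segmentation, stop-word removal, and word-frequency counting on the text in the CSV file.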
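A minimal Python sketch of task a), using only the standard library. Chinese has no spaces between words, so segmentation needs a dictionary or a statistical model; here a simple forward-maximum-matching segmenter stands in for a real tool (in practice a library such as jieba, via `jieba.lcut`, would normally be used). The helper names `load_letters`, `fmm_segment` and `word_frequencies`, the vocabulary, the stop-word set and the CSV column name are all hypothetical and must be adapted to samgov.csv.

```python
import csv
import re
from collections import Counter


def load_letters(path, column):
    """Read one text column from a CSV file (column name is an assumption)."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


def fmm_segment(text, vocab):
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    max_len = max((len(w) for w in vocab), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def word_frequencies(texts, segment, stopwords):
    """Segment every text, drop stop words, and count the remaining tokens."""
    counts = Counter()
    for text in texts:
        # \w+ keeps runs of CJK characters, letters and digits;
        # punctuation acts as a separator and is discarded
        for chunk in re.findall(r"\w+", text):
            for tok in segment(chunk):
                if tok not in stopwords:
                    counts[tok] += 1
    return counts
```

Usage, assuming a hypothetical column named `content`: `word_frequencies(load_letters("samgov.csv", "content"), lambda t: fmm_segment(t, vocab), stopwords).most_common(20)` lists the 20 most frequent words. Passing the segmenter as a function keeps the counting step unchanged if you swap in `jieba.lcut`.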

In particular, we will now compute estimates for the standard errors of the income and balance logistic regression coefficients in two different ways: (1) using the bootstrap, and (2) using the standard formula for computing the standard errors in the glm() function. Do not forget to set a random seed before beginning your analysis.
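The two estimates can be stated precisely as below. Here $\hat\beta_j$ is the fitted coefficient of income or balance, $\hat\beta_j^{*b}$ is its estimate on the $b$-th of $B$ bootstrap resamples, $X$ is the $n \times 3$ design matrix with columns $(1, \text{income}, \text{balance})$, and $\hat p_i$ is the fitted default probability of observation $i$. Formula (2) is the usual inverse-Fisher-information standard error for a binomial logit model; this notation is introduced here for illustration.

```latex
\begin{aligned}
\text{(1) bootstrap:}\quad & \mathrm{SE}_B(\hat\beta_j)
  = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\bigl(\hat\beta_j^{*b} - \bar\beta_j^{*}\bigr)^2},
  \qquad \bar\beta_j^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat\beta_j^{*b} \\
\text{(2) glm() formula:}\quad & \widehat{\mathrm{SE}}(\hat\beta_j)
  = \sqrt{\bigl[(X^\top \hat W X)^{-1}\bigr]_{jj}},
  \qquad \hat W = \operatorname{diag}\bigl(\hat p_1(1-\hat p_1), \dots, \hat p_n(1-\hat p_n)\bigr)
\end{aligned}
```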
a) Using the summary() and glm() functions, determine the estimated standard errors for the coefficients associated with income and balance in a multiple logistic regression model that uses both predictors.
b) Write a function, fn(), that takes as input the Default data set as well as an index of the observations, and that outputs the coefficient estimates for income and balance in the multiple logistic regression model.
c) Use the boot() function together with your fn() function to estimate the standard errors of the logistic regression coefficients for income and balance.
d) Comment on the estimated standard errors obtained using the glm() function and using your bootstrap function.

