RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS
College of Business & Economics, The Australian National University
REGRESSION MODELLING
Assignment 2 for 2018
• INSTRUCTIONS:
This assignment is worth 20% of your overall marks for this course (for all students enrolled in STAT2008, STAT4038 or STAT6038).
If you wish, you may work together with one other student in doing the analyses and present a single (joint) report. If you choose to do this, both of you will be awarded the same total mark. Students enrolled under different course codes may work together. You may NOT work in groups of more than two students, and the usual ANU examination rules on plagiarism still apply with respect to people not in your group. This means you should not discuss the assignment (questions, solutions, code, etc.) with your classmates or any other individuals if they are not in your group. You can discuss the assignment with me (Anton Westveld) or your tutors.
Please submit your assignment on Wattle. As a group you should submit only one assignment. Make sure to place the names and IDs of the individuals in your group on the front page of your assignment. When uploading to Wattle you will submit:
1. Your assignment/report.
2. An ‘.R’ file containing the R code you used.
Assignments should be … Your assignment may include some carefully edited computer output (e.g. graphs, tables) showing the results of your data analysis and a discussion of those results, as well as some carefully selected code. Please be selective about what you present and only include as many pages and as much computer output as necessary to justify your solution. It is important to be concise in your discussion of the results. Clearly label each part of your report with the part of the question that it refers to.
Unless otherwise advised, use a significance level of 5%.
Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the total report is of an unreasonable length, i.e. more than 10 pages including graphs and tables. You may include an appendix in addition to the above page limit; however, the appendix will generally not be marked, only checked if there is some question about what you have actually done.
Assignments will be marked by your tutor (or one of your two tutors, for joint assignments). You may ask any of the tutors or me (Anton Westveld) questions about this assignment up to 4 pm on Thursday 17 May 2018.
Late assignments will NOT be accepted after the deadline without an extension. Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but must have my permission by no later than 12 noon on Thursday 17 May 2018. Even with an extension, all assignments must be submitted reasonably close to the original deadline to allow time for the marking to be completed.
- (100 points) You will explore the techniques for the course by examining data on the number of visits to a health care professional in Australia from 1977-78.
The data have been placed on The variables are:
sex: 1 if female, 0 if male
age: age in years divided by 100 (measured as mid-point of 10 age groups from 15-19 years to 65-69, with 70 or more treated as 72)
income: annual income in Australian dollars divided by 1000 (measured as mid-point of coded ranges Nil, less than 200, 200-1000, 1001-, 2001-, 3001-, 4001-, 5001-, 6001-, 7001-, 8001-10000, 10001-12000, 12001-14000, with 14001- treated as 15000)
insurance: insurance contract (medlevy : Medibank levy, levyplus : private health insurance, freepoor : government insurance due to low income, freerepa : government insurance due to old age, disability or veteran status)
illness: number of illnesses in past 2 weeks
actdays: number of days of reduced activity in past 2 weeks due to illness or injury
hscore: general health score using Goldberg’s method (from 0 to 12). High score indicates bad health
chcond: chronic condition (np : no problem, la : limiting activity, nla : not limiting activity)
doctorco: number of consultations with a doctor or specialist in the past 2 weeks
nondocco: number of consultations with non-doctor health professionals (chemist, optician, physiotherapist, social worker, district community nurse, chiropodist or chiropractor) in the past 2 weeks
hospadmi: number of admissions to a hospital, psychiatric hospital, nursing or convalescent home in the past 12 months (up to 5 or more admissions, which is coded as 5)
hospdays: number of nights in a hospital, etc. during most recent admission: taken, where appropriate, as the mid-point of the intervals 1, 2, 3, 4, 5, 6, 7, 8-14, 15-30, 31-60, 61-79, with 80 or more admissions coded as 80. If no admission in past 12 months then equals zero.
prescrib: total number of prescribed medications used in past 2 days
nonpresc: total number of non-prescribed medications used in past 2 days
(a) (15 points) Conduct an exploratory data analysis, where the response y = doctorco + nondocco (i.e. the total number of visits to a health care professional in the past two weeks) is examined in relation to the other variables, which should be considered explanatory variables (covariates). In doing your analysis, make sure to identify any unusual points and discuss why they are unusual. For this assignment do not remove any unusual points, only comment on them (if they exist).
(b) (8.5 points) Fit a multiple linear regression model with the response variable and with the other variables in the data as explanatory variables. Do not consider any transformations of the covariates or interactions. Present the main residual plot of the residuals against the fitted values for this model, along with a lowess smoother. Are there any obvious problems with the underlying assumptions?
(c) (8.5 points) Consider a few transformations of y, such as log(y + 1), √y, y^(1/4). Fit a multiple linear regression model with the transformed response variable and with the other variables in the data as explanatory variables. Do not consider any transformations of the covariates or interactions. Again present the main residual plot of the residuals against the fitted values for this new model, along with a lowess smoother. Do any of the transformations applied to the response variable appear to have corrected any problems you identified in part (b)?
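As a rough illustration of the candidate transformations in part (c), the sketch below is written in Python purely to show the arithmetic (the submitted analysis must still be done in R). The function name `transform_response` and the visit counts are hypothetical and not taken from the assignment data. All three transformations are defined at y = 0, which matters because many respondents will have no visits.

```python
import math

def transform_response(y, kind):
    """Apply one of the part (c) candidate transformations to a list of counts."""
    if kind == "log1p":
        return [math.log(v + 1) for v in y]      # log(y + 1)
    if kind == "sqrt":
        return [math.sqrt(v) for v in y]         # square root of y
    if kind == "fourth_root":
        return [v ** 0.25 for v in y]            # y^(1/4)
    raise ValueError(f"unknown transformation: {kind}")

# Hypothetical visit counts, for illustration only.
y = [0, 0, 1, 2, 4, 9, 16]
for kind in ("log1p", "sqrt", "fourth_root"):
    print(kind, [round(v, 3) for v in transform_response(y, kind)])
```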
(d) (8.5 points) Try using the Box-Cox approach to find a transformation. Again present the main residual plot of the residuals against the fitted values for this new model, along with a lowess smoother. Does the transformation applied to the response variable appear to have corrected any problems you identified in parts (b) and (c)? Based on your analysis, decide whether a transformation should be considered and, if so, clearly state which one. Use this transformation through the rest of the assignment.
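The idea behind the Box-Cox approach in part (d) can be sketched as a grid search over the profile log-likelihood. The Python below is only an illustration under simplifying assumptions: a single made-up covariate instead of the full set, made-up counts, and a shift of y + 1 because the Box-Cox family requires a strictly positive response. The helper names are hypothetical; in R the analysis would normally use an existing Box-Cox routine rather than hand-written code.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform: (y^lam - 1) / lam, or log(y) when lam is 0."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

def rss_simple_regression(x, z):
    """Residual sum of squares from the least-squares fit of z on x."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - zbar) for a, b in zip(x, z)) / sxx
    return sum(((b - zbar) - slope * (a - xbar)) ** 2 for a, b in zip(x, z))

def profile_loglik(x, y, lam):
    """Profile log-likelihood of lam (up to a constant) for positive y."""
    n = len(y)
    rss = rss_simple_regression(x, box_cox(y, lam))
    return -0.5 * n * math.log(rss / n) + (lam - 1) * sum(math.log(v) for v in y)

# Made-up data: x is a single covariate, y_raw are counts that include zero.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_raw = [0, 1, 1, 3, 2, 6, 9, 14]
y = [v + 1 for v in y_raw]          # assumption: shift by 1 so that y > 0

grid = [i / 10 for i in range(-20, 21)]   # lam from -2.0 to 2.0
best = max(grid, key=lambda lam: profile_loglik(x, y, lam))
print("lambda maximising the profile log-likelihood on the grid:", best)
```

A value of lambda near 0 would point towards a log transformation and a value near 0.5 towards a square root; in practice the choice is usually read from the confidence interval around the maximum rather than from the point estimate alone.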
(e) (8.5 points) Construct two added variable plots: one for income and one for age. Comment on the plots.
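For reference, the points in an added variable plot are two sets of residuals: the response regressed on all the other covariates, and the covariate of interest regressed on those same covariates. The Python sketch below illustrates this with only one "other" covariate and made-up numbers (the assignment model has many more covariates and must be fitted in R); the function names are hypothetical. The least-squares slope through these points equals the coefficient of the covariate of interest in the full model.

```python
def simple_residuals(x, z):
    """Residuals from the least-squares fit of z on x (with intercept)."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - zbar) for a, b in zip(x, z)) / sxx
    return [(b - zbar) - slope * (a - xbar) for a, b in zip(x, z)]

def added_variable_points(y, x_focus, x_other):
    """Return (horizontal, vertical) coordinates of the added variable plot."""
    e_x = simple_residuals(x_other, x_focus)   # x_focus adjusted for x_other
    e_y = simple_residuals(x_other, y)         # y adjusted for x_other
    return e_x, e_y

def slope_through_origin(e_x, e_y):
    """Least-squares slope of e_y on e_x (both already have mean zero)."""
    return sum(a * b for a, b in zip(e_x, e_y)) / sum(a * a for a in e_x)

# Made-up values, for illustration only.
age = [0.17, 0.22, 0.27, 0.32, 0.42, 0.52, 0.62, 0.72]
income = [0.55, 0.90, 0.45, 1.10, 0.65, 0.35, 0.25, 0.30]
y = [0.00, 0.69, 0.00, 1.10, 0.69, 1.39, 1.61, 1.79]

e_x, e_y = added_variable_points(y, x_focus=income, x_other=age)
print("slope for income after adjusting for age:", slope_through_origin(e_x, e_y))
```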
(f) (8.5 points) Construct confidence intervals for all pairwise differences for the factor insurance with a family level α = 0.05. Which differences are statistically significant, if any?
(g) (8.5 points) Construct confidence intervals for all pairwise differences for the factor chcond with a family level α = 0.05. Which differences are statistically significant, if any?
(h) (8.5 points) Examine (but do not present) the ANOVA (Analysis of Variance) table and summary output for the model which you chose in (d). Now adjust the order of the explanatory variables so that you can test the following nested hypotheses:
H0: β_insurance = β_sex = β_age = β_income = β_nonpresc = 0
H0: β_insurance = β_sex = β_age = 0
H0: β_insurance = 0
Present the ANOVA table for the re-ordered model and discuss the results of the partial (nested) F-tests for the above hypotheses. Fully write out the tests. Do your results suggest some possible modification(s) you could make to the model? If so, then make those modifications.
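Each partial (nested) F-test compares a reduced model, in which the coefficients named in H0 are set to zero, with the full model. The statistic is F = [(RSS_reduced - RSS_full) / (df_reduced - df_full)] / (RSS_full / df_full), where df denotes residual degrees of freedom. The Python sketch below only illustrates this arithmetic; the function name and all numbers are hypothetical, and the p-value (from the F distribution with the corresponding degrees of freedom) is left to R.

```python
def partial_f(rss_reduced, df_reduced, rss_full, df_full):
    """Partial F statistic for a reduced model nested in a full model.

    df_* are residual degrees of freedom, so df_reduced > df_full.
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more residual df")
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    denominator = rss_full / df_full
    return numerator / denominator

# Hypothetical values: dropping 3 coefficients raises the RSS from 100 to 120.
f_stat = partial_f(rss_reduced=120.0, df_reduced=95, rss_full=100.0, df_full=92)
print("partial F statistic:", round(f_stat, 4))
```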
(i)(8.5 points) Investigate whether the variable sex has an interaction effect with any of the other variables.
(j) (8.5 points) For your model, construct a plot of the internally Studentized residuals against the fitted values, a normal Q-Q plot of the residuals, and a bar plot of Cook’s distances for each observation. Use these plots (and other means) to comment on the model assumptions and on any unusual data points.
(k) (8.5 points) Fully interpret the results of your final model. Provide plots of (y against age), (y against income), and (y against hscore), with regression lines for the different levels of the factor insurance. Additionally, add 95% point-wise confidence intervals for the regression lines (each confidence interval can have an α = 0.05). Finally, use different plotting symbols for male and female.
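The quantities plotted in part (j) follow from the residuals e_i, the leverages h_i and the residual variance estimate s²: the internally Studentized residual is r_i = e_i / (s √(1 - h_i)) and Cook's distance is D_i = r_i² h_i / (p (1 - h_i)), with p the number of regression coefficients. The Python sketch below illustrates these formulas for a simple one-covariate regression (p = 2) on made-up data in which the last point has a much larger x value than the rest. The function name is hypothetical, and the assignment model would be handled with R's built-in diagnostics instead.

```python
import math

def regression_diagnostics(x, y):
    """Leverages, internally Studentized residuals and Cook's distances
    for the simple linear regression of y on x (intercept plus slope)."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    resid = [(b - ybar) - slope * (a - xbar) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)             # residual variance
    leverage = [1 / n + (a - xbar) ** 2 / sxx for a in x]
    student = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(resid, leverage)]
    cooks = [r * r * h / (p * (1 - h)) for r, h in zip(student, leverage)]
    return leverage, student, cooks

# Made-up data: the last observation is far from the others in x and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 0.0]

leverage, student, cooks = regression_diagnostics(x, y)
for i, (h, r, d) in enumerate(zip(leverage, student, cooks), start=1):
    print(f"obs {i}: leverage={h:.3f}  studentized={r:.3f}  cook={d:.3f}")
```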