STATG003/M003 STATISTICAL COMPUTING ASSESSMENT 3 (2017/18 SESSION)
- Your solutions should be your own work and are to be handed in by yourself to the Statistical Science Departmental Office by 4pm on MONDAY, 23rd of April
Detailed submission instructions are given below.
- Before you hand in your work, complete and sign the slip below this rubric, cut it off and attach it firmly to your
- When you submit your work, please make sure that someone in the Departmental Office records on their list of students that you have handed in your
- Late submission will incur a penalty unless there are extenuating circumstances (e.g. medical) supported by appropriate Penalties are set out in the latest editions of the Statistical Science Departmental Student Handbooks, available from the departmental web pages.
- Failure to submit this in-course assessment will mean that your overall examination mark is recorded as non-complete, i.e. you will not obtain a pass for the
- Any plagiarism or collusion will normally result in zero marks for all students involved, which may also mean that your overall examination mark is recorded as non-complete. Guidelines as to what constitutes plagiarism and collusion may be found in the Departmental Student Handbooks. The Turn-It-In plagiarism detection system may be used to scan your submission for evidence of plagiarism or collusion.
- Your grade will be provisional until confirmed by the Statistics Examiners’ Meeting in June
- General feedback will be given via Moodle.
I am aware of the UCL Statistical Science Department’s regulations on plagiarism for assessed coursework. I have read the guidelines in the student handbook, and understand what constitutes plagiarism.
I hereby affirm that the work I am submitting for this in-course assessment is entirely my own.
Please write your name in block letters: Your student number:
STATG003/M003 Assessment 3 — instructions
- You are required to write a single R function. The code for this function should be saved in a .r file named by your student number. For example, if your student number is 17101710, your code should be saved in the file 17101710.r .
- Your function should be thoroughly commented. It should consist of a header section summarising the logical structure, followed by the main body of the function. The main body should itself contain comments.
- You are required to submit the following:
- A printout of your R
- An electronic copy of your R script (see below).
- A brief explanation of how your function works, along with a summary of its output. The explanation should include, for example, details of any mathematical calculations that you carried out before implementing the IWLS algorithm. Where you have made decisions regarding what to produce by way of output, you should justify these decisions. As a rough guide, this explanation/summary should be no more than 2 pages long.
- Your function should not create any output
- Printouts and explanations should be handed in to the Statistical Science Departmental Office. Remember to complete a plagiarism declaration, and to attach it to your You should ensure that all printouts are clearly identified with your student number. Your name should only be on the cover sheet.
- Electronic copies of your R function should be submitted via the Moodle page for the course. Look for the link with the heading “Use this link to submit your assignment ICA3” and follow the instructions.
STATG003/M003 Assessment 3 — R function
Suppose that $Y$ is a vector of geometric random variables, with $Y_i \sim \mathrm{Geo}(\pi_i)$ so that

$$P(Y_i = y) = \pi_i (1 - \pi_i)^{y-1} \qquad (y = 1, 2, 3, \ldots),$$

with $E(Y_i) = 1/\pi_i = \mu_i$, say, and $\mathrm{Var}(Y_i) = (1 - \pi_i)/\pi_i^2$. Suppose also that $x_i$ is a vector of covariates, forming the $i$th row of a matrix $X$, such that

$$\ln\left[\frac{1 - \pi_i}{\pi_i}\right] = \ln\left[\mu_i - 1\right] = x_i \beta = \eta_i, \text{ say,}$$

for some coefficient vector $\beta$.
This can be regarded as a GLM, since the geometric distribution is in the exponential family and $\eta_i$ is a monotonic function of $\mu_i$.
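One way to make the GLM claim concrete is to write the log-probability in exponential-family form and read off the quantities that IWLS needs. The following is a sketch of that calculation from the definitions above, not an official solution; the symbols $\theta_i$, $b(\cdot)$, $V(\cdot)$, $g(\cdot)$, $z_i$ and $w_i$ are standard GLM notation introduced here as assumptions, and the dispersion parameter is taken to be 1.

```latex
\begin{align*}
\ln P(Y_i = y) &= y \ln(1 - \pi_i) + \ln\frac{\pi_i}{1 - \pi_i}
                = y\,\theta_i - b(\theta_i), \\
\theta_i &= \ln(1 - \pi_i) = \ln\frac{\mu_i - 1}{\mu_i}, \qquad
b(\theta_i) = \theta_i - \ln\left(1 - e^{\theta_i}\right), \\
b'(\theta_i) &= \frac{1}{1 - e^{\theta_i}} = \frac{1}{\pi_i} = \mu_i, \qquad
V(\mu_i) = b''(\theta_i) = \frac{e^{\theta_i}}{\left(1 - e^{\theta_i}\right)^2}
         = \frac{1 - \pi_i}{\pi_i^2} = \mu_i(\mu_i - 1), \\
\eta_i &= g(\mu_i) = \ln(\mu_i - 1), \qquad g'(\mu_i) = \frac{1}{\mu_i - 1}, \\
z_i &= \eta_i + (y_i - \mu_i)\, g'(\mu_i) = \eta_i + \frac{y_i - \mu_i}{\mu_i - 1}, \qquad
w_i = \frac{1}{V(\mu_i)\, g'(\mu_i)^2} = \frac{\mu_i - 1}{\mu_i}.
\end{align*}
```

Note that under this derivation the link $\ln(\mu_i - 1)$ differs from the canonical parameter $\ln[(\mu_i - 1)/\mu_i]$, so the iteration corresponds to Fisher scoring rather than Newton-Raphson.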
Write an R function to fit such a model using iterative weighted least squares, and to check the fitted model. Your function should be called grm (‘geometric regression model’). The arguments to the function should be y, a vector of responses to be modelled using the geometric distribution as described above; X, a design matrix of covariates, and startval, an initial estimate of the model coefficients. If the user does not supply a value of startval, you should either provide a default (e.g. a vector of zeroes or any other sensible choice) or find some other way of starting the algorithm.
Your function should run without user intervention, and its value should be a list object containing at least the following components (you may add more components if you feel that these would be useful):
y: The observed responses.
fitted: The fitted values.
betahat: The estimated regression coefficients.
sebeta: The standard errors of the estimated regression coefficients.
cov.beta: The covariance matrix of the estimated regression coefficients.
p: The number of coefficients estimated in the linear predictor.
df.residual: The residual degrees of freedom.
deviance: The deviance for the model.
The structure of your function should be similar to the following:
1. Check that the dimensions of y and X are compatible, and that the data are suitable for modelling using the geometric distribution — if not, stop with an appropriate error
2. Carry out the IWLS procedure to fit the model, and output the results to screen (as described below).
3. Produce residual plots and other appropriate model
4. Assemble the results into a list object, and return this as the value of the function.
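The checking and fitting steps above could be sketched roughly as follows. This is a minimal illustration rather than a model answer: the name `iwls_geometric_sketch`, the arguments `tol` and `maxit`, and the extra components `iterations` and `converged` are hypothetical, and the working response and weights assume $\mu_i = 1 + e^{\eta_i}$, $z_i = \eta_i + (y_i - \mu_i)/(\mu_i - 1)$ and $w_i = (\mu_i - 1)/\mu_i$, which you should verify for yourself.

```r
# Hypothetical sketch (not the required grm function): IWLS for the geometric
# model with link eta = log(mu - 1). Weights and working response are assumptions
# derived from the model definition; tol and maxit are illustrative defaults.
iwls_geometric_sketch <- function(y, X, startval = rep(0, ncol(X)),
                                  tol = 1e-8, maxit = 50) {
  X <- as.matrix(X)
  # Input checks: compatible dimensions, responses are integers >= 1
  if (!is.numeric(y) || any(is.na(y))) stop("y must be numeric with no missing values")
  if (length(y) != nrow(X)) stop("length of y must equal the number of rows of X")
  if (any(y < 1) || any(y != round(y))) stop("y must contain integers >= 1")
  if (length(startval) != ncol(X)) stop("startval must have one entry per column of X")

  beta <- startval
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    eta <- drop(X %*% beta)            # linear predictor
    mu  <- 1 + exp(eta)                # inverse link
    z   <- eta + (y - mu) / (mu - 1)   # working (adjusted) response
    w   <- (mu - 1) / mu               # IWLS weights
    # Weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }

  # Quantities evaluated at the final estimate
  mu <- 1 + exp(drop(X %*% beta))
  cov_beta <- solve(crossprod(X, ((mu - 1) / mu) * X))
  list(y = y, fitted = mu, betahat = beta,
       sebeta = sqrt(diag(cov_beta)), cov.beta = cov_beta,
       p = ncol(X), df.residual = length(y) - ncol(X),
       iterations = iter, converged = converged)
}

# Intercept-only check: here the MLE of the fitted mean is mean(y),
# so betahat should be close to log(mean(y) - 1).
fit0 <- iwls_geometric_sketch(c(1, 2, 3, 4, 5), matrix(1, 5, 1))
```

The normal equations are solved directly with `solve` and `crossprod`, so nothing from `glm` or `glm.fit` is involved; a QR-based solve would be a reasonable alternative if the design matrix is badly conditioned.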
In step 2, the screen output should consist of: a table showing the estimated coefficients, their standard errors, z-statistics and associated p-values; the number of coefficients estimated; the residual degrees of freedom for the fitted model; and the deviance for the fitted model. You may output any other relevant information if you wish.
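As an illustration of the coefficient table described above, the sketch below builds it from a vector of estimates and their covariance matrix. The helper name `coef_table_sketch` and the column labels are hypothetical choices, and the p-values assume a two-sided test against a standard normal reference distribution.

```r
# Hypothetical helper: coefficient table with standard errors, z-statistics
# and two-sided normal p-values. Column labels are illustrative only.
coef_table_sketch <- function(betahat, cov.beta) {
  sebeta <- sqrt(diag(as.matrix(cov.beta)))   # standard errors
  z <- betahat / sebeta                       # z-statistics
  p <- 2 * pnorm(-abs(z))                     # two-sided p-values
  tab <- cbind(Estimate = betahat, "Std. Error" = sebeta,
               "z value" = z, "Pr(>|z|)" = p)
  # Fall back to generic row labels when the estimates are unnamed
  if (is.null(rownames(tab))) rownames(tab) <- paste0("beta", seq_along(betahat))
  tab
}

# Made-up numbers, purely to show the layout
tab <- coef_table_sketch(c(0, 1.96), diag(2))
print(round(tab, 4))
```

The remaining screen output (number of coefficients, residual degrees of freedom, deviance) could then be printed with clearly labelled `cat` calls.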
In step 3, you should use your knowledge of model checking for GLMs to produce an appropriate selection of diagnostics. You do not have to produce the same plots as R does when you plot a glm object.
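One possible starting point for such diagnostics is to compute deviance and Pearson residuals from the observed responses and fitted means. The sketch below is an assumption-laden illustration: the helper name `geometric_residuals_sketch` is hypothetical, the unit deviance $d_i = 2[(y_i - 1)\ln\{(y_i - 1)/(\mu_i - 1)\} - y_i \ln(y_i/\mu_i)]$ follows from the log-likelihood written in terms of $\mu_i$ (with the first term taken as 0 when $y_i = 1$), and the Pearson residuals assume the variance function $\mu_i(\mu_i - 1)$.

```r
# Hypothetical helper: deviance, deviance residuals and Pearson residuals
# for the geometric model, given responses y (>= 1) and fitted means mu (> 1).
geometric_residuals_sketch <- function(y, mu) {
  # (y - 1) * log((y - 1) / (mu - 1)) is taken to be 0 when y == 1
  term1 <- ifelse(y > 1, (y - 1) * log((y - 1) / (mu - 1)), 0)
  d <- 2 * (term1 - y * log(y / mu))   # unit deviances
  d <- pmax(d, 0)                      # guard against tiny negative rounding error
  list(deviance    = sum(d),
       dev.res     = sign(y - mu) * sqrt(d),
       pearson.res = (y - mu) / sqrt(mu * (mu - 1)))
}

# Made-up values, purely to show the calculation
res <- geometric_residuals_sketch(y = c(1, 3), mu = c(3, 3))

# Example plot, only drawn in an interactive session
if (interactive()) {
  plot(c(3, 3), res$dev.res, xlab = "Fitted values", ylab = "Deviance residuals")
  abline(h = 0, lty = 2)
}
```

Plots of these residuals against fitted values or against the linear predictor, and a normal Q-Q plot, are among the options you might then justify in your explanation.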
Your function must not use the glm command (nor anything similar such as glm.fit)!
STATG003/M003 Assessment 3 — hints
- There is no single ‘right answer’ to this question. To obtain a good mark you need to approach the problem sensibly, and to provide a clear justification of what you’re doing. Credit will be given for code that is clear and readable. In particular, code that is inadequately commented will be
- You should ensure that your function produces output that is clearly and appropriately labelled and
- You are not required to analyse any data here; however, when marking this assessment, your function will be tested on one or more datasets to ensure that it works correctly. You may therefore wish to test your function on a simple dataset before submission, and optionally submit your test script along with your function as described
- If desired, you may use the IWLS function from Workshop 8 as a starting point for this
- To explain how your function works, you will probably need to use quite a lot of mathematical You are encouraged to use LaTeX. That being said, a legible handwritten explanation is also perfectly acceptable.
- In order to explain how your function works, you will have to explain that the given distribution is in the exponential family.
- Your scripts will be tested by calling your function from a program that assumes that you have done exactly what the question asks. This means, for example, that you must specify your function’s arguments in the order given above, and that the names of respective elements of the list result must be the same as those given above. If you do not do this, your function will fail when called, and you will lose marks.
- R has some built-in routines relating to the geometric distribution. You may use these if you think they would be useful; however, note that the definition of the distribution in R is slightly different from that given
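To illustrate the difference in definitions: R's `dgeom` and `rgeom` count the number of failures before the first success, so their support starts at 0, whereas the distribution in this assessment starts at 1. A minimal sketch of the shift is given below; the wrapper names `dgeo_shifted` and `rgeo_shifted` are hypothetical.

```r
# Hypothetical wrappers: geometric distribution on y = 1, 2, 3, ...
# built from R's version, which is defined on 0, 1, 2, ...
dgeo_shifted <- function(y, prob) dgeom(y - 1, prob)      # P(Y = y) = prob * (1 - prob)^(y - 1)
rgeo_shifted <- function(n, prob) rgeom(n, prob) + 1      # simulated values are >= 1

# Compare with the formula in the model definition
p_check <- dgeo_shifted(3, 0.3)   # should equal 0.3 * 0.7^2

set.seed(1)
y_sim <- rgeo_shifted(100, 0.5)
```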
- If you have not already done so, please read the general feedback on the first ICA on Moodle. Also read the feedback on ICA 2 when it is made
- In case you are stuck or need advice, queries regarding this assessment should be made during an office hour. For the details of the office hours, and a link to book an appointment, please see the Moodle page.
STATG003/M003 Assessment 3 — Optional test case script
You are allowed to write a second script which loads a dataset, fits a regression model using your implementation of grm, and outputs a selection of estimates and diagnostics. The choice of data is yours, but the execution must be reproducible by any users of your script. Hence, limit yourself to datasets which can be loaded from an R package, or which can be constructed from R code within the script itself. For the former, we recommend the package datasets.
The choice of data and output is yours to make. The goal of this script is for you to demonstrate to us an example of your script working in practice, in case we have any problems running it on our own test cases. For instance, if your script works correctly with the data provided by you but not with all of our test cases, we will be able to give you appropriate credit for demonstrating a situation in which the script works. For that to be possible however, we require that your test case script is clearly written and commented. As long as the code is clear and reproducible, the format is up to you.
If you make use of this option, upload the test script as a second file. If your student number is 17101710, say, use the format 17101710test.r .
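A test script of this kind might, for example, construct its data within the script itself. The sketch below simulates responses from the model with a fixed seed so that the run is reproducible; the seed, sample size, covariate and coefficient values are arbitrary illustrative choices, and the call to `grm` is guarded because the function is assumed to be defined elsewhere (for instance by sourcing your own .r file first).

```r
# Illustrative test-case sketch: simulate data from the geometric regression
# model. All numerical settings here are arbitrary assumptions.
set.seed(2018)
n <- 200
x <- runif(n)                              # a single made-up covariate
X <- cbind(Intercept = 1, x = x)           # design matrix with an intercept column
beta_true <- c(0.5, 1)                     # coefficients used for simulation
mu_true <- 1 + exp(drop(X %*% beta_true))  # means implied by eta = log(mu - 1)
y <- rgeom(n, prob = 1 / mu_true) + 1      # shift R's geometric to start at 1

# Fit the model only if grm has already been defined in this session
if (exists("grm", mode = "function")) {
  fit <- grm(y, X)
  print(cbind(true = beta_true, estimate = fit$betahat))
}
```

Comparing the estimates with the coefficients used in the simulation gives a simple sanity check that the fitting routine behaves sensibly.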
STATG003/M003 Assessment 3 — marking guidelines
This assessment is marked out of 50. The marks are roughly subdivided into the following components: 11 marks for correct implementation of the IWLS algorithm, 21 marks for correct checking of input, for correct presentation of output, and for good coding style, and 18 marks for clear explanation of how your function works, for correct diagnostics, for correct mathematical expressions for the variance function, the deviance, etc.