SIT741 ‑ Statistical Data Analysis
Assignment 2 Instructions
SIT741 Assignment 2
Unit Chair: Wei Luo Due: 17 May 2019
Assignment 2 contributes 40% of your final SIT741 mark (the full mark is 40). It must be completed individually and submitted to CloudDeakin before the due date: 11 pm, 17/05/2019 (Week 10 Friday).
In this assignment, you will apply your learning to further analyse the 2013-2014 emergency department (ED) demands at Perth and their connection with weather events. This activity builds on Assignment 1; you may want to review your Assignment 1 solution and identify any reusable code. Please start early so that you can identify any skill or knowledge gaps and seek support from the teaching staff and other students.
This assignment contains two optional tasks (Task 3.2 and 4.3). You should complete the prescribed tasks before attempting the optional ones.
Application scenario
You work in a data science team that tries to model the ED demands in the Perth area to improve the demand prediction.
For your convenience, you are provided with the following data links, but you are encouraged to include other relevant data for your analyses.
- The emergency department admissions and attendances data set provided by the Department of Health of Western Australia:
- The daily temperature and precipitation data for the region, accessible through the NOAA data APIs.
Of particular relevance is the “Global Historical Climatology Network – Daily” data:
Task 1: Source weather data (5 points)
From Assignment 1, you have processed data for the ED demands. We still need to find local weather data from the same period. You are encouraged to find weather data online. Besides the NOAA data, you may also use data from the Bureau of Meteorology historical weather observations and statistics. (The NOAA Climate Data might be easier to process.)
Answer the following questions:
- Which data source do you plan to use? Justify your decision.
- From the data source identified, download daily temperature and precipitation data for the region during the relevant time period. (Hint: If you download data from NOAA https://www.ncdc.noaa.gov/cdo-web/, you need to request an NOAA web service token for accessing the data.)
- Answer the following questions:
- How many rows are in the data?
- What time period does the data cover?
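The two checks above (row count and time coverage) can be sketched in a few lines. This is only an illustration under stated assumptions: the column names (STATION, DATE, PRCP, TMAX, TMIN), the station label and the values are made up for the example, and your downloaded file may be laid out differently.

```python
import csv
import io
from datetime import date

# Hypothetical extract in the style of a daily weather CSV export.
# Column names and values are illustrative assumptions, not real observations.
SAMPLE_CSV = """STATION,DATE,PRCP,TMAX,TMIN
STATION_A,2013-07-01,0.0,18.2,7.1
STATION_A,2013-07-02,2.4,17.5,9.3
STATION_A,2013-07-03,0.0,19.0,8.8
"""

def summarise(csv_text):
    """Count rows and report the date range and any missing calendar days."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Unique dates, sorted; several stations may share the same date.
    dates = sorted({date.fromisoformat(r["DATE"]) for r in rows})
    span_days = (dates[-1] - dates[0]).days + 1
    return {
        "rows": len(rows),
        "start": dates[0],
        "end": dates[-1],
        "missing_days": span_days - len(dates),
    }

summary = summarise(SAMPLE_CSV)
print(summary)
```

Counting missing calendar days alongside the row count is a cheap way to spot gaps before the weather data is joined to the ED data.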
Task 2: Model planning (5 points)
Careful planning is essential for a successful modelling effort. Please answer the following planning questions.
- How will the final model be used? How will it be relevant to the overcrowding problems at our EDs? (You may find some inspiration here http://bit.ly/2p5qLH6 .) Who are the potential users of your model?
- What relationship do you plan to model or what do you want to predict? What is the response variable? What are the predictor variables? Will the variables in your model be routinely collected and made available soon enough for prediction?
- As you are likely to build your model on historical data, will the data in the future have similar characteristics?
- What statistical method(s) will be applied to generate the model? Why?
Task 3: Model the ED demands (10 points)
We will start with simple models and gradually improve them. We will focus on the ED demand variable(s) that you defined in Assignment 1. Let’s denote it Y.
Task 3.1: Models for a single facility (10 points)
Randomly pick a hospital from the ED dataset.
- Which hospital do you pick?
- Fit a linear model for Y using as the predictor variable. Plot the fitted values and the residuals. Assess the model fit. Is a linear function sufficient for modelling the trend of Y? Support your conclusion with plots.
- As we are not interested in the trend itself, relax the linearity assumption by fitting a generalised additive model (GAM). Assess the model fit. Do you see patterns in the residuals indicating insufficient model fit?
- Augment the model to incorporate the weekly seasonality. Compare the models using the Akaike information criterion (AIC). Report the best-fitted model through coefficient estimates and/or plots.
- Analyse the residuals. Do you see any remaining correlation patterns among the residuals?
- Is your day-of-the-week variable numeric, ordinal, or categorical? Does the decision affect the model fit?
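To make the AIC comparison and the day-of-the-week question concrete, here is a minimal sketch that fits three ordinary linear models (not GAMs) by least squares and compares their Gaussian AIC. Everything in it is an assumption for illustration: the series is synthetic, the predictor is taken to be a simple day index, and the helper names (ols, residuals, gaussian_aic) are hypothetical. A real GAM fit would need a dedicated library.

```python
import math

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def residuals(X, y, beta):
    """Observed minus fitted values."""
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]

def gaussian_aic(res, n_coef):
    """AIC of a Gaussian linear model; the error variance counts as a parameter."""
    n = len(res)
    rss = sum(e * e for e in res)
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return -2 * log_lik + 2 * (n_coef + 1)

# Synthetic daily series (illustrative only): trend + weekend bump + small wiggle.
days = list(range(28))
weekday_effect = [0, 0, 0, 0, 0, 15, 20]
y = [100 + 0.5 * t + weekday_effect[t % 7] + ((3 * t) % 5 - 2) for t in days]

designs = {
    "trend": [[1.0, t] for t in days],
    "trend + numeric weekday": [[1.0, t, t % 7] for t in days],
    "trend + categorical weekday": [
        [1.0, t] + [1.0 if t % 7 == d else 0.0 for d in range(1, 7)]
        for t in days
    ],
}

aic = {}
for name, X in designs.items():
    beta = ols(X, y)
    aic[name] = gaussian_aic(residuals(X, y, beta), len(beta))
    print(f"{name}: AIC = {aic[name]:.1f}")
```

On this synthetic series the categorical coding gives the lowest AIC, because the weekend bump is not a straight-line function of the weekday number. Whether the same holds for your hospital is something to check on the real data.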
(Optional task) Task 3.2: Models for all hospitals
Now fit a GAM for each hospital.
- Use the function to rerun your Task 3.1 code on all nine Perth hospitals.
- Plot the trends and residuals. What patterns do you see? Given what you found in Assignment 1, do you gain any new understanding of the ED demands?
Task 4: Heatwaves and ED demands (15 points)
The connection between heatwaves and the ED demands is widely reported, as in this news article.
In this task, you will try to measure the heatwave and assess its impact on the ED demands.
Task 4.1: Measuring heatwave (7 points)
- John Nairn and Robert Fawcett from the Australian Bureau of Meteorology have proposed a measure for the heatwave, called the excess heat factor (EHF). Read the following article to understand the definition of the EHF.
- Use the NOAA data to calculate the daily EHF values for the Perth area during the relevant time period. Plot the daily EHF values.
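As a rough guide to the calculation, the sketch below follows the EHF definition as it is commonly stated: with T_i the daily mean temperature and T95 the 95th percentile of daily mean temperature over a climate reference period, EHI_sig = (T_i + T_{i+1} + T_{i+2})/3 - T95, EHI_accl = (T_i + T_{i+1} + T_{i+2})/3 - (T_{i-1} + ... + T_{i-30})/30, and EHF = EHI_sig x max(1, EHI_accl). Please check the indexing and the choice of T95 against the article, as conventions vary. The function names and the sample temperatures are assumptions made for the example.

```python
def daily_mean(tmax, tmin):
    """Daily mean temperature taken as the average of the daily max and min."""
    return [(hi + lo) / 2 for hi, lo in zip(tmax, tmin)]

def excess_heat_factor(tmean, t95):
    """Daily EHF values; None where the 30-day history or 3-day window is incomplete.

    t95 is assumed to be supplied by the caller (95th percentile of daily
    mean temperature over a reference period).
    """
    ehf = [None] * len(tmean)
    for i in range(30, len(tmean) - 2):
        three_day = sum(tmean[i:i + 3]) / 3           # days i, i+1, i+2
        ehi_sig = three_day - t95                      # excess over the climate threshold
        ehi_accl = three_day - sum(tmean[i - 30:i]) / 30  # excess over the previous 30 days
        ehf[i] = ehi_sig * max(1.0, ehi_accl)
    return ehf

# Made-up temperatures: 30 mild days followed by a 3-day hot spell.
tmax = [26.0] * 30 + [36.0] * 3
tmin = [14.0] * 30 + [24.0] * 3
tmean = daily_mean(tmax, tmin)
ehf = excess_heat_factor(tmean, t95=25.0)
print(ehf[30])
```

Positive EHF values are usually read as heatwave conditions. Days without a full 30-day history are left as None here rather than filled in, so the first month of any series needs earlier data if you want values for it.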
Task 4.2: Models with EHF (8 points)
Use the EHF as an additional predictor to augment the model(s) that you fitted before. Report the estimated effect of the EHF on the ED demand. Does the extra predictor improve the model fit? What conclusions can you draw?
(Optional task) Task 4.3: Extra weather features
Can you think of extra weather features that may be more predictive of ED demands? Try incorporating your feature into the model and see if it improves the model fit.
Task 5: Reflection (5 points)
Answer the following questions:
1. We used some historical data to fit regression models. What are the limitations of such data, if