ECON 308: ECONOMETRICS

ASSIGNMENT 4

Complete each problem to the best of your ability and submit via email by Friday, April 24. When you submit your assignment, be sure to include “Econ 308 – Assignment 4” in the subject line. This will help me keep track of assignments in my inbox and ensure I don’t miss anything. You are encouraged to collaborate with other students, but you should turn in the problem solutions individually. Your writeup should include 1) written/typed responses to the questions, 2) the code you ran to generate them (your do-file), and 3) any graphs produced.

(1) Crime in San Francisco:

Many cities, especially larger ones, make data on police incident reports publicly available.¹ An incident report is written by the police when a criminal incident is reported, and incident reports are commonly used to measure underlying crime rates, especially at the neighborhood level (because they often contain fairly precise geographic information, including latitude and longitude of the incident location). Incident reports include crimes that were reported but not solved, so they’re a more complete measure than closed cases. However, they do not include every 911 call placed, so they’re not inflated by superfluous calls by paranoid neighbors and such. In this problem, we’ll use daily incident report data from San Francisco to investigate patterns in reported crimes.

(a) Create a do-file and load the dataset sf_incident_reports.dta, downloaded here:²

https://www.dropbox.com/s/ek9d6vuz9r5ji5s/sf_incident_reports.dta?dl=1

(b) What time period does the data cover?⁴

(c) Use tabulate on the incidentcategory variable to count how many times each different category of crime appears. What are the five most commonly reported crimes? Which is the most common?

(d) Restrict the data to include only observations from the most common incident category. How many observations do you have now?

(e) Use tabulate to count the number of observations for each day of the week.⁵ What day(s) seem to have the largest amount of this type of crime?

(f) We’re going to further investigate patterns in the frequency of this crime. First, we’ll create daily counts by aggregating together reports that occurred on the same day. To do so, generate a new variable called count that is equal to 1 for each observation. Then, we will collapse the dataset, adding up the count variable by day. This will give us a dataset that has one observation for each day, along with the number of reported crimes on that day. To do this, complete the following steps:

(i) We’ll use the incidentdate variable to create a timestamp that Stata understands. To do this, execute the following command:

generate time = date(incidentdate,"YMD")

This will give you a numeric variable that increases by 1 each day. So, if there are any days missing in our data (because no crimes were reported on that day), this variable will account for that and there will be a gap between adjacent values.

¹ See here for an example: https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783/data.

² If you have trouble downloading because of the size (around 150 MB), let me know.

³ Or in .csv: https://www.dropbox.com/s/x6dybzaeeik4ox4/sf_incident_reports.csv?dl=1.

⁴ Hint: Because of the incidentdate format, you can sort the observations in normal time order using this variable.

⁵ Hint: Look at the raw data to figure out which variable to use for this.

(ii) Now, we will generate an encoded variable that tells us the day of the week. To do so, use encode on the appropriate variable, and have encode generate a new variable called day.⁶

(iii) Finally, execute

collapse (sum) count, by(incidentdate day time)

This will create one observation for each unique combination of incidentdate, day, and time. Since each date is only associated with one time period and day of the week, this is equivalent to one observation per day. Because we included (sum) count, this will add up the count variable for each unique date, giving us the total count of crimes per day.
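As a rough illustration of what steps (i) through (iii) do, here is a minimal sketch on a made-up six-row dataset (not the assignment data). The variable name dayname is hypothetical; in the real data you still need to find the appropriate day-of-week variable yourself.

```stata
* Toy data, invented for illustration only
clear
input str10 incidentdate str9 dayname
"2020-01-01" "Wednesday"
"2020-01-01" "Wednesday"
"2020-01-02" "Thursday"
"2020-01-04" "Saturday"
"2020-01-04" "Saturday"
"2020-01-04" "Saturday"
end

generate count = 1                          // one per incident report
generate time = date(incidentdate, "YMD")   // Stata daily date (days since 01jan1960)
encode dayname, generate(day)               // labeled numeric day-of-week variable

* One row per date, with count summed within each date
collapse (sum) count, by(incidentdate day time)
list, clean
```

In this toy example the collapsed dataset has three rows with counts of 2, 1, and 3, and time jumps by 2 between the last two rows because no incident falls on 2020-01-03.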

(g) Before, we found that total crime counts tended to differ by day of the week. Now, let’s see if those differences are statistically significant. We can do so by regressing our crime count variable on indicators for the day of the week. Do so and report your findings. What is the omitted day category? Which day tends to have the highest crime count? Is it significantly different from all of the other days of the week?

(h) After running the above regression, execute the Breusch-Pagan test for heteroskedasticity. Do this manually as described in the lecture notes. What do you find?⁷

(i) When you have time series data like this, it’s good to be able to visualize it. We can do this by using the time variable to tell Stata that this is a time series; just execute tsset time.

(j) Now, we can use the tsline command on the count variable to look at crime levels over time. Use this command to plot daily crime counts over the whole sample period.

(k) Investigate the plot. Do you see any days with unusually high counts? Speculate on the cause.

(l) To determine if there is an overall time trend in crime levels, we can regress count on time. Since time is essentially a count of elapsed days since the beginning of the sample, this will give us a linear trend in crime counts over time. Run the regression and report your findings.

(m) Run a test for heteroskedasticity (using a built-in command to do so). What do you find?

(n) You may have noticed a dip in crime counts near the end of the sample. Identify the date of San Francisco’s Shelter in Place order, and generate an indicator variable called post that equals 1 for dates on or after that day, and 0 otherwise. Now, rerun the above regression, including the post variable as well as an interaction between time and post. What do you find? Is there evidence of a time trend in the period before the order? After? Is the difference in the pre/post time trends statistically significant?
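To make the structure of the regression in (n) concrete, here is a minimal sketch on simulated data. Everything in it is invented for illustration: the cutoff of day 150, the coefficients used to simulate count, and the seed. It only shows one way to specify a trend that is allowed to shift and change slope after a cutoff, using Stata’s factor-variable notation.

```stata
* Simulated daily series (made-up numbers, not the assignment data)
clear
set seed 308
set obs 200
generate time = _n
generate post = (time >= 150)               // hypothetical cutoff day

* Simulate a mild upward trend, then a drop and a downward trend after the cutoff
generate count = 100 + 0.05*time - 30*post - 0.2*post*(time - 150) + rnormal(0, 5)

* c.time##i.post expands to time, post, and the time-by-post interaction
regress count c.time##i.post

* Slope in the post period = slope on time + interaction coefficient
lincom time + 1.post#c.time
```

In this setup the coefficient on time is the pre-period trend, the coefficient on the interaction is the difference between the post and pre trends, and lincom reports the post-period trend with its own standard error.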

(2) OPTIONAL EXTRA CREDIT: Reload the original dataset from problem 1, and generate another time series plot, this time for a different crime of your choosing. Do you see patterns similar to those of the previous crime category considered?

⁶ If you’re having trouble, use help encode to check out the help file.

⁷ Hint: After you run the second regression for the BP test, you don’t have to run anything else to get the F test; the results of the F test for the whole regression are reported along with the regression output, in the top right corner.
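The two-regression procedure that footnote 7 refers to can be sketched as follows. This is one common manual version of the Breusch-Pagan test, shown on Stata’s built-in auto dataset rather than the crime data; the version in the lecture notes may differ in its details, and the names uhat and uhat2 are arbitrary.

```stata
* Illustration on a built-in example dataset
sysuse auto, clear

* First regression: the model of interest
regress price weight mpg
predict uhat, residuals                     // residuals from the first regression

* Built-in check, run while the first regression is still the active estimate
estat hettest, rhs fstat

* Second (auxiliary) regression: squared residuals on the same regressors
generate uhat2 = uhat^2
regress uhat2 weight mpg
```

The overall F statistic and its p-value in the header of the auxiliary regression output are the test result: a small p-value is evidence against constant error variance. The estat hettest line with the rhs and fstat options is the built-in counterpart and should give a comparable answer.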

(3) Predicting Credit Default: If you’ve ever used credit of any kind, or maybe even if you haven’t, you have credit scores. These are numerical scores created by companies like TransUnion and Experian to measure your ability to generate profits for lenders. Using proprietary algorithms, they take information about your financial accounts, payment behavior, borrowing behavior, defaults, etc. to generate a summary score that assesses your ability to pay back money you’ve borrowed. These scores are used not only by lenders, but also by landlords and (sometimes) employers (to determine if you are a person who should have a job). They look something like this:

Here, we’re going to use the model validation and variable selection tools we’ve learned about to formulate models for predicting consumer defaults, using data from 30,000 individuals in Taiwan.

(a) Download the dataset credit_default here:⁸

https://www.dropbox.com/s/4oe4zn12j1qj4v3/credit_default.dta?dl=1.

(b) Create a do-file and load the dataset. The dataset includes 30,000 observations and 25 variables. They are:

id: A unique identifier.
limit_bal: Amount of credit given (i.e., credit limit).
sex: Gender (1 = male; 2 = female).
education: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
marriage: Marital status (1 = married; 2 = single; 3 = others).
age: Age (in years).
pay_0: History of past payment. The repayment status in September, 2005.
pay_2, ..., pay_6: Repayment status as of August, July, ..., April, 2005.⁹
bill_amt1, ..., bill_amt6: Amount of bill statement in September, August, ..., April, respectively.
pay_amt1, ..., pay_amt6: Amount of payment in September, August, ..., April, respectively.
defaultpaymentnextmonth: Indicator for whether or not the person defaulted in October.

⁸ In .csv: https://www.dropbox.com/s/ls0uh2f9j0n57q0/credit_default.csv?dl=1.

⁹ From the codebook, the measurement scale for the repayment status is: -1 = pay duly, 1 = payment delay for one month, 2 = payment delay for two months, ..., 8 = payment delay for eight months, 9 = payment delay for nine months and above. This goes for pay_0 as well. -2 isn’t mentioned as a possible value, but it shows up in the data, as does 0. I’m guessing 0 indicates on-time (after posting) payment, while -1 and -2 indicate pre-posting on-time payment. Just a guess though. This is not unusual when you’re downloading random datasets online.

The last variable will be the outcome we’ll try to predict using all of the other variables (except id).

(c) Drop any observations with a 0 for education or marriage.

(d) Generate encoded versions of the categorical variables sex, education, marriage.¹⁰ You can also create encoded versions of the pay variables, if you’d rather treat them as categorical (rather than continuous, numerical variables).

(e) Generate some variables that you think might be helpful in predicting default, e.g., payment as a share of bill amount each month, or monthly bill amount as a share of credit limit, etc.

(f) Create two different models to predict default payment using subsets of the available predictors. Feel free to include any interactions or nonlinearities you like (though keep the outcome as an indicator variable). Compare them using in-sample RMSE, AIC, and BIC. Which seems to work better? Why do you think that is?

(g) Using runiform(), generate a random variable that is uniformly distributed between 0 and 1. Then, generate a variable that equals 1 if the previous variable is greater than 0.5. I.e.,

gen rando = runiform()

gen ind = (rando>0.5)

This will effectively split your sample in half.

(h) Estimate your two previous models on just one-half of the sample. Then, use predict to generate predicted values of the outcome for the entire sample. Compare the MSE of your predictions on the portion of the sample you used to estimate the model (the training MSE) to the MSE computed using the rest of the data (the test MSE). Which model worked best on each?

(i) Finally, use the lassoregress command to perform variable selection on a rich model with many predictors. What are some predictors that appear to be important? What cross-validated MSE do you get?
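The training/test comparison in (g) and (h) can be sketched as below. This is a minimal illustration on Stata’s built-in auto dataset, with the 0/1 variable foreign standing in for the default indicator and an arbitrary pair of predictors; the seed and the names yhat and sqerr are assumptions made for the example, not part of the assignment.

```stata
* Illustration on a built-in example dataset
sysuse auto, clear
set seed 308                                // arbitrary seed so the split is reproducible

generate rando = runiform()
generate ind = (rando > 0.5)                // 1 = training half, 0 = test half

* Fit a linear probability model on the training half only
regress foreign mpg weight if ind == 1

* predict fills in fitted values for every observation, not just the training half
predict yhat
generate sqerr = (foreign - yhat)^2

summarize sqerr if ind == 1                 // mean is the training MSE
summarize sqerr if ind == 0                 // mean is the test MSE
```

The test MSE is usually the larger of the two, and the gap tends to grow as a model becomes more flexible, which is the pattern the comparison in (h) is meant to reveal.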

¹⁰ Since these are numeric, Stata won’t let you encode them. So, turn them into strings first. For example: tostring sex, replace. Then you can encode them as normal. Why does Stata make you do this? I have no idea.
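The workaround in footnote 10 can be sketched on a tiny made-up variable as follows. The name sex_cat is hypothetical, and the four values are invented for the example.

```stata
* Toy data, invented for illustration only
clear
input sex
1
2
2
1
end

tostring sex, replace                       // numeric -> string ("1", "2")
encode sex, generate(sex_cat)               // string -> labeled numeric variable
tabulate sex_cat
```

An alternative that avoids the conversion is Stata’s factor-variable notation: writing i.sex in a regression treats a numeric variable as categorical without creating a new variable.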
