Economics - Empirical Project: The Mincer Wage Equation
Amsterdam School of Economics
Empirical Project - Group Assignment 1 (week 1) The Mincer Wage Equation
The aim of this assignment is to familiarise students with the empirical application of ordinary least squares. Creating usable data files and doing proper econometric analysis is part of this.
Write a compact analysis of the wage equation in two A4 pages. Use the common structure: Title, Authors, Abstract, 1. Introduction, 2. Theoretical background and method, 3. Data, 4. Results and preferred empirical model, 5. Conclusion, 6. Appendix (this is in addition to the two pages and should contain all the background material, regressions, specification tests, etc.).
Submit a PDF version of this report via Blackboard before Monday 11 June, 9:00 am. Do not forget to include the names and student numbers of all group members.
This assignment will be marked out of 10. It is worth 10% of the final mark for the course ’Empirical Project’.
Heij, C., Boer, P. de, Franses, P.H., Kloek, T. and Dijk, H.K. van, 2004, Econometric Methods with Applications in Business and Economics, Oxford University Press.
Björklund, A. and Kjellström, J., 2002, Estimating the return to investments in education: how useful is the standard Mincer equation?, Economics of Education Review, vol. 21, pp. 195-210.
Harmon, C. and Walker, I., 1995, Estimates of the economic return to schooling for the United Kingdom, American Economic Review, vol. 85, pp. 1278-1286. (N.B. This article is only cited in this assignment and is examined in more detail in next week's assignment.)
Lemieux, T., 2006, The "Mincer Equation" thirty years after Schooling, Experience, and Earnings, Chapter 11 of Jacob Mincer a Pioneer in Modern Labor Economics, Grossbard, S. (ed.), Springer.
Description: Cross sections of the American "Current Population Survey" (CPS), years 2003-2014.
• Further info: http://www.nber.org/cps
• Data download: http://www.nber.org/data/current-population-survey-data.html
Specific data for the assignment: There are ten data files (cps2004.dta – cps2013.dta) available on Blackboard. Each group analyses one year, determined by the last digit of the student number of the oldest group member. For example, if the last digit is 5, the group uses cps2009.dta. Each data file contains about 8000 observations.
A limited number of variables, corresponding to those in Björklund and Kjellström (2002), have been selected. Unlike Björklund and Kjellström (2002), no selection has been made on gender, and there is additional information about the ethnicity of the respondents. We have the following variables:
• hwage = hourly wage in $;
• female = dummy equal to 1 for women;
• educ = education in years (approximate, see below);
• leduc = level of schooling (precise definition given below);
• age = age in years;
• lexp = labour market experience (approximate, see below);
• etngroup = ethnic group (precise definition given below).
Specific information on variables:
leduc - educational attainment | code | educ
Less than 1st grade
1st,2nd,3rd,or 4th grade
5th or 6th grade
7th and 8th grade
12th grade no diploma
High school graduate - high
Some college but no degree
Associate degree in college -
Associate degree in college -
Bachelor's degree (for
Master's degree (for
Professional school degree (for
Doctorate degree (for
educ is based on leduc. The duration is based on the nominal duration needed to attain a certain educational level.
• lexp = age - 6 - educ
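As a minimal sketch of this definition, the snippet below computes potential experience for a few made-up respondents. The function name `potential_experience` and the example values are illustrative assumptions, not part of the data files.

```python
def potential_experience(age, educ, school_start_age=6):
    """Potential labour market experience: age - 6 - educ."""
    return age - school_start_age - educ


# Hypothetical respondents as (age, years of education); not CPS data.
respondents = [(40, 12), (25, 16), (18, 13)]
lexp = [potential_experience(age, educ) for age, educ in respondents]
print(lexp)  # [22, 3, -1]
```

Because educ is only approximate, the formula can produce negative values (as for the third respondent here), which is the kind of strange value worth looking for when checking the data.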
etngroup - ethnic group | code
Other 4 or 5 race comb. | 26
If you carry out a test, you should do so comprehensively. Specify the null and alternative hypotheses, indicate how the test statistic is calculated, give its (approximate) distribution under the null, give its (realised) value, compare it with the critical value, and draw your conclusion.
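The elements listed above can be sketched for a two-sided test on a single coefficient. This is only an illustration: the function name `two_sided_z_test` and the numbers are hypothetical, and the standard normal approximation is an assumption that is reasonable with about 8000 observations, where the t distribution is practically indistinguishable from N(0, 1).

```python
from statistics import NormalDist


def two_sided_z_test(estimate, std_error, null_value=0.0, alpha=0.05):
    """Collect every element of a two-sided test of H0: beta = null_value."""
    statistic = (estimate - null_value) / std_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return {
        "H0": f"beta = {null_value}",
        "H1": f"beta != {null_value}",
        "statistic": statistic,
        "distribution under H0": "approximately N(0, 1)",
        "critical value": critical,
        "p-value": p_value,
        "reject H0": abs(statistic) > critical,
    }


# Hypothetical estimate and standard error, not taken from the CPS data.
report = two_sided_z_test(estimate=0.08, std_error=0.02)
for key, value in report.items():
    print(f"{key}: {value}")
```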
Motivate and substantiate your answer!
Assignment details
You will investigate and replicate parts of Björklund and Kjellström (2002) using US data for a single year. They use multiple years and restrict themselves to males. We shall also investigate the gender gap. The starting point is the Mincerian wage equation, which in its basic form is given by:
$$\log(y) = \beta_0 + \beta_1 s + \beta_2 x + \beta_3 x^2 + \varepsilon,$$
with s = years of schooling and x = years of labour market experience.
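To make the equation concrete, the sketch below estimates it by ordinary least squares on simulated data, using only the Python standard library. Everything here is an assumption for illustration: the helper functions `solve` and `ols`, the "true" coefficients, and the simulated sample are made up and say nothing about the CPS results. In practice a statistical package would be used instead.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Simulated data with made-up coefficients (NOT estimates from the CPS files).
random.seed(0)
true_beta = [1.0, 0.10, 0.04, -0.0008]
rows, log_wage = [], []
for _ in range(500):
    s = random.randint(8, 20)   # years of schooling
    x = random.randint(0, 40)   # years of labour market experience
    regressors = [1.0, s, x, x * x]
    systematic = sum(b * v for b, v in zip(true_beta, regressors))
    rows.append(regressors)
    log_wage.append(systematic + random.gauss(0, 0.3))

b0, b1, b2, b3 = ols(rows, log_wage)
print(f"estimated return to a year of schooling: {b1:.3f}")
print(f"experience at which log wage peaks: {-b2 / (2 * b3):.1f} years")
```

The two printed quantities follow directly from the equation: β1 is the approximate proportional wage gain from one extra year of schooling, and the quadratic in x peaks at x = -β2 / (2 β3) when β3 is negative.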
In writing the report you can use the following suggestions. Carry out these suggestions where appropriate, but only report relevant results, either in the main text or in the appendix. Do NOT make the appendix a list of all the things suggested here.
1. Put the information in a format that is useful for you. The data are available in Stata format. To put them in another format, do the following:
(a) Open Stata and open the data file (with extension .dta).
(b) Export the data to the desired format using "File - Export". The .xls(x) or .csv file types can be read by Excel and EViews. Some packages, like Gretl, EViews and R, can read Stata files directly without conversion. An alternative is to use the package Stat/Transfer.
2. You now have the raw data. You should check that the data are correct (do not contain obvious errors) and are suitable for your investigation.
(a) Make a histogram of hourly wage and assess whether it could be normally distributed (test!). Decide whether wages below $2/h or above $90/h should be included in your sample. What are the mean, mode, and standard deviation?
(b) Consider the other variables and present a table with summary statistics (in the appendix). Are there strange values?
(c) Dropping observations can be problematic. Why? Is it different when observations are dropped on the basis of explanatory variables or on dependent variables?
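For step 2(a), one common normality test is the Jarque-Bera test, which compares the sample skewness and kurtosis with the values 0 and 3 implied by a normal distribution. The sketch below is only an illustration under stated assumptions: the wage values are made up, the trimming rule is one possible choice rather than a recommendation, and the p-value uses the asymptotic chi-squared distribution with 2 degrees of freedom, whose survival function is exp(-JB/2).

```python
import math
import statistics


def jarque_bera(values):
    """Jarque-Bera statistic and its asymptotic chi-squared(2) p-value."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # survival function of chi-squared with 2 df
    return jb, p_value


# Made-up hourly wages in $, for illustration only (not CPS data).
hwage = [7.5, 9.0, 10.0, 10.0, 12.5, 14.0, 15.0, 18.0, 22.0, 25.0, 31.0, 48.0, 95.0]
kept = [w for w in hwage if 2 <= w <= 90]  # one possible trimming rule

print("mean:", round(statistics.mean(kept), 2))
print("mode:", statistics.mode(kept))
print("std dev:", round(statistics.stdev(kept), 2))
jb, p = jarque_bera(kept)
print(f"JB = {jb:.2f}, p-value = {p:.4f}")
```

Note that the mode of a nearly continuous variable such as hourly wage depends on how the values are rounded or binned, so it is worth stating which convention is used.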
3. Reproduce some of the results of Björklund and Kjellström (2002). This means that you should restrict yourselves to males and use the natural log of hourly wage.
(a) Reproduce Table 1 and Table 2 for your year and consider the differences.
(b) Reproduce columns III in Table 4. These columns are based on natural logarithms; the other columns relate to Box-Cox transformed data. Can you test whether this result is significantly different from (a)?
(c) Reproduce Table 5 with log(hwage) as the dependent variable (so no Box-Cox transformations). Test whether it is appropriate to add interaction effects and interpret the coefficients.
(d) Repeat (a)-(c) for females.
4. Is there a significant difference between men and women? Consider models with dummies, rather than estimating separate equations for men and women.
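One possible way to set up such a dummy-variable model is sketched below. It is an illustration, not the required specification: F denotes the variable female, and the δ coefficients and the choice of which interactions to include are assumptions.

$$\log(y) = \beta_0 + \beta_1 s + \beta_2 x + \beta_3 x^2 + \delta_0 F + \delta_1 F s + \delta_2 F x + \delta_3 F x^2 + \varepsilon$$

The hypothesis of no difference between men and women is then $H_0: \delta_0 = \delta_1 = \delta_2 = \delta_3 = 0$, which can be tested with

$$F = \frac{(SSR_r - SSR_u)/4}{SSR_u/(n - 8)},$$

where $SSR_r$ and $SSR_u$ are the residual sums of squares of the restricted and unrestricted models. Under $H_0$ and homoskedastic normal errors this statistic follows an $F(4, n-8)$ distribution; otherwise the distribution holds only approximately, or a robust version of the test is needed. Keeping only $\delta_0$ gives a model in which gender shifts the intercept but not the returns to schooling and experience.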
5. Investigate the effect of ethnicity. Is it significant?
6. Which other variables do you think should be included in the Mincerian wage equation? What are the consequences of their non-availability in the data set?
7. Test for nonlinearity in the model. (Note that the dependent variable can be in logs, Box-Cox transformed, or not transformed.)
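For reference, the Box-Cox transformation mentioned here nests the other two cases: with parameter λ it equals (y^λ - 1)/λ, which is a linear function of y for λ = 1 and tends to log(y) as λ goes to 0. A minimal sketch follows; the function name `box_cox` and the example wage are illustrative assumptions.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y; lam = 0 gives the natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


wage = 20.0  # hypothetical hourly wage in $
for lam in (1.0, 0.5, 0.0):
    print(f"lambda = {lam}: {box_cox(wage, lam):.4f}")
```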
8. Does it make sense to test for serial correlation and heteroskedasticity?
(a) If it does, check this possible specification error using different tests. Comment. Pay special attention if different tests give conflicting results.
(b) Take appropriate action if the specification of the model is rejected.
(c) Is there a significant effect of ethnicity?
9. For the correct specification and interpretation of the model, it is important that the explanatory variables are exogenous. The exogeneity of schooling is questionable. An important argument is ability bias, as highlighted by, e.g., Harmon and Walker (1995), which we will consider next week. Intelligence is difficult to measure and is usually missing from the data, but it is an obvious determinant of wages. This missing variable is likely to be correlated with variables in the model.
(a) Argue the sign of the bias that results from this missing variable.
(b) Does your bias expression in (a) agree with the results in Harmon and Walker (1995)?
(c) The only potential instrument to be used is age. Is this valid and useful?
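As a starting point for 9(a), the standard omitted-variable result can be written down for a simplified case. The notation is an assumption made for this sketch: A denotes unobserved ability with coefficient γ, and the experience terms are left out to keep the expression short. If the true model is

$$\log(y) = \beta_0 + \beta_1 s + \gamma A + \varepsilon,$$

and A is omitted, the OLS estimator $b_1$ of the schooling coefficient satisfies

$$\operatorname{plim} b_1 = \beta_1 + \gamma \, \frac{\operatorname{Cov}(s, A)}{\operatorname{Var}(s)}.$$

The sign of the bias is therefore the sign of $\gamma \cdot \operatorname{Cov}(s, A)$. With additional regressors in the model, the ratio is replaced by the coefficient on s in an auxiliary regression of A on all included regressors.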