clear all
set more off
set matsize 5000
global path "C:\Users\yaya1\Dropbox\Fall2020\ECON338\Stata Conference\Conference3"
use "$path\WAGE2.dta"
/*
1. Regression using IV
*/
gen sp_age=age*age
gen const=1
**make a matrix : Define Y
mkmat lwage, matrix(Y)
** Define X
mkmat educ age sp_age const, matrix(X)
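** Define Z: the instrument matrix (sibs replaces educ; the exogenous regressors instrument themselves)
mkmat sibs age sp_age const, matrix(Z)
** Just-identified IV estimator: b_iv = inv(Z'X) * Z'Y
mat n1=inv(Z'*X)
mat n2=(Z'*Y)
matrix b_iv = n1*n2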
mat list b_iv
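* A minimal sketch, not part of the original exercise: the conventional
* (homoskedastic) variance of the just-identified IV estimator,
*   V = s2 * inv(Z'X) * (Z'Z) * inv(Z'X)',   s2 = e'e / N.
* The names e_iv, ee_iv, s2_iv and V_iv are illustrative. Dividing by N rather
* than N - k is an assumption, chosen to mirror ivregress 2sls without the
* small option.
matrix e_iv = Y - X*b_iv
matrix ee_iv = e_iv'*e_iv
scalar s2_iv = ee_iv[1,1]/rowsof(X)
matrix V_iv = scalar(s2_iv)*n1*(Z'*Z)*n1'
matrix list V_iv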
*** ivregress command: the same just-identified model estimated by 2SLS
ivregress 2sls lwage age sp_age (educ = sibs)
est store IV_command
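* A quick cross-check, sketched under the assumption that ivregress orders
* e(b) as educ, age, sp_age, _cons (the same order as the columns of X).
* The matrix estimate and the command estimate should then agree up to
* rounding error. b_cmd is an illustrative name.
matrix b_cmd = e(b)'
display "max relative difference: " mreldif(b_iv, b_cmd)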
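** Stock and Yogo weak-instrument tests: check the first stage.
** Here the R2 statistics are relatively high and the first-stage F exceeds 10
** (the usual rule of thumb for an instrument that is not weak).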
estat firststage
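* Sketch: the first stage can also be run by hand. With sibs as the only
* excluded instrument, the F statistic from testing sibs should be close to
* the first-stage F reported by estat firststage above. Note that this
* replaces the active estimation results until the next ivregress call.
regress educ sibs age sp_age
test sibs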
*** ivregress command with heteroskedasticity-robust standard errors
ivregress 2sls lwage age sp_age (educ = sibs), vce(robust)
est store IV_command_robust
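** Endogeneity tests (H0: educ is exogenous).
** After the default VCE, estat endogenous reports the Durbin and Wu-Hausman statistics;
** after vce(robust), as here, it reports a robust score test and a robust regression-based test.
* Syntax: ivregress ... depvar ... (y1 y2 y3 = ...)
*         estat endogenous y2 y3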
estat endogenous
*** ivregress command, overidentified model: sibs and married both instrument educ
ivregress 2sls lwage age sp_age (educ = sibs married), vce(robust)
*** Test of overidentifying restrictions. H0: the instruments are valid (uncorrelated with the error term)
estat overid
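* Sketch, not in the original: with two instruments for one endogenous
* regressor, Z'X is 5 x 4 and cannot be inverted, so the formula used for
* b_iv no longer applies. The 2SLS estimator generalizes it:
*   b = inv(X'Z inv(Z'Z) Z'X) * X'Z inv(Z'Z) Z'Y
* Z2, XZ2, ZZ2i and b_2sls are illustrative names. This assumes married has
* no missing values in the estimation sample.
mkmat sibs married age sp_age const, matrix(Z2)
matrix XZ2 = X'*Z2
matrix ZZ2i = inv(Z2'*Z2)
matrix b_2sls = inv(XZ2*ZZ2i*XZ2')*(XZ2*ZZ2i*(Z2'*Y))
matrix list b_2sls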
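***** DWH test by hand (augmented regression)
* First stage: regress educ on the instruments and the exogenous regressors
reg educ sibs married age sp_age
predict educ_p
predict rho, residual
* OLS residuals have mean zero by construction, so the next two lines are only a sanity check
sum rho
ttest rho == 0
* Augmented regression: the original model (including educ) plus the first-stage residual.
* A coefficient on rho that differs significantly from zero indicates that educ is endogenous.
reg lwage educ age sp_age rho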
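* Sketch: 2SLS "by hand". Replacing educ with its first-stage fitted values
* reproduces the ivregress point estimates, but the standard errors from this
* second-stage OLS are not valid, because they ignore that the fitted values
* are themselves estimated. educ_hat is an illustrative name.
quietly regress educ sibs married age sp_age
predict double educ_hat, xb
regress lwage educ_hat age sp_age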
/*
***Comments
Before estimating the following simultaneous equations,
z = a0 + a1*x1 + a2*x2 + epsilon1
y = b0 + b1*z + b2*x3 + epsilon2
one should decide whether an instrumental variable is needed, i.e., whether the least-squares estimates are consistent.
Davidson and MacKinnon (1993) suggest an augmented regression test (the DWH test). It is formed by regressing each endogenous right-hand-side variable on all exogenous variables and adding the resulting residuals to a regression of the original model. Returning to our example, we would first run the regression
z = c0 + c1*x1 + c2*x2 + c3*x3 + epsilon3
obtain the residuals z_res, and then run the augmented regression:
y = d0 + d1*z + d2*x3 + d3*z_res + epsilon4
If d3 is significantly different from zero, then OLS is not consistent.