How Stata's hat matrix differs with weights
Luke Sonnet
Source: vignettes/stata-wls-hat.Rmd
Researchers commonly use linear regression with heteroskedasticity-robust standard errors, and many social scientists do so in either Stata or R. One would hope the two would always agree in their estimates. Unfortunately, estimating weighted least squares with HC2 or HC3 robust variance yields different answers in Stata than in the common approaches in R and Python.
The discrepancy is due to differences in how the software estimates the "hat" matrix, on which both the HC2 and HC3 variance estimators rely. The short story is that Stata estimates the hat matrix as

$$\mathbf{H}_{Stata} = \mathbf{X} \left(\mathbf{X}^\top \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^\top,$$

where $\mathbf{X}$ is the covariate matrix and $\mathbf{W}$ is a diagonal matrix of the weights (both defined below), while the usual approaches in R, including sandwich and estimatr, and Python (e.g. statsmodels) estimate the following hat matrix:

$$\mathbf{H} = \sqrt{\mathbf{W}} \mathbf{X} \left(\mathbf{X}^\top \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^\top \sqrt{\mathbf{W}}.$$

This results in differences when researchers estimate HC2 and HC3 variance estimators. The HC1 standard errors, Stata's default, are the same across all packages. The rest of this document walks through the set-up for the above and demonstrates some results from Stata, R, and Python.
Weighted least squares
Let's briefly review WLS. Weights are often used in linear regression for two key problems: (1) to model and correct for heteroskedasticity, and (2) to deal with unequal sampling (or treatment) probabilities. In both cases, we take the standard model

$$y_i = \mathbf{x}_i^\top \beta + \epsilon_i,$$

where $y_i$ is the $i$th unit's outcome, $\mathbf{x}_i$ is a column vector of covariates, $\beta$ are the coefficients of interest, and $\epsilon_i$ is some error, and rescale the model by the square root of that unit's weight, $\sqrt{w_i}$. Our model then becomes

$$\sqrt{w_i} y_i = \sqrt{w_i} \mathbf{x}_i^\top \beta + \sqrt{w_i} \epsilon_i.$$

It can be shown that the solution for $\beta$ is

$$\widehat{\beta}_w = \left(\mathbf{X}^\top \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{W} \mathbf{y},$$

where $\mathbf{W}$ is a diagonal matrix where each entry is $w_i$, $\mathbf{X}$ is the covariate matrix, and $\mathbf{y}$ is the outcome column vector. Note that all weights have been scaled to sum to 1 (i.e., $\sum_i w_i = 1$). An easy way to compute $\widehat{\beta}_w$ is to first weight both $\mathbf{X}$ and $\mathbf{y}$ by $\sqrt{\mathbf{W}}$, which is simply the weight matrix but using the square root of the weights instead, and then fit ordinary least squares on the rescaled data. Let's define these rescaled matrices as

$$\widetilde{\mathbf{X}} = \sqrt{\mathbf{W}} \mathbf{X}, \qquad \widetilde{\mathbf{y}} = \sqrt{\mathbf{W}} \mathbf{y}.$$
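As a quick sketch of this equivalence (using the mtcars data that appears in the example later in this document), fitting lm() with weights gives the same coefficients as ordinary least squares on the rescaled matrices; the object names here are purely illustrative.

# Sketch: WLS via lm() with weights ...
wls_fit <- lm(mpg ~ hp, data = mtcars, weights = wt)
coef(wls_fit)

# ... is the same as OLS on the rescaled data
w <- mtcars$wt / sum(mtcars$wt)            # weights scaled to sum to one
X_tilde <- sqrt(w) * cbind(1, mtcars$hp)   # sqrt(W) %*% X
y_tilde <- sqrt(w) * mtcars$mpg            # sqrt(W) %*% y
solve(crossprod(X_tilde), crossprod(X_tilde, y_tilde))  # matches coef(wls_fit)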
Heteroskedastic-consistent variance estimators
Turning to variance, the standard sandwich estimator is

$$\mathbb{V}\left[\widehat{\beta}_w\right] = \left(\widetilde{\mathbf{X}}^\top \widetilde{\mathbf{X}}\right)^{-1} \widetilde{\mathbf{X}}^\top \Phi \widetilde{\mathbf{X}} \left(\widetilde{\mathbf{X}}^\top \widetilde{\mathbf{X}}\right)^{-1},$$

where $\Phi$ represents $\mathbb{V}[\epsilon \mid \mathbf{X}]$, the variance-covariance matrix of the disturbances. A nice review of the different variance estimators along with their properties can be found in Long and Ervin (2000) [ungated]. The HC2 and HC3 estimators, introduced by MacKinnon and White (1985), use the hat matrix as part of the estimation of $\Phi$. The standard hat matrix is written:

$$\mathbf{H} = \widetilde{\mathbf{X}} \left(\widetilde{\mathbf{X}}^\top \widetilde{\mathbf{X}}\right)^{-1} \widetilde{\mathbf{X}}^\top.$$

Where $h_{ii}$ are the diagonal elements of the hat matrix, the HC2 variance estimator replaces $\Phi$ with

$$\widehat{\Phi}_{HC2} = \mathrm{diag}\left\{\frac{\widetilde{e}_i^2}{1 - h_{ii}}\right\},$$

where $\widetilde{e}_i$ are the residuals from the rescaled model. The HC3 estimator is very similar, replacing $\Phi$ with

$$\widehat{\Phi}_{HC3} = \mathrm{diag}\left\{\frac{\widetilde{e}_i^2}{(1 - h_{ii})^2}\right\}.$$
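As a minimal sketch of how these pieces fit together, the R code below builds the HC2 and HC3 "meat" by hand for the weighted mtcars fit used in the example below. The object names (X_tilde, e_tilde, h) are illustrative, and the hat values come from R's hatvalues(), which for a weighted fit returns the diagonal of the standard (rescaled-data) hat matrix above.

# Sketch: HC2/HC3 by hand for the weighted mtcars fit
wls_fit <- lm(mpg ~ hp, data = mtcars, weights = wt)
X_tilde <- sqrt(mtcars$wt) * model.matrix(wls_fit)  # sqrt(W) %*% X
e_tilde <- sqrt(mtcars$wt) * residuals(wls_fit)     # rescaled residuals
h <- hatvalues(wls_fit)  # diag of X_tilde (X_tilde' X_tilde)^-1 X_tilde'

bread <- solve(crossprod(X_tilde))                            # (X'WX)^-1
meat_hc2 <- crossprod(X_tilde * (e_tilde^2 / (1 - h)), X_tilde)
meat_hc3 <- crossprod(X_tilde * (e_tilde^2 / (1 - h)^2), X_tilde)

sqrt(diag(bread %*% meat_hc2 %*% bread))  # HC2 standard errors
sqrt(diag(bread %*% meat_hc3 %*% bread))  # HC3 standard errors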
Both rely on the hat matrix. Crucially, this is where Stata and the packages and modules in R and Python disagree. When weights are specified, Stata estimates the hat matrix as

$$\mathbf{H}_{Stata} = \mathbf{X} \left(\widetilde{\mathbf{X}}^\top \widetilde{\mathbf{X}}\right)^{-1} \mathbf{X}^\top,$$

while the other software uses

$$\mathbf{H} = \widetilde{\mathbf{X}} \left(\widetilde{\mathbf{X}}^\top \widetilde{\mathbf{X}}\right)^{-1} \widetilde{\mathbf{X}}^\top.$$

Thus the HC2 and HC3 estimators differ because the values of $h_{ii}$ are quite different. How different are these results? Let's use a little example using mtcars, a dataset included with R.
# Using estimatr
library(estimatr)
lm_robust(
mpg ~ hp,
data = mtcars,
weights = wt,
se_type = "HC2"
)
#> Estimate Std. Error t value Pr(>|t|) CI Lower
#> (Intercept) 28.54864505 2.16281844 13.199742 4.975934e-14 24.13158053
#> hp -0.06249413 0.01445662 -4.322872 1.561752e-04 -0.09201849
#> CI Upper DF
#> (Intercept) 32.96570958 30
#> hp -0.03296977 30
We can also see that Python's statsmodels provides the same results as the methods in R (and in fact they note the difference in an issue on GitHub).
import statsmodels.api as sm
import pandas as pd
dat = pd.read_csv('mtcars.csv')
wls_mod = sm.WLS(dat['mpg'], sm.add_constant(dat['hp']), weights = dat['wt'])
print(wls_mod.fit().HC2_se)
#> const 2.162818
#> hp 0.014457
#> dtype: float64
If we do the same in Stata 13, we get the following output:
Linear regression Number of obs = 32
F( 1, 30) = 19.08
Prob > F = 0.0001
R-squared = 0.5851
Root MSE = 3.6191
------------------------------------------------------------------------------
| Robust HC2
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hp | -.0624941 .0143083 -4.37 0.000 -.0917155 -.0332727
_cons | 28.54865 2.155169 13.25 0.000 24.1472 32.95009
------------------------------------------------------------------------------
Stata's standard errors are somewhat different. The only documentation of Stata's formula for the hat matrix that I can find is on the Statalist forum here; as far as I can tell it appears nowhere in the official documentation.
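As a rough sketch of where the gap comes from (illustrative only, not an authoritative account of Stata's internals), the R code below plugs the Stata-style hat matrix into the HC2 formula above. It assumes that Stata rescales analytic weights to sum to the number of observations; under that assumption the resulting standard errors should land close to the Stata output above.

# Sketch: HC2 with the Stata-style hat matrix (illustrative only)
wls_fit <- lm(mpg ~ hp, data = mtcars, weights = wt)
X <- model.matrix(wls_fit)
w <- mtcars$wt * nrow(X) / sum(mtcars$wt)  # assumed rescaling: weights sum to n
X_tilde <- sqrt(w) * X
e_tilde <- sqrt(w) * residuals(wls_fit)

XtWX_inv <- solve(crossprod(X_tilde))        # (X'WX)^-1
h_stata <- rowSums((X %*% XtWX_inv) * X)     # diag of X (X'WX)^-1 X'
meat <- crossprod(X_tilde * (e_tilde^2 / (1 - h_stata)), X_tilde)
sqrt(diag(XtWX_inv %*% meat %*% XtWX_inv))   # compare to the Stata HC2 output above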
Which should we prefer?
Just because Stata does not document its HC2 and HC3 estimators does not mean they are wrong, and the differences tend to be minor. In fact, it is unclear which we should prefer given that there is not a strong literature supporting one over the other. However, there are several arguments to be made for $\mathbf{H}$, the hat matrix used in R and Python.
- It's the estimator you get when you weight your data by the square root of the weights ($\widetilde{\mathbf{X}}$ and $\widetilde{\mathbf{y}}$) and fit regular ordinary least squares. If one considers the weighted model as simply a rescaled version of the unweighted model, then users should prefer $\mathbf{H}$.
- The diagonal elements of $\mathbf{H}$ are the weighted leverages (Li and Valliant 2009), while $\mathbf{H}_{Stata}$ would need to be weighted again for its diagonal to recover the weighted leverages, as the short sketch below illustrates.
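Here is a small sketch of that last point, again with illustrative object names and the mtcars example: reweighting the diagonal of the Stata-style hat matrix recovers the weighted leverages that R reports for the weighted fit.

# Reweighting diag(H_Stata) recovers the weighted leverages from hatvalues()
wls_fit <- lm(mpg ~ hp, data = mtcars, weights = wt)
X <- model.matrix(wls_fit)
w <- mtcars$wt / sum(mtcars$wt)             # scaled to sum to one
XtWX_inv <- solve(crossprod(sqrt(w) * X))   # (X'WX)^-1
h_stata <- rowSums((X %*% XtWX_inv) * X)    # diag of X (X'WX)^-1 X'
all.equal(unname(w * h_stata), unname(hatvalues(wls_fit)))  # TRUE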