Absorbing Fixed Effects with estimatr
Whether analyzing a block-randomized experiment or adding fixed effects for a panel model, absorbing group means can reduce estimation time. The fixed_effects argument in both lm_robust and iv_robust allows you to do just that, although the speed gains are greatest with “HC1” standard errors. Specifying fixed effects is simple.
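The call behind the output that follows is not shown in this section. A minimal sketch consistent with it, assuming the built-in mtcars data with mpg regressed on hp and one intercept absorbed per value of cyl (the object name lmr_out comes from the text; the model itself is an inference from the printed estimates):

```r
library(estimatr)

# Assumed model: mpg on hp, absorbing a separate intercept for each cyl group
lmr_out <- lm_robust(mpg ~ hp, data = mtcars, fixed_effects = ~ cyl)

# Only the non-absorbed coefficient (hp) appears in the printed table
lmr_out
```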
##       Estimate Std. Error   t value  Pr(>|t|)    CI Lower    CI Upper DF
## hp -0.02403883 0.01503818 -1.598521 0.1211523 -0.05484314 0.006765475 28
lmr_out$fixed_effects
##     cyl4     cyl6     cyl8
## 28.65012 22.68246 20.12927
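Absorbing group means rests on the within transformation: subtracting each group's mean from the outcome and the regressors yields the same slope as including a dummy for every group. The base R sketch below illustrates that equivalence on the mtcars example. It is an illustration of the idea under that assumed example, not a description of how estimatr implements it internally.

```r
# Within transformation: subtract cyl-group means from outcome and regressor
mpg_dm <- mtcars$mpg - ave(mtcars$mpg, mtcars$cyl)
hp_dm  <- mtcars$hp  - ave(mtcars$hp,  mtcars$cyl)

# Slope from the demeaned regression (no intercept is needed after demeaning)
b_within <- coef(lm(mpg_dm ~ 0 + hp_dm))[["hp_dm"]]

# Slope from the regression with explicit group dummies
b_dummy <- coef(lm(mpg ~ hp + factor(cyl), data = mtcars))[["hp"]]

# The two slopes should agree up to floating-point error
all.equal(b_within, b_dummy)

# Each group's intercept is its mean of (outcome - slope * regressor)
fe <- tapply(mtcars$mpg - b_within * mtcars$hp, mtcars$cyl, mean)
fe
```

The last step recovers one intercept per group, which is the role played by the fixed_effects element of the fitted object.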
Before proceeding, three quick notes:
- Most of the speed gains occur when estimating “HC1” robust standard errors, or “stata” standard errors when there is clustering. The gains come from not having to invert a large matrix of group dummies, but that inversion is still necessary for “HC2”, “HC3”, and “CR2” standard errors.
- While you can specify multiple sets of fixed effects, such as fixed_effects = ~ year + country, please ensure that your model is well-specified if you do so. If there are dependencies or overlapping groups across multiple sets of fixed effects, we cannot guarantee the correct degrees of freedom.
- For now, weighted “CR2” estimation is not possible with fixed_effects.
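One way to act on the second note is to compare an absorbed fit against the same model with explicit dummies before relying on it. The sketch below does this on a small simulated panel. The data frame panel, its variables, and the true slope of 0.5 are all hypothetical and exist only for illustration.

```r
library(estimatr)

# Hypothetical balanced panel: 10 countries observed over 8 years
set.seed(1)
panel <- expand.grid(country = paste0("c", 1:10), year = 2001:2008)
panel$x <- rnorm(nrow(panel))
panel$y <- 0.5 * panel$x + rnorm(nrow(panel))

# Absorb both sets of fixed effects
fit_fe <- lm_robust(
  y ~ x,
  data = panel,
  fixed_effects = ~ year + country,
  se_type = "HC1"
)

# Same model with explicit dummies, for comparison
fit_dummy <- lm_robust(
  y ~ x + factor(year) + factor(country),
  data = panel,
  se_type = "HC1"
)

# Compare the slope on x and the reported degrees of freedom
c(absorbed = coef(fit_fe)[["x"]], dummies = coef(fit_dummy)[["x"]])
c(
  absorbed = summary(fit_fe)$coefficients["x", "DF"],
  dummies  = summary(fit_dummy)$coefficients["x", "DF"]
)
```

If the degrees of freedom differ between the two fits on your own data, that is a sign the fixed-effect sets overlap or depend on one another in the way the note warns about.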
Speed gains
In general, speed gains are greatest when the number of groups (fixed effects) is large relative to the number of observations. Imagine we have 300 matched pairs in an experiment.
# Load packages for comparison
library(estimatr)
library(microbenchmark)
library(sandwich)
library(lmtest)
# Create matched-pairs dataset using fabricatr
set.seed(40)
library(fabricatr)
dat <- fabricate(
  blocks = add_level(N = 300),
  indiv = add_level(N = 2, z = sample(0:1), y = rnorm(N) + z)
)
head(dat)
##   blocks indiv z          y
## 1    001   001 1  1.4961828
## 2    001   002 0 -0.8595843
## 3    002   003 1  0.1709400
## 4    002   004 0 -0.3215731
## 5    003   005 1 -0.3037704
## 6    003   006 0 -1.4214866
# With HC2 (the lm_robust default, so se_type is not set explicitly)
microbenchmark(
  `base + sandwich` = {
    lo <- lm(y ~ z + factor(blocks), dat)
    coeftest(lo, vcov = vcovHC(lo, type = "HC2"))
  },
  `lm_robust` = lm_robust(y ~ z + factor(blocks), dat),
  `lm_robust + fes` = lm_robust(y ~ z, data = dat, fixed_effects = ~ blocks),
  times = 50
)
## Warning in microbenchmark(`base + sandwich` = {: less accurate nanosecond times
## to avoid potential integer overflows
## Unit: milliseconds
##             expr       min        lq      mean    median        uq       max neval cld
##  base + sandwich 142.07746 143.90094 146.31748 144.98387 145.72626 206.89527    50 a
##        lm_robust  35.08714  35.59226  37.78890  36.62731  37.44924  98.34650    50  b
##  lm_robust + fes  21.97010  22.38128  26.75375  22.52977  23.93670  89.52748    50   c
Speed gains are considerably greater with HC1 standard errors. HC2, HC3, and CR2 standard errors require the hat matrix, and computing it means inverting the large matrix of group dummies that absorbing the fixed effects otherwise lets us avoid. HC0, HC1, CR0, and clustered “stata” standard errors do not require this inversion.
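To see where the hat matrix enters, the two estimators can be written out by hand. The base R sketch below uses the standard textbook formulas on the mtcars example with explicit dummies. It is meant only to show that HC1 needs the residuals plus a single scaling factor, while HC2 needs each observation's leverage; it does not describe estimatr's internal code.

```r
# Small illustration on mtcars with explicit cyl dummies
fit <- lm(mpg ~ hp + factor(cyl), data = mtcars)
X <- model.matrix(fit)
e <- residuals(fit)
n <- nrow(X)
k <- ncol(X)
bread <- solve(crossprod(X))  # (X'X)^{-1}

# HC1: squared residuals with one degrees-of-freedom correction, n / (n - k)
meat_hc1 <- crossprod(X * e) * n / (n - k)
vcov_hc1 <- bread %*% meat_hc1 %*% bread

# HC2: each squared residual is divided by 1 - h_ii, where h_ii is the
# leverage of observation i (the diagonal of the hat matrix)
h <- hatvalues(fit)
meat_hc2 <- crossprod(X * (e / sqrt(1 - h)))
vcov_hc2 <- bread %*% meat_hc2 %*% bread

# Standard errors for hp under each estimator
sqrt(diag(vcov_hc1))["hp"]
sqrt(diag(vcov_hc2))["hp"]
```

The leverages depend on the full design matrix, dummies included, which is why the HC2 path cannot skip the step that absorbing otherwise avoids.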
# With HC1
microbenchmark(
  `base + sandwich` = {
    lo <- lm(y ~ z + factor(blocks), dat)
    coeftest(lo, vcov = vcovHC(lo, type = "HC1"))
  },
  `lm_robust` = lm_robust(
    y ~ z + factor(blocks),
    dat,
    se_type = "HC1"
  ),
  `lm_robust + fes` = lm_robust(
    y ~ z,
    data = dat,
    fixed_effects = ~ blocks,
    se_type = "HC1"
  ),
  times = 50
)
## Unit: milliseconds
##             expr        min         lq       mean     median         uq       max neval cld
##  base + sandwich 140.859354 142.774669 145.036719 143.610967 144.526066 209.16015    50 a
##        lm_robust  28.275240  28.606971  31.999185  28.991653  30.548239  92.40428    50  b
##  lm_robust + fes   2.580458   2.769304   4.304403   2.833366   2.917601  66.59753    50   c
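A timing comparison is only meaningful if the specifications being timed return the same answer, so it can be worth checking that first. The sketch below builds a stand-in for the matched-pairs data in base R. The data frame dat_check is hypothetical and will not reproduce the fabricatr draws above. It then places the treatment effect estimate and standard error from the dummy and absorbed specifications side by side.

```r
library(estimatr)

# Hypothetical stand-in for the matched-pairs data: 300 blocks of 2 units,
# with exactly one treated unit per block
set.seed(40)
n_blocks <- 300
dat_check <- data.frame(
  blocks = rep(seq_len(n_blocks), each = 2),
  z = as.vector(replicate(n_blocks, sample(0:1)))
)
dat_check$y <- rnorm(nrow(dat_check)) + dat_check$z

# Explicit block dummies versus absorbed block fixed effects, both with HC1
fit_dummy <- lm_robust(
  y ~ z + factor(blocks),
  data = dat_check,
  se_type = "HC1"
)
fit_fe <- lm_robust(
  y ~ z,
  data = dat_check,
  fixed_effects = ~ blocks,
  se_type = "HC1"
)

# Treatment effect estimate and standard error under each specification
rbind(
  dummies  = summary(fit_dummy)$coefficients["z", c("Estimate", "Std. Error")],
  absorbed = summary(fit_fe)$coefficients["z", c("Estimate", "Std. Error")]
)
```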