# Nonlinear Regression Functions {#nrf}
Until now we assumed the regression function to be linear, i.e., we have treated the slope parameter of the regression function as a constant. This implies that the effect on $Y$ of a one unit change in $X$ does not depend on the level of $X$. If, however, the effect of a change in $X$ on $Y$ does depend on the value of $X$, we should use a nonlinear regression function.
As in the previous chapter, the packages `r ttcode("AER")` [@R-AER] and `r ttcode("stargazer")` [@R-stargazer] are required to reproduce the code presented in this chapter. Check whether the code chunk below executes without any error messages.
```{r, warning=FALSE, message=FALSE, eval=FALSE}
library(AER)
library(stargazer)
```
## A General Strategy for Modelling Nonlinear Regression Functions
Let us have a look at an example where a nonlinear regression function is better suited for estimating the population relationship between the regressor, $X$, and the regressand, $Y$: the relationship between the income of school districts and their test scores.
```{r dataset, results='hide', echo=TRUE, message=FALSE}
# prepare the data
library(AER)
data(CASchools)
CASchools$size <- CASchools$students/CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math) / 2
```
We start our analysis by computing the correlation between both variables.
```{r}
cor(CASchools$income, CASchools$score)
```
Here, income and test scores are positively related: school districts with above average income tend to achieve above average test scores. Does a linear regression function model the data adequately? Let us plot the data and add a linear regression line.
```{r fig.align = 'center'}
# fit a simple linear model
linear_model<- lm(score ~ income, data = CASchools)
# plot the observations
plot(CASchools$income, CASchools$score,
col = "steelblue",
pch = 20,
xlab = "District Income (thousands of dollars)",
ylab = "Test Score",
cex.main = 0.9,
main = "Test Score vs. District Income and a Linear OLS Regression Function")
# add the regression line to the plot
abline(linear_model,
col = "red",
lwd = 2)
legend("bottomright", legend="linear fit",lwd=2,col="red")
```
As pointed out in the book, the linear regression line seems to overestimate the true relationship when income is very high or very low and to underestimate it for the middle income group.
Fortunately, OLS does not only handle linear functions of the regressors. We can, for example, model test scores as a function of income and the square of income. The corresponding regression model is
$$TestScore_i = \beta_0 + \beta_1 \times income_i + \beta_2 \times income_i^2 + u_i,$$ called a *quadratic regression model*. That is, $income^2$ is treated as an additional explanatory variable, so the quadratic model is a special case of a multiple regression model. When fitting the model with `r ttcode("lm()")` we have to use the `r ttcode("^")` operator in conjunction with the function `r ttcode("I()")` to add the quadratic term as an additional regressor to the argument `r ttcode("formula")`. This is because the regression formula we pass to `r ttcode("formula")` is converted to an object of class `r ttcode("formula")`, and for objects of this class the operators `r ttcode("+")`, `r ttcode("-")`, `r ttcode("*")` and `r ttcode("^")` have a nonarithmetic interpretation. `r ttcode("I()")` ensures that `r ttcode("income^2")` is treated as the arithmetic square of `r ttcode("income")`, see `?I`.
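To see why `r ttcode("I()")` is needed, consider what happens without it (a small illustration, not part of the original analysis): inside a formula, `r ttcode("income^2")` is interpreted as a formula operation and collapses to `r ttcode("income")`, so the quadratic regressor is silently dropped.
```{r}
# without I(), ^ acts as a formula operator: income^2 reduces to income,
# so no quadratic regressor is added to the model
coef(lm(score ~ income + income^2, data = CASchools))
```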
```{r}
# fit the quadratic Model
quadratic_model <- lm(score ~ income + I(income^2), data = CASchools)
# obtain the model summary
coeftest(quadratic_model, vcov. = vcovHC, type = "HC1")
```
The output tells us that the estimated regression function is
$$\widehat{TestScore}_i = \underset{(2.90)}{607.3} + \underset{(0.27)}{3.85} \times income_i - \underset{(0.0048)}{0.0423} \times income_i^2.$$
This model allows us to test the hypothesis that the relationship between test scores and district income is linear against the alternative that it is quadratic. This corresponds to testing
$$H_0: \beta_2 = 0 \ \ \text{vs.} \ \ H_1: \beta_2\neq0,$$
since $\beta_2=0$ corresponds to a simple linear equation and $\beta_2\neq0$ implies a quadratic relationship. We find that $t=(\hat\beta_2 - 0)/SE(\hat\beta_2) = -0.0423/0.0048 = -8.81$ so the null is rejected at any common level of significance and we conclude that the relationship is nonlinear. This is consistent with the impression gained from the plot.
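As a quick check, this $t$-statistic can also be computed directly from the robust coefficient summary (a small sketch; the row and column labels used below are those printed by `r ttcode("coeftest()")`).
```{r}
# robust coefficient summary of the quadratic model
robust_est <- coeftest(quadratic_model, vcov. = vcovHC, type = "HC1")
# t-statistic on income^2: estimate divided by its robust standard error
robust_est["I(income^2)", "Estimate"] / robust_est["I(income^2)", "Std. Error"]
```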
We now draw the same scatter plot as for the linear model and add the regression line for the quadratic model. Because `r ttcode("abline()")` can only draw straight lines, it cannot be used here. Instead, we use `r ttcode("lines()")`, which allows us to draw lines that need not be straight, see `?lines`. The most basic call of `r ttcode("lines()")` is `r ttcode("lines(x_values, y_values)")` where `r ttcode("x_values")` and `r ttcode("y_values")` are vectors of the same length that provide the coordinates of the points to be *sequentially* connected by a line. This makes it necessary to sort the coordinate pairs according to the $X$-values. Here we use the function `r ttcode("order()")` to sort the fitted values of `r ttcode("score")` according to the observations of `r ttcode("income")`.
```{r, fig.align="center"}
# draw a scatterplot of the observations for income and test score
plot(CASchools$income, CASchools$score,
col = "steelblue",
pch = 20,
xlab = "District Income (thousands of dollars)",
ylab = "Test Score",
main = "Estimated Linear and Quadratic Regression Functions")
# add a linear function to the plot
abline(linear_model, col = "green", lwd = 2)
# add the quadratic function to the plot
order_id <- order(CASchools$income)
lines(x = CASchools$income[order_id],
y = fitted(quadratic_model)[order_id],
col = "red",
lwd = 2)
legend("bottomright",legend=c("Linear Line","Quadratic Line"),
lwd=2,col=c("green","red"))
```
We see that the quadratic function does fit the data much better than the linear function.
## Nonlinear Functions of a Single Independent Variable {#nfoasiv}
### Polynomials {-}
The approach used to obtain a quadratic model can be generalized to polynomial models of arbitrary degree $r$,
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i.$$
A cubic model for instance can be estimated in the same way as the quadratic model; we just have to use a polynomial of degree $r=3$ in `r ttcode("income")`. This is conveniently done using the function `r ttcode("poly()")`.
```{r cubic}
# estimate a cubic model
cubic_model <- lm(score ~ poly(income, degree = 3, raw = TRUE), data = CASchools)
```
`r ttcode("poly()")` generates orthogonal polynomials which are orthogonal to the constant by default. Here, we set `r ttcode("raw = TRUE")` such that raw polynomials are evaluated, see `?poly`.
In practice, the question arises which polynomial order should be chosen. First, similarly to the case of $r=2$, we can test the null hypothesis that the true relation is linear against the alternative hypothesis that the relationship is a polynomial of degree $r$:
$$ H_0: \beta_2=0, \ \beta_3=0,\dots,\beta_r=0 \ \ \ \text{vs.} \ \ \ H_1: \text{at least one} \ \beta_j\neq0, \ j=2,\dots,r. $$
This is a joint null hypothesis with $r-1$ restrictions so it can be tested using the $F$-test presented in Chapter \@ref(htaciimr). `r ttcode("linearHypothesis()")` can be used to conduct such tests. For example, we may test the null of a linear model against the alternative of a polynomial of maximum degree $r=3$ as follows.
```{r, warning=F, message=F}
# test the hypothesis of a linear model against quadratic or cubic
# alternatives
# set up hypothesis matrix
R <- rbind(c(0, 0, 1, 0),
c(0, 0, 0, 1))
# do the test
linearHypothesis(cubic_model,
hypothesis.matrix = R,
white.adj = "hc1")
```
We provide a hypothesis matrix as the argument `r ttcode("hypothesis.matrix")`. This is useful when the coefficients have long names, as is the case here due to using `r ttcode("poly()")`, or when the restrictions include multiple coefficients. How the hypothesis matrix $\mathbf{R}$ is interpreted by `r ttcode("linearHypothesis()")` is best seen using matrix algebra:
For the two linear constraints above, we have
\begin{align*}
\mathbf{R}\boldsymbol{\beta} =& \mathbf{s} \\
\\
\begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\beta_0 \\
\beta_1 \\
\beta_2 \\
\beta_3 \\
\end{pmatrix} =&
\begin{pmatrix}
0 \\
0
\end{pmatrix} \quad \Rightarrow \quad
\begin{pmatrix}
\beta_2 \\
\beta_3
\end{pmatrix} =
\begin{pmatrix}
0 \\
0
\end{pmatrix}.
\end{align*}
`r ttcode("linearHypothesis()")` uses the zero vector for $\mathbf{s}$ by default, see `?linearHypothesis`.
The $p$-value for the test is very small so that we reject the null hypothesis. However, this does not tell us *which* $r$ to choose. In practice, one approach to determine the degree of the polynomial is to use *sequential testing*:
1. Estimate a polynomial model for some maximum value $r$.
2. Use a $t$-test to test $\beta_r = 0$. *Rejection* of the null means that $X^r$ belongs in the regression equation.
3. *Acceptance* of the null in step 2 means that $X^r$ can be eliminated from the model. Continue by repeating step 1 with order $r-1$ and test whether $\beta_{r-1}=0$. If the test rejects, use a polynomial model of order $r-1$.
4. If the test from step 3 does not reject, continue the procedure with successively lower orders until the coefficient on the highest power is statistically significant.
There is no unambiguous guideline on how to choose $r$ in step one. However, as pointed out in @stock2015, economic data is often smooth such that it is appropriate to choose small orders like $2$, $3$, or $4$.
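As an illustration, the following sketch automates the sequential procedure for the income polynomial using robust $t$-tests (the starting order of $4$ and the $5\%$ level are arbitrary choices made here, not taken from the book).
```{r}
# start with a polynomial of degree 4 and drop the highest power until its
# robust t-test rejects at the 5% level (or only the linear term remains)
r <- 4
repeat {
  mod <- lm(score ~ poly(income, degree = r, raw = TRUE), data = CASchools)
  p_value <- coeftest(mod, vcov. = vcovHC, type = "HC1")[r + 1, 4]
  if (p_value < 0.05 || r == 1) break
  r <- r - 1
}
r
```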
We will demonstrate how to apply sequential testing using the cubic model as an example.
```{r}
summary(cubic_model)
```
The estimated cubic model stored in `r ttcode("cubic_model")` is
$$ \widehat{TestScore}_i = \underset{(5.83)}{600.1} + \underset{(0.86)}{5.02} \times income_i -\underset{(0.037)}{0.096} \times income_i^2 + \underset{(0.00047)}{0.00069} \times income_i^3. $$
The $t$-statistic on $income^3$ is $1.45$, so based on these (homoskedasticity-only) standard errors the null that the relationship is quadratic cannot be rejected, even at the $10\%$ level. This is contrary to the result presented in the book, which reports robust standard errors throughout, so we also use robust variance-covariance estimation to reproduce these results.
```{r}
# test the hypothesis using robust standard errors
coeftest(cubic_model, vcov. = vcovHC, type = "HC1")
```
The reported standard errors have changed. Furthermore, the coefficient on `income^3` is now significant at the $5\%$ level, so we reject the hypothesis that the regression function is quadratic against the alternative that it is cubic. We can also test whether the coefficients on `r ttcode("income^2")` and `r ttcode("income^3")` are jointly significant using a robust version of the $F$-test.
```{r f-test}
# perform robust F-test
linearHypothesis(cubic_model,
hypothesis.matrix = R,
vcov. = vcovHC, type = "HC1")
```
With a $p$-value of $8.945e-13$, i.e., much less than $0.05$, the null hypothesis of linearity is rejected in favor of the alternative that the relationship is quadratic or cubic.
#### Interpretation of Coefficients in Nonlinear Regression Models {-}
The coefficients in the polynomial regressions do not have a simple interpretation. Why? Think of a quadratic model: it is not helpful to think of the coefficient on $X$ as the expected change in $Y$ associated with a change in $X$ holding the other regressors constant because $X^2$ changes as $X$ varies. This is also the case for other deviations from linearity, for example in models where regressors and/or the dependent variable are log-transformed. A way to approach this is to calculate the estimated effect on $Y$ associated with a change in $X$ for one or more values of $X$. This idea is summarized in Key Concept 8.1.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC8.1">
<h3 class = "right"> Key Concept 8.1 </h3>
<h3 class = "left"> The Expected Effect on $Y$ for a Change in $X_1$ in a Nonlinear Regression Model </h3>
Consider the nonlinear population regression model
$$ Y_i = f(X_{1i}, X_{2i}, \\dots, X_{ki}) + u_i \\ , \\ i=1,\\dots,n,$$
where $f(X_{1i}, X_{2i}, \\dots, X_{ki})$ is the population regression function and $u_i$ is the error term.
Denote by $\\Delta Y$ the expected change in $Y$ associated with $\\Delta X_1$, the change in $X_1$ while holding $X_2, \\cdots , X_k$ constant. That is, the expected change in $Y$ is the difference
$$\\Delta Y = f(X_1 + \\Delta X_1, X_2, \\cdots, X_k) - f(X_1, X_2, \\cdots, X_k).$$
The estimator of this unknown population difference is the difference between the predicted values for these two cases. Let $\\hat{f}(X_1, X_2, \\cdots, X_k)$ be the predicted value of $Y$ based on the estimator $\\hat{f}$ of the population regression function. Then the predicted change in $Y$ is
$$\\Delta \\widehat{Y} = \\hat{f}(X_1 + \\Delta X_1, X_2, \\cdots, X_k) - \\hat{f}(X_1, X_2, \\cdots, X_k).$$
</p>
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[The Expected Effect on $Y$ for a Change in $X_1$ in a Nonlinear Regression Model]{8.1}
Consider the nonlinear population regression model
$$ Y_i = f(X_{1i}, X_{2i}, \\dots, X_{ki}) + u_i \\ , \\ i=1,\\dots,n,$$
where $f(X_{1i}, X_{2i}, \\dots, X_{ki})$ is the population regression function and $u_i$ is the error term.
Denote by $\\Delta Y$ the expected change in $Y$ associated with $\\Delta X_1$, the change in $X_1$ while holding $X_2, \\cdots , X_k$ constant. That is, the expected change in $Y$ is the difference
$$\\Delta Y = f(X_1 + \\Delta X_1, X_2, \\cdots, X_k) - f(X_1, X_2, \\cdots, X_k).$$
The estimator of this unknown population difference is the difference between the predicted values for these two cases. Let $\\hat{f}(X_1, X_2, \\cdots, X_k)$ be the predicted value of $Y$ based on the estimator $\\hat{f}$ of the population regression function. Then the predicted change in $Y$ is
$$\\Delta \\widehat{Y} = \\hat{f}(X_1 + \\Delta X_1, X_2, \\cdots, X_k) - \\hat{f}(X_1, X_2, \\cdots, X_k).$$
\\end{keyconcepts}
')
```
For example, we may ask the following: what is the predicted change in test scores associated with a one unit change (i.e., $\$1000$) in income, based on the estimated quadratic regression function
$$\widehat{TestScore} = 607.3 + 3.85 \times income - 0.0423 \times income^2\ ?$$
Because the regression function is quadratic, this effect depends on the *initial* district income. We therefore consider two cases:
1. An increase in district income from $10$ to $11$ (from $\$10000$ per capita to $\$11000$).
2. An increase in district income from $40$ to $41$ (that is, from $\$40000$ to $\$41000$).
In order to obtain the $\Delta \widehat{Y}$ associated with a change in income from $10$ to $11$, we use the following formula:
$$\Delta \widehat{Y} = \left(\hat{\beta}_0 + \hat{\beta}_1 \times 11 + \hat{\beta}_2 \times 11^2\right) - \left(\hat{\beta}_0 + \hat{\beta}_1 \times 10 + \hat{\beta}_2 \times 10^2\right). $$
To compute $\widehat{Y}$ using `r ttcode("R")` we may use `r ttcode("predict()")`.
```{r quadratic}
# compute and assign the quadratic model
quadratic_model <- lm(score ~ income + I(income^2), data = CASchools)
# set up data for prediction
new_data <- data.frame(income = c(10, 11))
# do the prediction
Y_hat <- predict(quadratic_model, newdata = new_data)
# compute the difference
diff(Y_hat)
```
Analogously we can compute the effect of a change in district income from $40$ to $41$:
```{r}
# set up data for prediction
new_data <- data.frame(income = c(40, 41))
# do the prediction
Y_hat <- predict(quadratic_model, newdata = new_data)
# compute the difference
diff(Y_hat)
```
So for the quadratic model, the expected change in $TestScore$ induced by an increase in $income$ from $10$ to $11$ is about $2.96$ points but an increase in $income$ from $40$ to $41$ increases the predicted score by only $0.42$. Hence, the slope of the estimated quadratic regression function is *steeper* at low levels of income than at higher levels.
### Logarithms {-}
Another way to specify a nonlinear regression function is to use the natural logarithm of $Y$ and/or $X$.
Logarithms convert changes in variables into percentage changes. This is convenient as many relationships are naturally expressed in terms of percentages.
There are three different cases in which logarithms might be used.
1. Transform $X$ to its logarithm, but not $Y$.
2. Analogously, we can transform $Y$ to its logarithm but leave $X$ in levels.
3. Both $Y$ and $X$ are transformed to their logarithms.
The interpretation of the regression coefficients is different in each case.
#### Case I: $X$ is in Logarithm, $Y$ is not. {-}
The regression model then is $$Y_i = \beta_0 + \beta_1 \times \ln(X_i) + u_i \text{, } i=1,...,n. $$ As with polynomial regressions, we do not have to create a new variable before using `r ttcode("lm()")`. We can simply adjust the `r ttcode("formula")` argument of `r ttcode("lm()")` to tell `r ttcode("R")` that the log-transformation of a variable should be used.
```{r}
# estimate a level-log model
LinearLog_model <- lm(score ~ log(income), data = CASchools)
# compute robust summary
coeftest(LinearLog_model,
vcov = vcovHC, type = "HC1")
```
Hence, the estimated regression function is
$$\widehat{TestScore} = 557.8 + 36.42 \times \ln(income).$$
Let us draw a plot of this function.
```{r, fig.align="center"}
# draw a scatterplot
plot(score ~ income,
col = "steelblue",
pch = 20,
data = CASchools,
ylab="Score",
xlab="Income",
main = "Linear-Log Regression Line")
# add the linear-log regression line
order_id <- order(CASchools$income)
lines(CASchools$income[order_id],
fitted(LinearLog_model)[order_id],
col = "red",
lwd = 2)
legend("bottomright",legend = "Linear-log line",lwd = 2,col ="red")
```
We can interpret $\hat{\beta}_1$ as follows: a $1\%$ increase in income is associated with an increase in test scores of $0.01 \times 36.42 = 0.36$ points. In order to get the estimated effect of a one unit change in income (that is, a change in the original units, thousands of dollars) on test scores, the method presented in Key Concept 8.1 can be used.
```{r}
# set up new data
new_data <- data.frame(income = c(10, 11, 40, 41))
# predict the outcomes
Y_hat <- predict(LinearLog_model, newdata = new_data)
# compute the expected difference
Y_hat_matrix <- matrix(Y_hat, nrow = 2, byrow = TRUE)
Y_hat_matrix[, 2] - Y_hat_matrix[, 1]
```
By setting `r ttcode("nrow = 2")` and `r ttcode("byrow = TRUE")` in `r ttcode("matrix()")`, we ensure that `r ttcode("Y_hat_matrix")` is a $2\times2$ matrix filled row-wise with the entries of `r ttcode("Y_hat")`.
The estimated model states that for an income increase from $\$10000$ to $\$11000$, test scores increase by an expected amount of $3.47$ points. When income increases from $\$40000$ to $\$41000$, the expected increase in test scores is only about $0.90$ points.
#### Case II: $Y$ is in Logarithm, $X$ is not {-}
There are cases where it is useful to regress $\ln(Y)$.
The corresponding regression model then is
$$ \ln(Y_i) = \beta_0 + \beta_1 \times X_i + u_i , \ \ i=1,...,n. $$
```{r}
# estimate a log-linear model
LogLinear_model <- lm(log(score) ~ income, data = CASchools)
# obtain a robust coefficient summary
coeftest(LogLinear_model,
vcov = vcovHC, type = "HC1")
```
The estimated regression function is $$\widehat{\ln(TestScore)} = 6.439 + 0.00284 \times income.$$ An increase in district income by $\$1000$ is expected to increase test scores by $100\times 0.00284 \% = 0.284\%$.
When the dependent variable is in logarithms, one cannot simply use $e^{\ln(\cdot)}$ to transform predictions back to the original scale; see the discussion in the book.
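To illustrate that this percentage effect does not depend on the level of income, we can reuse the approach of Key Concept 8.1 (a small sketch using the income changes considered before): since the model is linear in $income$, both differences in predicted logs equal $\hat\beta_1$, i.e., about $0.284\%$ after scaling by $100$.
```{r}
# predicted log test scores for income changes from 10 to 11 and from 40 to 41
new_data <- data.frame(income = c(10, 11, 40, 41))
log_Y_hat <- predict(LogLinear_model, newdata = new_data)
# differences in logs, scaled by 100, approximate percentage changes
100 * c(log_Y_hat[2] - log_Y_hat[1], log_Y_hat[4] - log_Y_hat[3])
```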
#### Case III: $X$ and $Y$ are in Logarithms {-}
The log-log regression model is $$\ln(Y_i) = \beta_0 + \beta_1 \times \ln(X_i) + u_i, \ \ i=1,...,n.$$
```{r log log}
# estimate the log-log model
LogLog_model <- lm(log(score) ~ log(income), data = CASchools)
# print robust coefficient summary to the console
coeftest(LogLog_model,
vcov = vcovHC, type = "HC1")
```
The estimated regression function hence is $$\widehat{\ln(TestScore)} = 6.336 + 0.0554 \times \ln(income).$$ In a log-log model, a $1\%$ change in $X$ is associated with an estimated $\hat\beta_1 \%$ change in $Y$.
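A quick way to see the elasticity interpretation at work is to predict $\ln(TestScore)$ at some income level and at that level increased by $1\%$ (a sketch; the starting value of $10$ is an arbitrary choice). The difference in predicted logs, scaled by $100$, is $100 \times \hat\beta_1 \times \ln(1.01) \approx \hat\beta_1$, i.e., roughly a $0.055\%$ change in test scores.
```{r}
# predicted log test scores at income = 10 and at a 1% higher income
new_data <- data.frame(income = c(10, 10 * 1.01))
log_Y_hat <- predict(LogLog_model, newdata = new_data)
# implied relative change in test scores (in percent)
100 * diff(log_Y_hat)
```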
We now reproduce Figure 8.5 of the book.
```{r, fig.align="center"}
# generate a scatterplot
plot(log(score) ~ income,
col = "steelblue",
pch = 20,
data = CASchools,
ylab="log(Score)",
xlab="Income",
main = "Log-Linear Regression Function")
# add the log-linear regression line
order_id <- order(CASchools$income)
lines(CASchools$income[order_id],
fitted(LogLinear_model)[order_id],
col = "red",
lwd = 2)
# add the log-log regression line
lines(sort(CASchools$income),
fitted(LogLog_model)[order(CASchools$income)],
col = "green",
lwd = 2)
# add a legend
legend("bottomright",
legend = c("log-linear model", "log-log model"),
lwd = 2,
col = c("red", "green"))
```
Key Concept 8.2 summarizes the three logarithmic regression models.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC8.2">
<h3 class = "right"> Key Concept 8.2 </h3>
<h3 class = "left"> Logarithms in Regression: Three Cases </h3>
<p> Logarithms can be used to transform the dependent variable $Y$ or the independent variable $X$, or both
(the variable being transformed must be positive). The following table summarizes these three cases and the interpretation of the regression coefficient $\\beta_1$. In each case, $\\beta_1$ can be estimated by applying OLS after taking the logarithm(s) of the dependent and/or the independent variable. </p>
<table>
<thead>
<tr class="header">
<th align="left">Case</th>
<th align="left">Model Specification</th>
<th align="left">Interpretation of $\\beta_1$</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">$(I)$</td>
<td align="left">$Y_i = \\beta_0 + \\beta_1 \\ln(X_i) + u_i$</td>
<td align="left">A $1 \\%$ change in $X$ is associated with a change in $Y$ of $0.01 \\times \\beta_1$.</td>
</tr>
<tr class="even">
<td align="left">$(II)$</td>
<td align="left">$\\ln(Y_i) = \\beta_0 + \\beta_1 X_i + u_i$</td>
<td align="left">A change in $X$ by one unit ($\\Delta X = 1$) is associated with a $100 \\times \\beta_1 \\%$ change in $Y$.</td>
</tr>
<tr class="odd">
<td align="left">$(III)$</td>
<td align="left">$\\ln(Y_i) = \\beta_0 + \\beta_1 \\ln(X_i) + u_i$</td>
<td align="left">A $1\\%$ change in $X$ is associated with a $\\beta_1\\%$ change in $Y$, so $\\beta_1$ is the elasticity of $Y$ with respect to $X$.</td>
</tr>
</tbody>
</table>
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Logarithms in Regression: Three Cases]{8.2}
\\begin{tabularx}{\\textwidth}{llX}
\\textbf{Case} & \\textbf{Model Specification} & \\textbf{Interpretation of $\\beta_1$} \\\\
$(I)$ & $Y_i = \\beta_0 + \\beta_1 \\ln(X_i) + u_i$ & A $1 \\%$ change in $X$ is associated with a change in $Y$ of \\newline $0.01 \\times \\beta_1$. \\\\
$(II)$ & $\\ln(Y_i) = \\beta_0 + \\beta_1 X_i + u_i$ & A change in $X$ by one unit ($\\Delta X = 1$) is associated with a \\newline $100 \\times \\beta_1 \\%$ change in $Y$. \\\\
$(III)$ & $\\ln(Y_i) = \\beta_0 + \\beta_1 \\ln(X_i) + u_i$ & A $1\\%$ change in $X$ is associated with a $\\beta_1 \\%$ change in $Y$, so \\newline $\\beta_1$ is the elasticity of $Y$ with respect to $X$. \\\\
\\end{tabularx}
\\end{keyconcepts}
')
```
Of course we can also estimate a *polylog* model like
$$ TestScore_i = \beta_0 + \beta_1 \times \ln(income_i) + \beta_2 \times \ln(income_i)^2 + \beta_3 \times \ln(income_i)^3 + u_i, $$
which models the dependent variable $TestScore$ by a third-degree polynomial of the log-transformed regressor $income$.
```{r poly log}
# estimate the polylog model
polyLog_model <- lm(score ~ log(income) + I(log(income)^2) + I(log(income)^3),
data = CASchools)
# print robust summary to the console
coeftest(polyLog_model,
vcov = vcovHC, type = "HC1")
```
Comparing by $\bar{R}^2$ we find that, leaving out the log-linear model, all models have a similar adjusted fit. In the class of polynomial models, the cubic specification has the highest $\bar{R}^2$ whereas the linear-log specification is the best of the log-models.
```{r}
# compute the adj. R^2 for the nonlinear models
adj_R2 <-rbind("quadratic" = summary(quadratic_model)$adj.r.squared,
"cubic" = summary(cubic_model)$adj.r.squared,
"LinearLog" = summary(LinearLog_model)$adj.r.squared,
"LogLinear" = summary(LogLinear_model)$adj.r.squared,
"LogLog" = summary(LogLog_model)$adj.r.squared,
"polyLog" = summary(polyLog_model)$adj.r.squared)
# assign column names
colnames(adj_R2) <- "adj_R2"
adj_R2
```
Let us now compare the cubic and the linear-log model by plotting the corresponding estimated regression functions.
```{r, fig.align='center'}
# generate a scatterplot
plot(score ~ income,
data = CASchools,
col = "steelblue",
pch = 20,
ylab="Score",
xlab="Income",
main = "Linear-Log and Cubic Regression Functions")
# add the linear-log regression line
order_id <- order(CASchools$income)
lines(CASchools$income[order_id],
fitted(LinearLog_model)[order_id],
col = "darkgreen",
lwd = 2)
# add the cubic regression line
lines(x = CASchools$income[order_id],
y = fitted(cubic_model)[order_id],
col = "red",
lwd = 2)
# add a legend
legend("bottomright",
legend = c("Linear-Log model", "Cubic model"),
lwd = 2,
col = c("darkgreen", "red"))
```
Both regression lines look nearly identical. Altogether the linear-log model may be preferable since it is more parsimonious in terms of regressors: it does not include higher-degree polynomials.
## Interactions between Independent Variables
There are research questions where it is interesting to learn how the effect on $Y$ of a change in an independent variable depends on the value of another independent variable. For example, we may ask if districts with many English learners benefit differently from a decrease in class size than districts with few English learners. To assess this using a multiple regression model, we include an interaction term. We consider three cases:
1. Interactions between two binary variables.
2. Interactions between a binary and a continuous variable.
3. Interactions between two continuous variables.
The following subsections discuss these cases briefly and demonstrate how to perform such regressions in `r ttcode("R")`.
#### Interactions Between Two Binary Variables {-}
Take two binary variables $D_1$ and $D_2$ and the population regression model
$$ Y_i = \beta_0 + \beta_1 \times D_{1i} + \beta_2 \times D_{2i} + u_i. $$
Now assume that
\begin{align*}
Y_i=& \, \ln(Earnings_i),\\
D_{1i} =& \,
\begin{cases}
1 & \text{if $i^{th}$ person has a college degree,} \\
0 & \text{else}.
\end{cases} \\
D_{2i} =& \,
\begin{cases}
1 & \text{if $i^{th}$ person is female,} \\
0 & \text{if $i^{th}$ person is male}.
\end{cases}
\end{align*}
We know that $\beta_1$ measures the average difference in $\ln(Earnings)$ between individuals with and without a college degree and $\beta_2$ is the gender differential in $\ln(Earnings)$, ceteris paribus. This model *does not* allow us to determine if there is a gender-specific effect of having a college degree and, if so, *how strong* this effect is. It is easy to come up with a model specification that allows us to investigate this:
$$ Y_i = \beta_0 + \beta_1 \times D_{1i} + \beta_2 \times D_{2i} + \beta_3 \times (D_{1i} \times D_{2i}) + u_i, $$
where $(D_{1i} \times D_{2i})$ is called an interaction term and $\beta_3$ measures the difference in the effect of having a college degree between women and men.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC8.3">
<h3 class = "right"> Key Concept 8.3 </h3>
<h3 class = "left"> A Method for Interpreting Coefficients in Regression with Binary Variables </h3>
Compute expected values of $Y$ for each possible case described by the set of binary variables. Compare these expected values.
The coefficients can then be expressed either as expected values or as differences between two or more expected values.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[A Method for Interpreting Coefficients in Regression with Binary Variables]{8.3}
Compute expected values of $Y$ for each possible case described by the set of binary variables. Compare these expected values.
The coefficients can then be expressed either as expected values or as differences between two or more expected values.
\\end{keyconcepts}
')
```
Following Key Concept 8.3 we have
\begin{align*}
E(Y_i\vert D_{1i}=0, D_{2i} = d_2) =& \, \beta_0 + \beta_1 \times 0 + \beta_2 \times d_2 + \beta_3 \times (0 \times d_2) \\
=& \, \beta_0 + \beta_2 \times d_2.
\end{align*}
If $D_{1i}$ switches from $0$ to $1$ we obtain
\begin{align*}
E(Y_i\vert D_{1i}=1, D_{2i} = d_2) =& \, \beta_0 + \beta_1 \times 1 + \beta_2 \times d_2 + \beta_3 \times (1 \times d_2) \\
=& \, \beta_0 + \beta_1 + \beta_2 \times d_2 + \beta_3 \times d_2.
\end{align*}
Hence, the overall effect is
$$ E(Y_i\vert D_{1i}=1, D_{2i} = d_2) - E(Y_i\vert D_{1i}=0, D_{2i} = d_2) = \beta_1 + \beta_3 \times d_2, $$
so the effect is a difference of expected values.
#### Application to the Student-Teacher Ratio and the Percentage of English Learners {-}
Now let
\begin{align*}
HiSTR =& \,
\begin{cases}
1, & \text{if $STR \geq 20$} \\
0, & \text{else},
\end{cases} \\
\\
HiEL =& \,
\begin{cases}
1, & \text{if $PctEL \geq 10$} \\
0, & \text{else}.
\end{cases}
\end{align*}
We may use `r ttcode("R")` to construct the variables above as follows.
```{r}
# append HiSTR to CASchools
CASchools$HiSTR <- as.numeric(CASchools$size >= 20)
# append HiEL to CASchools
CASchools$HiEL <- as.numeric(CASchools$english >= 10)
```
We proceed by estimating the model
\begin{align}
TestScore_i = \beta_0 + \beta_1 \times HiSTR_i + \beta_2 \times HiEL_i + \beta_3 \times (HiSTR_i \times HiEL_i) + u_i. (\#eq:im)
\end{align}
There are several ways to add the interaction term to the `r ttcode("formula")` argument when using `r ttcode("lm()")` but the most intuitive way is to use `r ttcode("HiEL * HiSTR")`.^[Appending `r ttcode("HiEL * HiSTR")` to the formula will add `r ttcode("HiEL")`, `r ttcode("HiSTR")` *and* their interaction as regressors while `r ttcode("HiEL:HiSTR")` only adds the interaction term.]
```{r}
# estimate the model with a binary interaction term
bi_model <- lm(score ~ HiSTR * HiEL, data = CASchools)
# print a robust summary of the coefficients
coeftest(bi_model, vcov. = vcovHC, type = "HC1")
```
The estimated regression model is $$\widehat{TestScore} = \underset{(1.39)}{664.1} - \underset{(1.93)}{1.9} \times HiSTR - \underset{(2.33)}{18.3} \times HiEL - \underset{(3.12)}{3.3} \times (HiSTR \times HiEL),$$ so the predicted effect of moving from a district with a low student-teacher ratio to one with a high student-teacher ratio is $-1.9 - 3.3 \times HiEL$, i.e., it depends on the share of English learners. For districts with a low share of English learners ($HiEL = 0$), the estimated effect is a decrease of $1.9$ points in test scores, while for districts with a large fraction of English learners ($HiEL = 1$), the predicted decrease in test scores amounts to $1.9 + 3.3 = 5.2$ points.
We can also use the model to estimate the mean test score for each possible combination of the included binary variables.
```{r}
# estimate means for all combinations of HiSTR and HiEL
# 1.
predict(bi_model, newdata = data.frame("HiSTR" = 0, "HiEL" = 0))
# 2.
predict(bi_model, newdata = data.frame("HiSTR" = 0, "HiEL" = 1))
# 3.
predict(bi_model, newdata = data.frame("HiSTR" = 1, "HiEL" = 0))
# 4.
predict(bi_model, newdata = data.frame("HiSTR" = 1, "HiEL" = 1))
```
We now verify that these predictions can be obtained from the coefficient estimates of model \@ref(eq:im):
\begin{align*}
\widehat{TestScore} = \hat\beta_0 = 664.1 \quad &\Leftrightarrow \quad HiSTR = 0, \, HiEL = 0.\\
\widehat{TestScore} = \hat\beta_0 + \hat\beta_2 = 664.1 - 18.3 = 645.8 \quad &\Leftrightarrow \quad HiSTR = 0, \, HiEL = 1.\\
\widehat{TestScore} = \hat\beta_0 + \hat\beta_1 = 664.1 - 1.9 = 662.2 \quad &\Leftrightarrow \quad HiSTR = 1, \, HiEL = 0.\\
\widehat{TestScore} = \hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3 = 664.1 - 1.9 - 18.3 - 3.3 = 640.6 \quad &\Leftrightarrow \quad HiSTR = 1, \, HiEL = 1.
\end{align*}
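The same four predictions can be computed directly from the coefficient vector (a short check; the coefficient ordering below, intercept first, then `r ttcode("HiSTR")`, `r ttcode("HiEL")` and the interaction, is the one reported by `r ttcode("lm()")` for this model).
```{r}
# compute the four group means from the estimated coefficients of bi_model
coefs_bi <- coef(bi_model)
group_means <- c(coefs_bi[1],
                 coefs_bi[1] + coefs_bi[3],
                 coefs_bi[1] + coefs_bi[2],
                 sum(coefs_bi))
names(group_means) <- c("HiSTR = 0, HiEL = 0", "HiSTR = 0, HiEL = 1",
                        "HiSTR = 1, HiEL = 0", "HiSTR = 1, HiEL = 1")
group_means
```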
#### Interactions Between a Continuous and a Binary Variable {-}
Now let $X_i$ denote the years of working experience of person $i$, which is a continuous variable. We have
\begin{align*}
Y_i =& \, \ln(Earnings_i), \\
\\
X_i =& \, \text{working experience of person }i, \\
\\
D_i =& \,
\begin{cases}
1, & \text{if $i^{th}$ person has a college degree} \\
0, & \text{else}.
\end{cases}
\end{align*}
The baseline model thus is
$$ Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i, $$
a multiple regression model that allows us to estimate the average benefit of having a college degree holding working experience constant as well as the average effect on earnings of a change in working experience holding college degree constant.
By adding the interaction term $X_i \times D_i$ we allow the effect of an additional year of work experience to differ between individuals with and without a college degree,
$$ Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i. $$
Here, $\beta_3$ is the expected difference in the effect of an additional year of work experience for college graduates versus non-graduates. Another possible specification is
$$ Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i. $$
This model states that the expected impact of an additional year of work experience on earnings differs for college graduates and non-graduates but that graduating on its own does not increase earnings.
All three regression functions can be visualized by straight lines. Key Concept 8.4 summarizes the differences.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC8.4">
<h3 class = "right"> Key Concept 8.4 </h3>
<h3 class = "left"> Interactions Between Binary and Continuous Variables </h3>
An interaction term like $X_i \\times D_i$ (where $X_i$ is continuous and $D_i$ is binary) allows for the slope to depend on the binary
variable $D_i$. There are three possibilities:
1. Different intercept and same slope:
$$ Y_i = \\beta_0 + \\beta_1 X_i + \\beta_2 D_i + u_i . $$
2. Different intercept and different slope:
$$ Y_i = \\beta_0 + \\beta_1 X_i + \\beta_2 D_i + \\beta_3 \\times (X_i \\times D_i) + u_i .$$
3. Same intercept and different slope:
$$ Y_i = \\beta_0 + \\beta_1 X_i + \\beta_2 (X_i \\times D_i) + u_i . $$
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Interactions Between Binary and Continuous Variables]{8.4}
An interaction term like $X_i \\times D_i$ (where $X_i$ is continuous and $D_i$ is binary) allows for the slope to depend on the binary
variable $D_i$. There are three possibilities:\\newline
\\begin{enumerate}
\\item Different intercept and same slope:
$$ Y_i = \\beta_0 + \\beta_1 X_i + \\beta_2 D_i + u_i .$$
\\item Different intercept and different slope:
$$ Y_i = \\beta_0 + \\beta_1 X_i + \\beta_2 D_i + \\beta_3 \\times (X_i \\times D_i) + u_i .$$
\\item Same intercept and different slope:
$$ Y_i = \\beta_0 + \\beta_1 X_i + \\beta_2 (X_i \\times D_i) + u_i .$$
\\end{enumerate}
\\end{keyconcepts}
')
```
The following code chunk demonstrates how to replicate the results shown in Figure 8.8 of the book using artificial data.
```{r, fig.align='center',fig.width=8, fig.height=8}
# generate artificial data
set.seed(1)
X <- runif(200,0, 15)
D <- sample(0:1, 200, replace = T)
Y <- 450 + 150 * X + 500 * D + 50 * (X * D) + rnorm(200, sd = 300)
# divide plotting area accordingly
m <- rbind(c(1, 2), c(3, 0))
graphics::layout(m)
# estimate the models and plot the regression lines
# 1. (baseline model)
plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Different Intercepts, Same Slope",
cex.main=1.2)
mod1_coef <- lm(log(Y) ~ X + D)$coefficients
abline(coef = c(mod1_coef[1], mod1_coef[2]),
col = "red",
lwd = 1.5)
abline(coef = c(mod1_coef[1] + mod1_coef[3], mod1_coef[2]),
col = "green",
lwd = 1.5)
# 2. (baseline model + interaction term)
plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Different Intercepts, Different Slopes",
cex.main=1.2)
mod2_coef <- lm(log(Y) ~ X + D + X:D)$coefficients
abline(coef = c(mod2_coef[1], mod2_coef[2]),
col = "red",
lwd = 1.5)
abline(coef = c(mod2_coef[1] + mod2_coef[3], mod2_coef[2] + mod2_coef[4]),
col = "green",
lwd = 1.5)
# 3. (omission of D as regressor + interaction term)
plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Same Intercept, Different Slopes",
cex.main=1.2)
mod3_coef <- lm(log(Y) ~ X + X:D)$coefficients
abline(coef = c(mod3_coef[1], mod3_coef[2]),
col = "red",
lwd = 2)
abline(coef = c(mod3_coef[1], mod3_coef[2] + mod3_coef[3]),
col = "green",
lwd = 2)
```
#### Application to the Student-Teacher Ratio and the Percentage of English Learners {-}
Using a model specification like the second one discussed in Key Concept 8.4 (different intercept and different slope), we may answer the question whether the effect on test scores of decreasing the student-teacher ratio depends on whether there are many or few English learners. We estimate the regression model
$$ TestScore_i = \beta_0 + \beta_1 \times size_i + \beta_2 \times HiEL_i + \beta_3 \times (size_i \times HiEL_i) + u_i. $$
```{r}
# estimate the model
bci_model <- lm(score ~ size + HiEL + size * HiEL, data = CASchools)
# print robust summary of coefficients to the console
coeftest(bci_model, vcov. = vcovHC, type = "HC1")
```
The estimated regression model is $$ \widehat{TestScore} = \underset{(11.87)}{682.2} - \underset{(0.59)}{0.97} \times size + \underset{(19.51)}{5.6} \times HiEL - \underset{(0.97)}{1.28} \times (size \times HiEL). $$ The estimated regression line for districts with a low fraction of English learners ($HiEL_i=0$) is $$ \widehat{TestScore} = 682.2 - 0.97\times size_i. $$
For districts with a high fraction of English learners we have
\begin{align*}
\widehat{TestScore} =& \, 682.2 + 5.6 - 0.97\times size_i - 1.28 \times size_i \\
=& \, 687.8 - 2.25 \times size_i.
\end{align*}
The predicted increase in test scores following a reduction of the student-teacher ratio by $1$ unit is about $0.97$ points in districts where the fraction of English learners is low but $2.25$ in districts with a high share of English learners. From the coefficient on the interaction term $size \times HiEL$ we see that the difference between both effects is $1.28$ points.
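Both slopes can also be read off the coefficient vector directly (a small check based on `r ttcode("bci_model")`; the coefficient names used below are those printed in the summary above).
```{r}
# slope with respect to size for districts with HiEL = 0 and HiEL = 1
coefs_bci <- coef(bci_model)
c("HiEL = 0" = unname(coefs_bci["size"]),
  "HiEL = 1" = unname(coefs_bci["size"] + coefs_bci["size:HiEL"]))
```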
The next code chunk draws both lines belonging to the model. In order to make observations with $HiEL = 0$ distinguishable from those with $HiEL = 1$, we use different colors.
```{r, fig.align = 'center'}
# identify observations with PctEL >= 10
id <- CASchools$english >= 10
# plot observations with HiEL = 0 as red dots
plot(CASchools$size[!id], CASchools$score[!id],
xlim = c(0, 27),
ylim = c(600, 720),
pch = 20,
col = "red",
main = "",
xlab = "Class Size",
ylab = "Test Score")
# plot observations with HiEL = 1 as green dots
points(CASchools$size[id], CASchools$score[id],
pch = 20,
col = "green")
# read out estimated coefficients of bci_model
coefs <- bci_model$coefficients
# draw the estimated regression line for HiEL = 0
abline(coef = c(coefs[1], coefs[2]),
col = "red",
lwd = 1.5)
# draw the estimated regression line for HiEL = 1
abline(coef = c(coefs[1] + coefs[3], coefs[2] + coefs[4]),
col = "green",
lwd = 1.5 )
# add a legend to the plot
legend("topleft",
pch = c(20, 20),
col = c("red", "green"),
legend = c("HiEL = 0", "HiEL = 1"))
```
#### Interactions Between Two Continuous Variables {-}
Consider a regression model with $Y$ the log earnings and two continuous regressors $X_1$, the years of work experience, and $X_2$, the years of schooling. We want to estimate the effect on wages of an additional year of work experience depending on a given level of schooling. This effect can be assessed by including the interaction term $(X_{1i} \times X_{2i})$ in the model:
$$ Y_i = \beta_0 + \beta_1 \times X_{1i} + \beta_2 \times X_{2i} + \beta_3 \times (X_{1i} \times X_{2i}) + u_i. $$
Following Key Concept 8.1, we find that the effect on $Y$ of a change in $X_1$, holding $X_2$ constant, is
$$ \frac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 X_2. $$
In the earnings example, a positive $\beta_3$ implies that the effect on log earnings of an additional year of work experience grows linearly with years of schooling. In reverse, we have $$ \frac{\Delta Y}{\Delta X_2} = \beta_2 + \beta_3 X_1 $$ as the effect on log earnings of an additional year of schooling holding work experience constant.
Altogether we find that $\beta_3$ measures the effect of a unit increase in $X_1$ and $X_2$ *beyond* the effects of increasing $X_1$ alone and $X_2$ alone by one unit. The overall change in $Y$ is thus
\begin{align}
\Delta Y = (\beta_1 + \beta_3 X_2) \Delta X_1 + (\beta_2 + \beta_3 X_1) \Delta X_2 + \beta_3\Delta X_1 \Delta X_2. (\#eq:generalinteraction)
\end{align}
Key Concept 8.5 summarizes interactions between two regressors in multiple regression.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC8.5">
<h3 class = "right"> Key Concept 8.5 </h3>
<h3 class = "left"> Interactions in Multiple Regression </h3>
The interaction term between the two regressors $X_1$ and $X_2$ is given by their product $X_1 \\times X_2$. Adding this
interaction term as a regressor to the model $$ Y_i = \\beta_0 + \\beta_1 X_1 + \\beta_2 X_2 + u_i $$ allows the effect on $Y$ of a change in $X_2$ to depend on the value of $X_1$ and vice versa. Thus the coefficient $\\beta_3$ in the model $$ Y_i = \\beta_0 + \\beta_1 X_1 + \\beta_2 X_2 + \\beta_3 (X_1 \\times X_2) + u_i $$ measures the effect of a one-unit increase in both $X_1$ <em>and</em> $X_2$ above and beyond the sum of both individual effects. This holds for continuous *and* binary regressors.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Interactions in Multiple Regression]{8.5}
The interaction term between the two regressors $X_1$ and $X_2$ is given by their product $X_1 \\times X_2$. Adding this
interaction term as a regressor to the model $$ Y_i = \\beta_0 + \\beta_1 X_1 + \\beta_2 X_2 + u_i $$ allows the effect on $Y$ of a change in $X_2$ to depend on the value of $X_1$ and vice versa. Thus the coefficient $\\beta_3$ in the model $$ Y_i = \\beta_0 + \\beta_1 X_1 + \\beta_2 X_2 + \\beta_3 (X_1 \\times X_2) + u_i $$ measures the effect of a one-unit increase in both $X_1$ \\textit{and} $X_2$ above and beyond the sum of both individual effects. This holds for continuous \\textit{and} binary regressors.
\\end{keyconcepts}
')
```
#### Application to the Student-Teacher Ratio and the Percentage of English Learners {-}
We now examine the interaction between the continuous variables student-teacher ratio and the percentage of English learners.
```{r}
# estimate regression model including the interaction between 'english' and 'size'
cci_model <- lm(score ~ size + english + english * size, data = CASchools)
# print a summary to the console
coeftest(cci_model, vcov. = vcovHC, type = "HC1")
```
The estimated model equation is $$ \widehat{TestScore} = \underset{(11.76)}{686.3} - \underset{(0.59)}{1.12} \times STR - \underset{(0.37)}{0.67} \times PctEL + \underset{(0.02)}{0.0012} \times (STR\times PctEL). $$
For the interpretation, let us consider the quartiles of $PctEL$.
```{r}
summary(CASchools$english)
```
According to \@ref(eq:generalinteraction), if $PctEL$ is at its median value of $8.78$, the slope of the regression function relating test scores and the student-teacher ratio is predicted to be $-1.12 + 0.0012 \times 8.78 = -1.11$. This means that increasing the student-teacher ratio by one unit is expected to reduce test scores by $1.11$ points. For the $75\%$ quantile of $PctEL$ ($23.0$), the estimated change in $TestScore$ for a one-unit increase in $STR$ is $-1.12 + 0.0012 \times 23.0 = -1.09$, so the effect is somewhat smaller in magnitude. That is, for a school district with a share of $23\%$ English learners, a reduction of the student-teacher ratio by one unit is expected to increase test scores by only $1.09$ points.
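These back-of-the-envelope calculations can be reproduced in `r ttcode("R")` (a sketch based on `r ttcode("cci_model")`; the interaction coefficient is the last entry of the coefficient vector).
```{r}
# slope with respect to size, evaluated at the median and the 75% quantile
# of the percentage of English learners
coefs_cci <- coef(cci_model)
quartiles <- quantile(CASchools$english, probs = c(0.5, 0.75))
coefs_cci["size"] + coefs_cci[4] * quartiles
```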
However, the output of `r ttcode("coeftest()")` indicates that this difference in the effect is not statistically significant: $H_0: \beta_3 = 0$ cannot be rejected at the $5\%$ level of significance (the $p$-value is $0.95$).
#### Example: The Demand for Economic Journals {-}