Friday, 27 March 2020

More on Venn Diagrams for Regression (Summary)


More on Venn Diagrams for Regression
Peter E. Kennedy
Journal of Statistics Education | Volume 10 | Number 1 | May 2002 | pp. 1-10
                               
What (abstract)
The main contribution of the paper is a set of suggestions for using this approach (Venn diagrams) effectively to explain results about the bias and variance of coefficient estimates in multiple regression analysis. Previous work (Ip 2001) was limited to R², partial correlation, and sums of squares in the presence of suppressor variables. This article presents a different interpretation of Venn diagrams, focusing on illustrations of bias and variance.

Methodology/Model/Data

Regression with a single explanatory variable X
From the main article
Y = variation in Y
X = variation in X
Purple = variation in common, the information used to estimate βx
Black = error term. The magnitude of this area (the variation in Y that X cannot explain) represents the magnitude of the OLS estimate of σ², the variance of the error term.
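As a rough numerical analogue of this figure, the sketch below regresses Y on X using made-up data and splits the total variation in Y into the share X explains (1 - SSR/TSS, i.e. R², standing in for the Purple overlap) and the unexplained sum of squares (the Black area), which OLS divides by n - 2 to estimate σ². Reading Venn areas as sums of squares is an interpretive assumption, and `simple_ols` is a hypothetical helper, not something from the article.

```python
from statistics import fmean

def simple_ols(x, y):
    """Regress y on a constant and x; return (slope, intercept, ssr, sigma2_hat)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    ssr = sum(e * e for e in resid)       # unexplained variation ("Black" area)
    sigma2 = ssr / (len(y) - 2)           # slope and intercept were estimated
    return slope, intercept, ssr, sigma2

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
slope, intercept, ssr, sigma2 = simple_ols(x, y)
tss = sum((yi - fmean(y)) ** 2 for yi in y)   # total variation in Y (the whole Y circle)
print(f"beta_x = {slope:.4f}")
print(f"share of Y's variation shared with X (Purple) = {1 - ssr / tss:.4f}")
print(f"sigma^2 estimate from the Black area = {sigma2:.5f}")
```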





Regression with more than one explanatory variable
From the main article
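If we regress Y on X alone, βx is estimated using Blue+Red.
If we regress Y on W alone, βw is estimated using Green+Red.
If we regress Y on X and W together, the question is what to do with the Red area:
1-     βx = Blue+Red and βw = Green+Red, or
2-     βx = Blue and βw = Green, or
3-     divide Red into two parts, or use some other way to calculate βx and βw.
The best choice, and the one OLS makes, is the second: it ignores the Red part and uses only the Blue area, where Blue+Yellow (the variation in Y not explained by W) overlaps Orange+Blue (the variation in X not explained by W). The Red area reflects variation that X and W have in common, so using it may result in biased estimates.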

Here the Yellow area represents the magnitude of σ², the variance of the error term: OLS uses the magnitude of the area that cannot be explained to estimate σ².
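A minimal sketch of the two-regressor case, assuming hypothetical data in which W is deliberately correlated with X. `two_var_ols` is an illustrative helper that solves the centered normal equations; in the diagram's terms, the returned coefficients are the ones built from the Blue and Green areas, and the residual sum of squares divided by n - 3 plays the role of the Yellow area in estimating σ².

```python
from statistics import fmean

def two_var_ols(y, x, w):
    """Regress y on a constant, x and w; return (b_x, b_w, intercept, sigma2_hat)."""
    xb, wb, yb = fmean(x), fmean(w), fmean(y)
    xc = [v - xb for v in x]
    wc = [v - wb for v in w]
    yc = [v - yb for v in y]
    sxx = sum(a * a for a in xc)
    sww = sum(a * a for a in wc)
    sxw = sum(a * b for a, b in zip(xc, wc))   # overlap of the X and W circles
    sxy = sum(a * b for a, b in zip(xc, yc))
    swy = sum(a * b for a, b in zip(wc, yc))
    det = sxx * sww - sxw ** 2                 # zero only under perfect collinearity
    bx = (sww * sxy - sxw * swy) / det
    bw = (sxx * swy - sxw * sxy) / det
    b0 = yb - bx * xb - bw * wb
    resid = [c - bx * a - bw * b for a, b, c in zip(xc, wc, yc)]
    sigma2 = sum(e * e for e in resid) / (len(y) - 3)   # three parameters estimated
    return bx, bw, b0, sigma2

# Hypothetical data: W tracks X closely, and y = 1 + 2x + 3w + small noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.05]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x, w, noise)]
print(two_var_ols(y, x, w))
```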

Multicollinearity
 
From the main article
Collinearity is captured by increasing the overlap between the X and W circles.
The coefficient estimates remain unbiased, since in both figures only the Blue and Green areas are used.
However, the variances increase, because the Blue and Green areas shrink, so less information is used to estimate each coefficient.
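The standard two-regressor formula makes the shrinking-area argument concrete: Var(b_x) = σ² / (Sxx(1 - r²)), where Sxx is the variation in X and r is the correlation between X and W, so Sxx(1 - r²) is the variation in X that W cannot explain. The sketch below plugs in hypothetical values σ² = 1 and Sxx = 100 to show how the variance grows as the overlap grows; `var_bx` is an illustrative name.

```python
def var_bx(sigma2, sxx, r_xw):
    """Variance of the OLS coefficient on X when W is also a regressor.

    sxx * (1 - r_xw**2) is the variation in X left over after removing W.
    """
    return sigma2 / (sxx * (1.0 - r_xw ** 2))

# Hypothetical sigma^2 = 1 and Sxx = 100; only the X-W correlation changes.
for r in (0.0, 0.5, 0.9, 0.99):
    v = var_bx(sigma2=1.0, sxx=100.0, r_xw=r)
    print(f"corr(X, W) = {r:.2f}: Var(b_x) = {v:.4f}, inflation factor = {1 / (1 - r ** 2):.2f}")
```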


Omitting a Relevant Explanatory Variable
Generated by author
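Suppose W is omitted. The estimate of βx is then biased, because both the Blue and Red areas are used, but its variance decreases, because more information is used.
If X and W are orthogonal, that is, the X and W circles do not overlap, there is no Red area, so the estimate remains unbiased and its variance is unaffected. This is why dropping W may be considered when it is highly collinear with X: the bias it introduces may be worth the reduction in variance.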
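A small deterministic check of both claims, using made-up noise-free data with hypothetical true coefficients βx = 2 and βw = 3. Regressing Y on X alone (Blue+Red) gives βx + βw·δ, where δ is the slope from regressing W on X (the standard omitted-variable-bias formula); when W is orthogonal to X, δ = 0 and the bias disappears. Noise is left out so that only the bias shows; this sketch does not illustrate the variance side.

```python
from statistics import fmean

def slope(x, y):
    """Slope from regressing y on a constant and x."""
    xb, yb = fmean(x), fmean(y)
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    return sxy / sxx

beta_x, beta_w = 2.0, 3.0                 # hypothetical true coefficients
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Case 1: W overlaps with X, so omitting W biases the estimate of beta_x.
w_corr = [1.5, 1.8, 3.9, 3.6, 5.7, 6.4]
y_corr = [1 + beta_x * a + beta_w * b for a, b in zip(x, w_corr)]
short_corr = slope(x, y_corr)                        # Y on X alone (Blue + Red)
ovb_formula = beta_x + beta_w * slope(x, w_corr)     # beta_x + beta_w * delta

# Case 2: W is orthogonal to X (zero sample covariance), so there is no bias.
w_orth = [1.0, -1.0, -1.0, -1.0, -1.0, 1.0]
y_orth = [1 + beta_x * a + beta_w * b for a, b in zip(x, w_orth)]
short_orth = slope(x, y_orth)

print(f"correlated W: {short_corr:.4f} (formula {ovb_formula:.4f})")
print(f"orthogonal W: {short_orth:.4f}")
```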


Detrending Data
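Let W be a time trend. How are the estimates affected if the trend is removed from the data, i.e., if detrended Y is regressed on detrended X instead of including W as a regressor? Note that X and W are not orthogonal. According to the data used…
Regress y on X and W; obtain b*, the estimated coefficient on X, and its estimated variance vb*.
Regress X on W and save the residuals r; regress y on r to get c*, the estimated coefficient on r, and its estimated variance vc*.
Regress y on W and save the residuals s; regress s on r to get d*, the estimated coefficient on r, and its estimated variance vd*.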

Coefficient   Estimate    Estimated variance
b*            1.129427    vb* = 0.00210754
c*            1.129427    vc* = 0.00987857
d*            1.129427    vd* = 0.00208904
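The three regressions can be sketched in Python on simulated data, assuming a hypothetical data-generating process (W = 1, ..., 40 as the time trend, X trending with W plus noise, true coefficients 1.2 on X and 0.8 on W). These are not the article's data, so the numbers will not match the table, but the three coefficients should agree just as b*, c* and d* do. The helpers `ols_line` and `coef_on_x` are illustrative names.

```python
import random
from statistics import fmean

def ols_line(x, y):
    """Regress y on a constant and x; return (slope, residuals)."""
    xb, yb = fmean(x), fmean(y)
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return b, [c - yb - b * (a - xb) for a, c in zip(x, y)]

def coef_on_x(y, x, w):
    """Coefficient on x when y is regressed on a constant, x and w."""
    xb, wb, yb = fmean(x), fmean(w), fmean(y)
    xc, wc, yc = [v - xb for v in x], [v - wb for v in w], [v - yb for v in y]
    sxx = sum(a * a for a in xc)
    sww = sum(a * a for a in wc)
    sxw = sum(a * b for a, b in zip(xc, wc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    swy = sum(a * b for a, b in zip(wc, yc))
    return (sww * sxy - sxw * swy) / (sxx * sww - sxw ** 2)

# Hypothetical simulated data (fixed seed for reproducibility).
rng = random.Random(0)
n = 40
W = [float(t) for t in range(1, n + 1)]      # time trend
X = [0.5 * t + rng.gauss(0, 2) for t in W]   # X trends with W, so they are not orthogonal
Y = [1.0 + 1.2 * xv + 0.8 * t + rng.gauss(0, 1) for xv, t in zip(X, W)]

b_star = coef_on_x(Y, X, W)                  # 1) y on X and W
_, r = ols_line(W, X)                        # 2) r: X detrended
c_star, _ = ols_line(r, Y)                   #    y on r
_, s = ols_line(W, Y)                        # 3) s: y detrended
d_star, _ = ols_line(r, s)                   #    s on r
print(b_star, c_star, d_star)
```

The equality is the Frisch-Waugh-Lovell result: all three routes use only the variation in X that W cannot explain.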

From the main article
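b* = the usual OLS estimate of the coefficient on X
r = Orange+Blue (the variation in X that cannot be explained by W)
s = Blue+Yellow (the variation in y that cannot be explained by W)
Overlap of s and r = Blue
Regressing s on r (Blue+Yellow on Orange+Blue) uses the same information, the Blue area, as estimating b* and c*, which is why all three coefficient estimates are identical.
However, the estimated variances vb*, vc* and vd* differ.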

Why?
Although the true variances are equal, the estimated variances are not. vb* and vd* are nearly equal because in both cases the unexplained variation used to estimate σ² is the Yellow area; the small gap between them comes from the degrees-of-freedom correction, since the regression of s on r does not count the parameters used up in removing W. By comparison, vc* is much larger: when y is regressed on r, the variation in y left unexplained is the Yellow+Red+Green area, so σ² is overestimated. That is why vc* is greater than vb* and vd*.
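Continuing the same hypothetical simulation (rebuilt here so the block runs on its own), the sketch below computes the three estimated variances as σ̂² divided by Σr², the variation in X not explained by W. It checks the explanation above: regressing s on r leaves exactly the same residuals as the full regression (the same Yellow area), so vb* and vd* differ only through the degrees of freedom (n - 3 versus n - 2), while the residuals from regressing y on r still contain the variation W explains, which inflates vc*. The degrees-of-freedom counts assume every regression includes an intercept; `ols_line`, `ols_full` and `ssr` are illustrative names.

```python
import random
from statistics import fmean

def ols_line(x, y):
    """Regress y on a constant and x; return (slope, residuals)."""
    xb, yb = fmean(x), fmean(y)
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return b, [c - yb - b * (a - xb) for a, c in zip(x, y)]

def ols_full(y, x, w):
    """Regress y on a constant, x and w; return (coefficient on x, residuals)."""
    xb, wb, yb = fmean(x), fmean(w), fmean(y)
    xc, wc, yc = [v - xb for v in x], [v - wb for v in w], [v - yb for v in y]
    sxx = sum(a * a for a in xc)
    sww = sum(a * a for a in wc)
    sxw = sum(a * b for a, b in zip(xc, wc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    swy = sum(a * b for a, b in zip(wc, yc))
    det = sxx * sww - sxw ** 2
    bx = (sww * sxy - sxw * swy) / det
    bw = (sxx * swy - sxw * sxy) / det
    return bx, [c - bx * a - bw * b for a, b, c in zip(xc, wc, yc)]

def ssr(resid):
    """Residual sum of squares: the unexplained area used to estimate sigma^2."""
    return sum(e * e for e in resid)

# Same hypothetical simulated data as before (fixed seed).
rng = random.Random(0)
n = 40
W = [float(t) for t in range(1, n + 1)]
X = [0.5 * t + rng.gauss(0, 2) for t in W]
Y = [1.0 + 1.2 * xv + 0.8 * t + rng.gauss(0, 1) for xv, t in zip(X, W)]

_, e_b = ols_full(Y, X, W)     # y on X and W
_, r = ols_line(W, X)
_, e_c = ols_line(r, Y)        # y on r
_, s = ols_line(W, Y)
_, e_d = ols_line(r, s)        # s on r

srr = ssr(r)                   # variation in X not explained by W
vb = ssr(e_b) / (n - 3) / srr  # intercept, X and W estimated
vc = ssr(e_c) / (n - 2) / srr  # residuals still hold the variation W explains
vd = ssr(e_d) / (n - 2) / srr  # same residuals as the full regression, but n - 2 df
print(f"vb* = {vb:.6f}, vc* = {vc:.6f}, vd* = {vd:.6f}")
```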

Conclusion
The main contribution of this paper is to show some effective ways of using Venn diagrams when teaching regression analysis. There are cases where the Ballentine (Venn diagram) can be misleading in OLS analysis, but for standard analysis Kennedy himself highly recommends it.




