ISLR Home

Question

p123

In this problem we will investigate the t-statistic for the null hypothesis \(H_0 : \beta = 0\) in simple linear regression without an intercept. To begin, we generate a predictor x and a response y as follows

set.seed(0)
x = rnorm(100)
y = 2 * x + rnorm(100)
  1. Perform a simple linear regression of y onto x, without an intercept. Report the coefficient estimate βˆ, the standard error of this coefficient estimate, and the t-statistic and p-value associated with the null hypothesis \(H_0 : \beta = 0\). Comment on these results. (You can perform regression without an intercept using the command lm(y∼x+0).)

  2. Now perform a simple linear regression of x onto y without an intercept, and report the coefficient estimate, its standard error, and the corresponding t-statistic and p-values associated with the null hypothesis \(H_0 :\beta =0\). Comment on these results.

  3. What is the relationship between the results obtained in (a) and (b)?

  4. For the regression of Y onto X without an intercept, the t- statistic for \(H_0 :\beta_j =0\) takes the form βˆ/SE(βˆ), where βˆ is given by (3.38), and where

(These formulas are slightly different from those given in Sections 3.1.1 and 3.1.2, since here we are performing regression without an intercept.) Show algebraically, and confirm numerically in R, that the t-statistic can be written as

  1. Using the results from (d), argue that the t-statistic for the regression of y onto x is the same as the t-statistic for the regression of x onto y.

  2. In R, show that when regression is performed with an intercept, the t-statistic for \(H_0 :\beta_1 = 0\) is the same for the regression of y onto x as it is for the regression of x onto y.


library(ISLR)
set.seed(0)
x = rnorm(100)
y = 2 * x + rnorm(100)

11a Fit y onto x without intercept

lm.fit.no.intercept = lm(y ~ 0 + x)
summary(lm.fit.no.intercept)
## 
## Call:
## lm(formula = y ~ 0 + x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6391 -0.8650 -0.2032  0.5898  2.7879 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   2.1374     0.1092   19.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9589 on 99 degrees of freedom
## Multiple R-squared:  0.7948, Adjusted R-squared:  0.7927 
## F-statistic: 383.4 on 1 and 99 DF,  p-value: < 2.2e-16

Since the x is the only predictor, it is highly significant. When x goes up one unit, y goes up two units.

11b Fit x onto y without intercept: x ~ 0 + y

lm.fit.x.y = lm(x ~ 0 + y)
summary(lm.fit.x.y)
## 
## Call:
## lm(formula = x ~ 0 + y)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.22971 -0.24830  0.04216  0.34170  0.71230 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y  0.37185    0.01899   19.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4 on 99 degrees of freedom
## Multiple R-squared:  0.7948, Adjusted R-squared:  0.7927 
## F-statistic: 383.4 on 1 and 99 DF,  p-value: < 2.2e-16

It seems that everytime y goes up by about 100 units, x goes up by about 37 units

11c Relationship between (a) and (b)

  1. seems to have a higher standard error and coefficient than (b). However their t-statistic and p-value are the same.

11d TODO

TODO

11e

The derived equation for t-statistic shows that the x values and y values are only being mulitplied. Since multiplication has a cumulative property, switching the values for x and y will not effect the result.

11f t-statistic for both models

Checking the t-statistic for both models and seeing that it is same for both

Model Summary: y ~ 0 + x

summary(lm.fit.no.intercept)
## 
## Call:
## lm(formula = y ~ 0 + x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6391 -0.8650 -0.2032  0.5898  2.7879 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   2.1374     0.1092   19.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9589 on 99 degrees of freedom
## Multiple R-squared:  0.7948, Adjusted R-squared:  0.7927 
## F-statistic: 383.4 on 1 and 99 DF,  p-value: < 2.2e-16

Model Summary: x ~ 0 + y

summary(lm.fit.x.y)
## 
## Call:
## lm(formula = x ~ 0 + y)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.22971 -0.24830  0.04216  0.34170  0.71230 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y  0.37185    0.01899   19.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4 on 99 degrees of freedom
## Multiple R-squared:  0.7948, Adjusted R-squared:  0.7927 
## F-statistic: 383.4 on 1 and 99 DF,  p-value: < 2.2e-16