
Question

p123

This question should be answered using the Carseats data set.

  (a) Fit a multiple regression model to predict Sales using Price, Urban, and US

  (b) Provide an interpretation of each coefficient in the model. Be careful - some of the variables in the model are qualitative!

  (c) Write out the model in equation form, being careful to handle the qualitative variables properly

  (d) For which of the predictors can you reject the null hypothesis \(H_0 :\beta_j =0\)?

  (e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome

  (f) How well do the models in (a) and (e) fit the data?

  (g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

  (h) Is there evidence of outliers or high leverage observations in the model from (e)?


library(ISLR)

10a \(Sales = \beta_0 + \beta_1 Price + \beta_2 Urban + \beta_3 US\)

names(Carseats)
##  [1] "Sales"       "CompPrice"   "Income"      "Advertising" "Population" 
##  [6] "Price"       "ShelveLoc"   "Age"         "Education"   "Urban"      
## [11] "US"
carseat.fit = lm(Sales ~ Price + Urban + US, data=Carseats)

10b Interpret Coefficients

Model Summary

summary(carseat.fit)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16
contrasts(Carseats$US)
##     Yes
## No    0
## Yes   1

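USYes coefficient: If the store is in the US (USYes = 1), predicted sales are about 1.2 units higher than for a store outside the US, holding Price and Urban fixed. Sales is recorded in thousands of units, so this is roughly 1,200 more car seats.

Sales difference = 1.2 * USYes (1.2 is the USYes coefficient from the model; other predictors held fixed)

Price: The Price coefficient is highly significant (p-value < 2e-16) and negative. Each one-unit increase in Price is associated with a decrease of about 0.054 in Sales (roughly 54 car seats), holding Urban and US fixed. As price goes up, sales go down.

UrbanYes: The p-value (0.936) is not significant, so there is no evidence that an urban location is associated with sales once Price and US are in the model. Consider removing it from the model.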
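One way to see what the USYes coefficient means is to predict sales for two stores that differ only in US. The sketch below assumes the ISLR package is installed; the two stores and the price of 100 are hypothetical values chosen purely for illustration.

```r
library(ISLR)  # provides the Carseats data set

carseat.fit <- lm(Sales ~ Price + Urban + US, data = Carseats)

# Two hypothetical stores, identical except for the US indicator
stores <- data.frame(
  Price = c(100, 100),
  Urban = factor(c("Yes", "Yes"), levels = levels(Carseats$Urban)),
  US    = factor(c("No", "Yes"),  levels = levels(Carseats$US))
)

pred <- predict(carseat.fit, newdata = stores)

# Difference in predicted sales (US store minus non-US store)
us_effect <- unname(pred[2] - pred[1])
us_effect

# The same number read directly from the fitted model
coef(carseat.fit)["USYes"]
```

The difference between the two predictions should match the USYes coefficient whatever price is chosen, because the model has no interaction terms.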

10c Model with Qualitative Variables

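\(Sales = 13.04 - 0.054 \times Price - 0.022 \times Urban + 1.20 \times US\)

where Urban = 1 if the store is in an urban location and 0 otherwise, and US = 1 if the store is in the US and 0 otherwise.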
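As a sanity check on the dummy coding, the fitted model from (a) can be evaluated term by term and compared with the values R computes itself. This is a minimal sketch, assuming the ISLR package is installed; the names `b` and `manual` are introduced here for illustration only.

```r
library(ISLR)  # provides the Carseats data set

carseat.fit <- lm(Sales ~ Price + Urban + US, data = Carseats)
b <- coef(carseat.fit)

# Evaluate the equation by hand; each qualitative variable becomes a 0/1 dummy
manual <- b["(Intercept)"] +
  b["Price"]    * Carseats$Price +
  b["UrbanYes"] * (Carseats$Urban == "Yes") +
  b["USYes"]    * (Carseats$US == "Yes")

# Should agree with the fitted values from lm() up to floating point error
all.equal(unname(manual), unname(fitted(carseat.fit)))
```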

10d Reject Predictors/Features

We can reject the null hypothesis for Price and USYes because their p-values are highly significant (both far below 0.05). We cannot reject it for UrbanYes (p-value 0.936).
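The p-values can also be pulled out of the summary table programmatically rather than read off the printout. The sketch below assumes the ISLR package is installed and uses the conventional 0.05 cutoff, which is an assumption and not something the exercise prescribes.

```r
library(ISLR)  # provides the Carseats data set

carseat.fit <- lm(Sales ~ Price + Urban + US, data = Carseats)

# Coefficient table as a matrix; the last column holds the p-values
pvals <- coef(summary(carseat.fit))[, "Pr(>|t|)"]

# Terms for which H0: beta_j = 0 is rejected at the 5% level
significant <- names(pvals)[pvals < 0.05]
significant
```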

10e Smaller Model

Model Summary

carseat.fit2 = lm(Sales ~ Price + US, data=Carseats)
summary(carseat.fit2)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

Diagnostic Plots

par(mfrow = c(2,2))
plot(carseat.fit2)

10f Compare Models

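Neither model fits the data well. The \(R^2\) statistic is 0.2393 for both, so each model explains only about 24% of the variance in Sales. Dropping Urban leaves \(R^2\) unchanged and slightly improves the adjusted \(R^2\) (0.2335 to 0.2354) and the residual standard error (2.472 to 2.469), so the smaller model in (e) fits marginally better.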
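The fit statistics for the two models can be placed side by side instead of being compared across two printouts. This is a sketch, assuming the ISLR package is installed; `fit_stats` and `comparison` are helper names introduced here, not part of any package.

```r
library(ISLR)  # provides the Carseats data set

carseat.fit  <- lm(Sales ~ Price + Urban + US, data = Carseats)
carseat.fit2 <- lm(Sales ~ Price + US, data = Carseats)

# Hypothetical helper: collect R^2, adjusted R^2 and residual standard error
fit_stats <- function(model) {
  s <- summary(model)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared, rse = s$sigma)
}

comparison <- rbind(full = fit_stats(carseat.fit),
                    reduced = fit_stats(carseat.fit2))
comparison

# F-test for the nested models: does adding Urban improve the fit?
anova(carseat.fit2, carseat.fit)
```

Because the smaller model is nested in the larger one, the `anova()` F-test is one reasonable way to confirm that Urban adds essentially nothing.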

10g 95% Confidence Interval

Getting the confidence intervals for each coefficient:

confint(carseat.fit2)
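For a linear model, each interval is the estimate plus or minus a t quantile times its standard error. The sketch below rebuilds the intervals by hand and compares them with `confint()`; it assumes the ISLR package is installed, and `manual_ci` is a name introduced for illustration.

```r
library(ISLR)  # provides the Carseats data set

carseat.fit2 <- lm(Sales ~ Price + US, data = Carseats)

est   <- coef(carseat.fit2)
se    <- sqrt(diag(vcov(carseat.fit2)))            # standard errors
tcrit <- qt(0.975, df = df.residual(carseat.fit2)) # two-sided 95% quantile

# estimate +/- t * standard error
manual_ci <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
manual_ci

# Should match the built-in intervals (level = 0.95 is the default)
all.equal(unname(manual_ci), unname(confint(carseat.fit2, level = 0.95)))
```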

10h Outliers and High Leverage

Diagnostic Plots

par(mfrow = c(2,2))
plot(carseat.fit2)

Based on the Residuals vs. Leverage graph (bottom right):

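  • There is one observation far to the right of the graph, which means its leverage is very high. A few more observations also have high leverage.

  • The residuals range from about -6.9 to 7.1 against a residual standard error of 2.469, so they stay within roughly three standard errors and no observation stands out as an extreme outlier.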
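The visual reading of the plot can be backed up numerically with leverage values and studentized residuals. This is a sketch under stated assumptions: the ISLR package is installed, and the cutoffs (leverage above twice the average, absolute studentized residual above 3) are common rules of thumb, not fixed tests.

```r
library(ISLR)  # provides the Carseats data set

carseat.fit2 <- lm(Sales ~ Price + US, data = Carseats)

lev <- hatvalues(carseat.fit2)       # leverage of each observation
p   <- length(coef(carseat.fit2))    # number of coefficients, incl. intercept
n   <- nobs(carseat.fit2)
avg_lev <- p / n                     # average leverage

# Assumed rule of thumb: flag leverage above twice the average
high_lev <- which(lev > 2 * avg_lev)
high_lev

# Assumed rule of thumb: flag |studentized residual| above 3
rstud    <- rstudent(carseat.fit2)
outliers <- which(abs(rstud) > 3)
outliers

# Observation with the largest leverage (the far-right point in the plot)
which.max(lev)
```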