
# Question

p123

This question should be answered using the Carseats data set.

1. Fit a multiple regression model to predict Sales using Price, Urban, and US.

2. Provide an interpretation of each coefficient in the model. Be careful - some of the variables in the model are qualitative!

3. Write out the model in equation form, being careful to handle the qualitative variables properly.

4. For which of the predictors can you reject the null hypothesis $$H_0 :\beta_j =0$$?

5. On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

6. How well do the models in (a) and (e) fit the data?

7. Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

8. Is there evidence of outliers or high leverage observations in the model from (e)?

library(ISLR)

# 10a $$Sales = \beta_0 + \beta_1 Price + \beta_2 Urban + \beta_3 US$$

names(Carseats)
##   "Sales"       "CompPrice"   "Income"      "Advertising" "Population"
##   "Price"       "ShelveLoc"   "Age"         "Education"   "Urban"
##  "US"
carseat.fit = lm(Sales ~ Price + Urban + US, data=Carseats)

# 10b Interpret Coefficients

## Model Summary

summary(carseat.fit)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9206 -1.6220 -0.0564  1.5786  7.0581
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16
contrasts(Carseats$US)
##     Yes
## No    0
## Yes   1

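USYes: US is a qualitative predictor, coded 1 for stores in the US and 0 otherwise (see the contrasts output). Holding Price and Urban fixed, a US store has predicted Sales about 1.2 higher than a non-US store. Sales is recorded in thousands of units, so this is roughly 1,200 car seats.

Price: Highly significant (p-value < 2e-16) with a negative coefficient. Holding Urban and US fixed, each one-unit increase in Price is associated with a decrease in Sales of about 0.054, or roughly 54 car seats. As price goes up, sales go down.

UrbanYes: The p-value (0.936) is not significant, so there is no evidence that sales differ between urban and non-urban stores once Price and US are accounted for. Consider removing it from the model.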
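The interpretations can be checked numerically. The sketch below hard-codes the coefficients printed in the summary instead of refitting the model; `predict_sales` is a hypothetical helper and the price of 100 is an arbitrary illustrative value, neither of which comes from the exercise.

```r
# Coefficients copied from the summary(carseat.fit) output
coefs <- c(intercept = 13.043469, price = -0.054459,
           urban_yes = -0.021916, us_yes = 1.200573)

# Hypothetical helper: predicted Sales (in thousands of units) for one store
predict_sales <- function(price, urban, us) {
  unname(coefs["intercept"] +
         coefs["price"] * price +
         coefs["urban_yes"] * as.numeric(urban == "Yes") +
         coefs["us_yes"] * as.numeric(us == "Yes"))
}

# Same Price and Urban level, different US level:
# the gap equals the USYes coefficient
us_gap <- predict_sales(100, "Yes", "Yes") - predict_sales(100, "Yes", "No")

# Same store with Price raised by 10:
# the change equals 10 times the Price coefficient
price_change <- predict_sales(110, "Yes", "Yes") - predict_sales(100, "Yes", "Yes")

print(c(us_gap = us_gap, price_change = price_change))
```

The gap between the US and non-US predictions does not depend on the price chosen, which is what "holding the other predictors fixed" means in an additive model.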

# 10c Model with Qualitative Variables

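$$Sales = 13.04 - 0.054 \times Price - 0.022 \times Urban + 1.20 \times US$$

where Urban = 1 if the store is in an urban location and 0 otherwise, and US = 1 if the store is in the US and 0 otherwise.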
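Setting the US dummy to each of its two values gives one regression line per group. This is a sketch using the rounded coefficients from the summary; it drops the UrbanYes term (about -0.022) because it is negligible next to the others.

$$
Sales \approx
\begin{cases}
14.24 - 0.054 \times Price & \text{if US = Yes} \\
13.04 - 0.054 \times Price & \text{if US = No}
\end{cases}
$$

The two lines share a slope and differ only in intercept, by the USYes coefficient of about 1.2.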

# 10d Reject Predictors/Features

We can reject the null hypothesis for Price and USYes because their p-values are far below 0.05. We cannot reject it for UrbanYes (p-value = 0.936).
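The t-values and p-values behind this decision can be rebuilt from the estimates and standard errors printed in the summary. This is a minimal sketch that copies those numbers by hand. The 0.05 cutoff is the conventional level and is an assumption here, since the exercise does not fix one.

```r
# Estimates and standard errors copied from summary(carseat.fit)
est <- c(Price = -0.054459, UrbanYes = -0.021916, USYes = 1.200573)
se  <- c(Price =  0.005242, UrbanYes =  0.271650, USYes = 0.259042)
df_resid <- 396  # residual degrees of freedom reported in the summary

# t statistic for H0: beta_j = 0, and its two-sided p-value
t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# Assumed significance level of 0.05
reject <- p_value < 0.05

print(data.frame(t_stat = round(t_stat, 3),
                 p_value = signif(p_value, 3),
                 reject = reject))
```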

# 10e Smaller Model

## Model Summary

carseat.fit2 = lm(Sales ~ Price + US, data=Carseats)
summary(carseat.fit2)
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.9269 -1.6286 -0.0574  1.5766  7.0515
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16
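For the 95% confidence intervals asked for in the question, a live session can call `confint(carseat.fit2)` directly. The sketch below instead rebuilds the intervals from the rounded estimates and standard errors printed in the summary, so the results should agree with `confint` only up to rounding.

```r
# Estimates and standard errors copied from summary(carseat.fit2)
est <- c("(Intercept)" = 13.03079, Price = -0.05448, USYes = 1.19964)
se  <- c("(Intercept)" =  0.63098, Price =  0.00523, USYes = 0.25846)

# Critical value of the t distribution with 397 residual degrees of freedom
t_crit <- qt(0.975, df = 397)

# 95% interval: estimate plus or minus t_crit standard errors
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))
```

Neither the Price interval nor the USYes interval contains zero, which is consistent with rejecting the null hypothesis for both coefficients.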

## Diagnostic Plots

par(mfrow = c(2,2))
plot(carseat.fit2)
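The plots can be backed up with numbers by listing observations with large studentized residuals or high leverage. The sketch below is an assumption-laden illustration: `flag_unusual` is a hypothetical helper, and the cutoffs (absolute studentized residual above 3, leverage above twice the average (p + 1) / n) are common rules of thumb, not values taken from this write-up. It is demonstrated on R's built-in `cars` data so that it runs without the ISLR package.

```r
# Hypothetical helper: flag possible outliers and high-leverage observations
flag_unusual <- function(fit, resid_cut = 3, lev_mult = 2) {
  stud <- rstudent(fit)    # studentized residuals
  lev  <- hatvalues(fit)   # leverage statistics
  avg_lev <- length(coef(fit)) / length(lev)  # average leverage, (p + 1) / n
  list(outliers      = which(abs(stud) > resid_cut),
       high_leverage = which(lev > lev_mult * avg_lev),
       avg_leverage  = avg_lev)
}

# Demo on a built-in data set so the block is self-contained
demo_fit <- lm(dist ~ speed, data = cars)
flags <- flag_unusual(demo_fit)
print(flags)

# With the model from 10e in the session: flag_unusual(carseat.fit2)
```

For `carseat.fit2` the average leverage is 3 / 400 = 0.0075, so under the assumed rule of thumb an observation would be flagged as high leverage above 0.015.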