p123

This question should be answered using the Carseats data set.

(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.

(b) Provide an interpretation of each coefficient in the model. Be careful: some of the variables in the model are qualitative!

(c) Write out the model in equation form, being careful to handle the qualitative variables properly.

(d) For which of the predictors can you reject the null hypothesis \(H_0 : \beta_j = 0\)?

(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

(f) How well do the models in (a) and (e) fit the data?

(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

(h) Is there evidence of outliers or high leverage observations in the model from (e)?

`library(ISLR)`

`names(Carseats)`

```
## [1] "Sales" "CompPrice" "Income" "Advertising" "Population"
## [6] "Price" "ShelveLoc" "Age" "Education" "Urban"
## [11] "US"
```

`carseat.fit = lm(Sales ~ Price + Urban + US, data=Carseats)`

`summary(carseat.fit)`

```
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
```

`contrasts(Carseats$US)`

```
## Yes
## No 0
## Yes 1
```

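USYes: If the store is in the US (USYes = 1), average sales are about 1.2 higher than for a store outside the US, holding Price and Urban fixed. Sales is recorded in thousands of units, so this is roughly 1,200 car seats.

Ignoring the other predictors for simplicity, the US contribution to Sales is 1.2 * USYes, where 1.2 is the fitted USYes coefficient from the model.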

Price: The coefficient is highly significant (p < 2e-16) and negative. Each one-unit increase in Price is associated with a decrease of about 0.054 in Sales (about 54 car seats), holding Urban and US fixed. As price goes up, sales go down.

UrbanYes: The p-value (0.936) is not significant, so there is no evidence that an urban location is associated with Sales once Price and US are in the model. Consider removing it from the model.

Sales = 13.04 - 0.054 * Price - 0.022 * Urban + 1.20 * US, where Urban = 1 if Urban = Yes and 0 if Urban = No, and US = 1 if US = Yes and 0 if US = No.
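As a sanity check on the equation form, the sketch below computes one prediction by hand from the fitted coefficients and compares it with `predict()`. The store used here (Price = 100, Urban = Yes, US = Yes) and the names `new.store`, `by.hand`, and `from.predict` are made up purely for illustration.

```r
library(ISLR)

# Refit the full model with Price, Urban, and US
carseat.fit = lm(Sales ~ Price + Urban + US, data = Carseats)
b = coef(carseat.fit)

# Hypothetical store, chosen only for illustration
new.store = data.frame(
  Price = 100,
  Urban = factor("Yes", levels = levels(Carseats$Urban)),
  US    = factor("Yes", levels = levels(Carseats$US))
)

# Prediction by hand: each dummy variable is 1 for "Yes" and 0 for "No"
by.hand = b["(Intercept)"] + b["Price"] * 100 + b["UrbanYes"] * 1 + b["USYes"] * 1

# Prediction using R's own handling of the factors
from.predict = predict(carseat.fit, newdata = new.store)

# TRUE when the hand-written equation matches the fitted model
all.equal(unname(by.hand), unname(from.predict))
```

If the two values agree, the dummy coding in the written equation matches what `lm()` used internally.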

We can reject the null hypothesis for Price and USYes because their p-values are far below 0.05. We cannot reject it for UrbanYes (p = 0.936).
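The same conclusion can be read off programmatically rather than by eye. This is a minimal sketch, assuming the conventional 0.05 cutoff; the names `p.values` and `significant` are introduced here only for illustration.

```r
library(ISLR)

# Refit the full model with Price, Urban, and US
carseat.fit = lm(Sales ~ Price + Urban + US, data = Carseats)

# The coefficient table is a matrix; the last column holds the p-values
coefs = summary(carseat.fit)$coefficients
p.values = coefs[-1, "Pr(>|t|)"]   # drop the intercept row

# Predictors for which H0: beta_j = 0 is rejected at the 5% level
significant = names(p.values)[p.values < 0.05]
significant
```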

```
carseat.fit2 = lm(Sales ~ Price + US, data=Carseats)
summary(carseat.fit2)
```

```
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9269 -1.6286 -0.0574 1.5766 7.0515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
## Price -0.05448 0.00523 -10.416 < 2e-16 ***
## USYes 1.19964 0.25846 4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
```
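On the question of fit, the two summaries are nearly identical: R-squared is 0.2393 in both, the residual standard error is 2.472 for the full model and 2.469 for the smaller one, and adjusted R-squared rises slightly from 0.2335 to 0.2354. Both models explain only about 24% of the variance in Sales, and dropping Urban costs nothing. The sketch below shows one way to put these statistics side by side, test the nested models against each other, and obtain the 95% confidence intervals with `confint()`. The names `fit.stats` and `ci` are introduced here for illustration.

```r
library(ISLR)

carseat.fit  = lm(Sales ~ Price + Urban + US, data = Carseats)
carseat.fit2 = lm(Sales ~ Price + US, data = Carseats)

# Fit statistics for both models, one column per model
fit.stats = sapply(list(full = carseat.fit, reduced = carseat.fit2), function(m) {
  s = summary(m)
  c(RSE = s$sigma, R2 = s$r.squared, adj.R2 = s$adj.r.squared)
})
fit.stats

# F-test for the nested models: does adding Urban improve the fit?
anova(carseat.fit2, carseat.fit)

# 95% confidence intervals for the coefficients of the smaller model
ci = confint(carseat.fit2, level = 0.95)
ci
```

An interval that excludes zero corresponds to a coefficient whose null hypothesis is rejected at the 5% level.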

```
par(mfrow = c(2,2))
plot(carseat.fit2)
```
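The diagnostic plots can be backed up with numbers. The sketch below flags possible outliers using studentized residuals and possible high-leverage points using hat values. The cutoffs are rules of thumb, not fixed standards: an absolute studentized residual above 3, and a leverage more than three times the average leverage (p + 1)/n, are assumptions made here for illustration.

```r
library(ISLR)

carseat.fit2 = lm(Sales ~ Price + US, data = Carseats)

# Studentized residuals: values beyond +/- 3 suggest possible outliers
stud.res = rstudent(carseat.fit2)
which(abs(stud.res) > 3)

# Leverage statistics; their average always equals (p + 1) / n
lev = hatvalues(carseat.fit2)
avg.lev = length(coef(carseat.fit2)) / nrow(Carseats)

# Observations with leverage well above average (assumed cutoff: 3x)
high.lev = which(lev > 3 * avg.lev)
high.lev
```

Any observation flagged by either check is worth inspecting in the Residuals vs Leverage panel before deciding whether it matters.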