In this exercise you will create some simulated data and will fit simple linear regression models to it. Make sure to use set.seed(1) prior to starting part (a) to ensure consistent results
Setting the seed to keep random values consistent
set.seed(1)
# Creating a vector with values drawn from a random distribution
x = rnorm(100)
Creating a vector with values drawn from a random distribution with mean 0, standard deviation 0.25
eps = rnorm(100, 0, 0.25)
Y = -1 + 0.5*x + eps
plot(x, Y)
There is a linear relationship
lm.fit = lm(Y~x)
summary(lm.fit)
##
## Call:
## lm(formula = Y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46921 -0.15344 -0.03487 0.13485 0.58654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.00942 0.02425 -41.63 <2e-16 ***
## x 0.49973 0.02693 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2407 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
The coefficients for estimated model are off by a few decimals
plot(x, Y)
abline(lm.fit, col='red')
abline(-1, 0.5, col="blue")
legend(-1, c("Predicted", "Actual"), col=1:2, lwd=2)
Fitting a model introducing a quadratic term
lm.poly.fit = lm(Y~x+I(x^2))
summary(lm.poly.fit)
##
## Call:
## lm(formula = Y ~ x + I(x^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4913 -0.1563 -0.0322 0.1451 0.5675
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.98582 0.02941 -33.516 <2e-16 ***
## x 0.50429 0.02700 18.680 <2e-16 ***
## I(x^2) -0.02973 0.02119 -1.403 0.164
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2395 on 97 degrees of freedom
## Multiple R-squared: 0.7828, Adjusted R-squared: 0.7784
## F-statistic: 174.8 on 2 and 97 DF, p-value: < 2.2e-16
Adding the quadratic term does NOT improve the model much (about 0.002). Also quadratic term is not significant according to t-statistic.
Generating data randomly for x from normal distribution
x.less = rnorm(100)
Generating random error terms with small standard deviation (0.05) from normal distribution
eps.less = rnorm(100, 0, 0.05) # epsilon
# Creating the Y value given x and eps
Y.less = -1 + 0.5*x.less + eps.less
lm.less.fit = lm(Y.less ~ x.less)
summary(lm.less.fit)
##
## Call:
## lm(formula = Y.less ~ x.less)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.137090 -0.028070 -0.000874 0.033987 0.092421
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.997578 0.004955 -201.3 <2e-16 ***
## x.less 0.505311 0.004813 105.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04953 on 98 degrees of freedom
## Multiple R-squared: 0.9912, Adjusted R-squared: 0.9911
## F-statistic: 1.102e+04 on 1 and 98 DF, p-value: < 2.2e-16
plot(x.less, Y.less)
# Plotting the least square regression line
abline(lm.less.fit, col='red')
# Plotting the population regression line
abline(-1, 0.5, col="blue")
# Creating a legend for plot
legend(-1, c("Predicted", "Actual"), col=1:2, lwd=2)