# Introduction

The goal is to introduce linear regression in R by working through the Kaggle Ames Housing competition.

Don't expect a great score. There's a lot more to learn, but this post will take you from zero to submission.

Simple linear regression uses only one variable (also called a predictor or feature) to make a prediction. In our case, the feature is the above-ground living area in square feet, `GrLivArea`. We chose this feature after reading this document.
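
To make the model concrete: simple linear regression fits `SalePrice = b0 + b1 * GrLivArea + error` by ordinary least squares. The sketch below uses simulated data (made-up numbers, not the Ames data) and checks that R's `lm()` gives the same coefficients as the closed-form estimates `b1 = cov(x, y) / var(x)` and `b0 = mean(y) - b1 * mean(x)`.

```r
# Simulated data, NOT the Ames data: 100 made-up houses
set.seed(42)
toy <- data.frame(GrLivArea = runif(100, min = 500, max = 3000))
toy$SalePrice <- 20000 + 100 * toy$GrLivArea + rnorm(100, sd = 20000)

# Fit SalePrice = b0 + b1 * GrLivArea + error by ordinary least squares
fit <- lm(SalePrice ~ GrLivArea, data = toy)
coef(fit)

# Closed-form least-squares estimates for a single predictor
b1 <- cov(toy$GrLivArea, toy$SalePrice) / var(toy$GrLivArea)
b0 <- mean(toy$SalePrice) - b1 * mean(toy$GrLivArea)
c(intercept = b0, slope = b1)
```

On the competition data, the equivalent call would be `lm(SalePrice ~ GrLivArea, data = train)`.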

# Load a couple of libraries

```r
library(tidyverse) # Loads readr, dplyr, ggplot2, and the other core tidyverse packages
library(GGally)    # Extends ggplot2 with ggpairs() for pairwise plots
```

# Read the training data

```r
train <- read_csv("../input/train.csv") # read_csv() (from readr) returns a tibble
```

# Read the test data

```r
test <- read_csv("../input/test.csv")
```

# Keep only a subset of the data

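```r
# Keep the Id, four candidate predictors, and the SalePrice target (dplyr::select)
train <- select(train, c("Id", "GrLivArea", "LotArea", "TotalBsmtSF", "YearBuilt", "SalePrice"))
# The test set has no SalePrice column: that is what we have to predict
test <- select(test, c("Id", "GrLivArea", "LotArea", "TotalBsmtSF", "YearBuilt"))
```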
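
Listing the columns twice means the train and test selections can drift apart if you edit one and forget the other. A minimal sketch, assuming you prefer to name the shared feature set once: `features` is a hypothetical variable name, the small data frame is made-up data standing in for `train`, and `all_of()` is the tidyselect helper (re-exported by dplyr) for selecting columns named in a character vector stored in a variable.

```r
library(dplyr)

# Feature columns shared by train and test (hypothetical variable name)
features <- c("Id", "GrLivArea", "LotArea", "TotalBsmtSF", "YearBuilt")

# Made-up stand-in for the training data, with an extra column to drop
toy_train <- data.frame(
  Id = 1:3, GrLivArea = c(1200, 1500, 2100), LotArea = c(8000, 9500, 11000),
  TotalBsmtSF = c(900, 1000, 1300), YearBuilt = c(1965, 1990, 2005),
  SalePrice = c(140000, 180000, 250000), Street = c("Pave", "Pave", "Grvl")
)
toy_test <- select(toy_train, -SalePrice)  # like the real test set, no SalePrice

# all_of() selects exactly the columns named in the vector (errors if one is missing)
train_small <- select(toy_train, all_of(c(features, "SalePrice")))
test_small  <- select(toy_test, all_of(features))
```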

# Correlation Plot

```r
# Pairwise scatterplots, densities, and correlations; Id is only a row label, so drop it
ggpairs(select(train, -Id))
```
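
With `ggpairs()`'s default settings, the upper panels for pairs of continuous columns show the Pearson correlation. If you would rather have those numbers as a ranked list than read them off the plot, a small helper does it. This is a sketch: `cor_with_target()` is a hypothetical name, and the data below is simulated, not the Ames data.

```r
# Hypothetical helper: Pearson correlation of each numeric column with a target column
cor_with_target <- function(df, target = "SalePrice") {
  num <- df[vapply(df, is.numeric, logical(1))]           # keep numeric columns only
  r <- cor(num, use = "pairwise.complete.obs")[, target]  # target's column of the matrix
  sort(r[names(r) != target], decreasing = TRUE)          # drop self-correlation, rank
}

# Simulated stand-in for the selected columns (made-up values)
set.seed(1)
n <- 200
toy <- data.frame(
  GrLivArea = runif(n, 500, 3000),
  LotArea   = runif(n, 2000, 20000),
  YearBuilt = sample(1900:2010, n, replace = TRUE)
)
toy$SalePrice <- 10000 + 90 * toy$GrLivArea + 500 * (toy$YearBuilt - 1900) +
  rnorm(n, sd = 25000)

cors <- cor_with_target(toy)
cors
```

On the competition data, `cor_with_target(select(train, -Id))` would rank the four kept predictors; `use = "pairwise.complete.obs"` only matters if a column has missing values.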
