p299
The Wage data set contains a number of other features not explored in this chapter, such as marital status (maritl), job class (jobclass), and others. Explore the relationships between some of these other predictors and wage, and use non-linear fitting techniques in order to fit flexible models to the data. Create plots of the results obtained, and write a summary of your findings.
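library(tidyverse)  # provides glimpse() via dplyr
library(ISLR)       # provides the Wage data set
attach(Wage)        # lets later calls refer to columns such as wage directly
glimpse(Wage)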
## Rows: 3,000
## Columns: 11
## $ year <int> 2006, 2004, 2003, 2003, 2005, 2008, 2009, 2008, 2006, 2004…
## $ age <int> 18, 24, 45, 43, 50, 54, 44, 30, 41, 52, 45, 34, 35, 39, 54…
## $ maritl <fct> 1. Never Married, 1. Never Married, 2. Married, 2. Married…
## $ race <fct> 1. White, 1. White, 1. White, 3. Asian, 1. White, 1. White…
## $ education <fct> 1. < HS Grad, 4. College Grad, 3. Some College, 4. College…
## $ region <fct> 2. Middle Atlantic, 2. Middle Atlantic, 2. Middle Atlantic…
## $ jobclass <fct> 1. Industrial, 2. Information, 1. Industrial, 2. Informati…
## $ health <fct> 1. <=Good, 2. >=Very Good, 1. <=Good, 2. >=Very Good, 1. <…
## $ health_ins <fct> 2. No, 2. No, 1. Yes, 1. Yes, 1. Yes, 1. Yes, 1. Yes, 1. Y…
## $ logwage <dbl> 4.318063, 4.255273, 4.875061, 5.041393, 4.318063, 4.845098…
## $ wage <dbl> 75.04315, 70.47602, 130.98218, 154.68529, 75.04315, 127.11…
library(skimr)
skim(Wage)
| | |
|---|---|
| Name | Wage |
| Number of rows | 3000 |
| Number of columns | 11 |
| Column type frequency: | |
| factor | 7 |
| numeric | 4 |
| Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
maritl | 0 | 1 | FALSE | 5 | 2. : 2074, 1. : 648, 4. : 204, 5. : 55 |
race | 0 | 1 | FALSE | 4 | 1. : 2480, 2. : 293, 3. : 190, 4. : 37 |
education | 0 | 1 | FALSE | 5 | 2. : 971, 4. : 685, 3. : 650, 5. : 426 |
region | 0 | 1 | FALSE | 1 | 2. : 3000, 1. : 0, 3. : 0, 4. : 0 |
jobclass | 0 | 1 | FALSE | 2 | 1. : 1544, 2. : 1456 |
health | 0 | 1 | FALSE | 2 | 2. : 2142, 1. : 858 |
health_ins | 0 | 1 | FALSE | 2 | 1. : 2083, 2. : 917 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year | 0 | 1 | 2005.79 | 2.03 | 2003.00 | 2004.00 | 2006.00 | 2008.00 | 2009.00 | ▇▃▃▃▆ |
age | 0 | 1 | 42.41 | 11.54 | 18.00 | 33.75 | 42.00 | 51.00 | 80.00 | ▃▇▇▃▁ |
logwage | 0 | 1 | 4.65 | 0.35 | 3.00 | 4.45 | 4.65 | 4.86 | 5.76 | ▁▁▇▇▁ |
wage | 0 | 1 | 111.70 | 41.73 | 20.09 | 85.38 | 104.92 | 128.68 | 318.34 | ▂▇▂▁▁ |
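The factor summary reports only one observed level for region (all 3,000 rows fall in the same level), so region cannot explain any variation in wage. A minimal sketch of how one might confirm this and set the column aside, assuming the Wage data from the ISLR package; the names `region_counts`, `n_observed`, and `wage_no_region` are illustrative and not part of the original solution:

```r
library(ISLR)

# Count observations per declared region level; unused levels appear as 0
region_counts <- table(Wage$region)
print(region_counts)

# Number of levels that actually occur in the data
n_observed <- nlevels(droplevels(Wage$region))
print(n_observed)

# A constant factor carries no information about wage,
# so keep a copy of the data without it for later modelling
wage_no_region <- subset(Wage, select = -region)
```

Dropping the column is a convenience rather than a requirement; simply leaving region out of every model formula has the same effect.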
plot(jobclass, wage, xlab = "Job class", ylab = "Wage")  # box plots of wage by job class
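The box plot compares the wage distribution across the two job classes. As one possible step toward the flexible models the exercise asks for, the sketch below fits a natural cubic spline in age, then adds jobclass and maritl and compares the nested fits with F-tests. The use of `ns()` with 4 degrees of freedom and the model names `fit_age`, `fit_job`, and `fit_full` are illustrative assumptions, not necessarily the approach taken in the rest of this solution.

```r
library(ISLR)
library(splines)  # distributed with R; provides ns()

# Baseline: wage as a smooth, non-linear function of age only
fit_age <- lm(wage ~ ns(age, df = 4), data = Wage)

# Add the categorical predictors one at a time
fit_job  <- lm(wage ~ ns(age, df = 4) + jobclass, data = Wage)
fit_full <- lm(wage ~ ns(age, df = 4) + jobclass + maritl, data = Wage)

# Sequential F-tests: does each added factor improve on the previous model?
print(anova(fit_age, fit_job, fit_full))
```

A small p-value in a given row of the ANOVA table would suggest that the predictor added in that model explains wage variation beyond what the smaller model already captures.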