
Question

p299

The Wage data set contains a number of other features not explored in this chapter, such as marital status (maritl), job class (jobclass), and others. Explore the relationships between some of these other predictors and wage, and use non-linear fitting techniques in order to fit flexible models to the data. Create plots of the results obtained, and write a summary of your findings.
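One way to approach the question is sketched below. It is an assumption, not the original author's method: a natural cubic spline in `age` (base R's `splines::ns()` with `df = 4`) stands in for the chapter's non-linear tools, and the factors `maritl` and `jobclass` enter additively. A nested-model F-test then checks whether the two factors improve on age alone. The object names `fit_age`, `fit_add`, `cmp` and `grid` are hypothetical.

```r
library(ISLR)     # Wage data
library(splines)  # ns(): natural cubic spline basis, ships with base R

# Baseline: wage as a smooth (natural spline, 4 df) function of age only
fit_age <- lm(wage ~ ns(age, df = 4), data = Wage)

# Additive model: the same age curve, shifted by marital status and job class
fit_add <- lm(wage ~ ns(age, df = 4) + maritl + jobclass, data = Wage)

# Nested-model F-test: do maritl and jobclass improve on age alone?
cmp <- anova(fit_age, fit_add)
print(cmp)

# Predicted wage over the age range for each marital status,
# holding jobclass at its first level
grid <- expand.grid(
  age = seq(min(Wage$age), max(Wage$age), length.out = 100),
  maritl = levels(Wage$maritl),
  jobclass = levels(Wage$jobclass)[1]
)
grid$fit <- predict(fit_add, newdata = grid)

# Overlay one fitted curve per marital status on the raw data
plot(Wage$age, Wage$wage, col = "grey", pch = 20, xlab = "age", ylab = "wage")
for (i in seq_along(levels(Wage$maritl))) {
  g <- grid[grid$maritl == levels(Wage$maritl)[i], ]
  lines(g$age, g$fit, col = i, lwd = 2)
}
legend("topright", legend = levels(Wage$maritl),
       col = seq_along(levels(Wage$maritl)), lwd = 2, cex = 0.7)
```

Because the model is additive, the five curves are parallel vertical shifts of one age profile. Letting the shape itself vary by group would need an interaction term.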


7a

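library(tidyverse)  # dplyr, ggplot2 and glimpse()
library(ISLR)       # provides the Wage data set
attach(Wage)        # lets later code refer to columns (wage, jobclass, ...) directly
glimpse(Wage)       # one line per column: its type and first few values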
## Rows: 3,000
## Columns: 11
## $ year       <int> 2006, 2004, 2003, 2003, 2005, 2008, 2009, 2008, 2006, 2004…
## $ age        <int> 18, 24, 45, 43, 50, 54, 44, 30, 41, 52, 45, 34, 35, 39, 54…
## $ maritl     <fct> 1. Never Married, 1. Never Married, 2. Married, 2. Married…
## $ race       <fct> 1. White, 1. White, 1. White, 3. Asian, 1. White, 1. White…
## $ education  <fct> 1. < HS Grad, 4. College Grad, 3. Some College, 4. College…
## $ region     <fct> 2. Middle Atlantic, 2. Middle Atlantic, 2. Middle Atlantic…
## $ jobclass   <fct> 1. Industrial, 2. Information, 1. Industrial, 2. Informati…
## $ health     <fct> 1. <=Good, 2. >=Very Good, 1. <=Good, 2. >=Very Good, 1. <…
## $ health_ins <fct> 2. No, 2. No, 1. Yes, 1. Yes, 1. Yes, 1. Yes, 1. Yes, 1. Y…
## $ logwage    <dbl> 4.318063, 4.255273, 4.875061, 5.041393, 4.318063, 4.845098…
## $ wage       <dbl> 75.04315, 70.47602, 130.98218, 154.68529, 75.04315, 127.11…
library(skimr)  # skim(): per-column summary statistics, grouped by column type
skim(Wage)
Data summary
Name                    Wage
Number of rows          3000
Number of columns       11
_______________________
Column type frequency:
  factor                7
  numeric               4
________________________
Group variables         None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
maritl        0         1             FALSE   5        2. : 2074, 1. : 648, 4. : 204, 5. : 55
race          0         1             FALSE   4        1. : 2480, 2. : 293, 3. : 190, 4. : 37
education     0         1             FALSE   5        2. : 971, 4. : 685, 3. : 650, 5. : 426
region        0         1             FALSE   1        2. : 3000, 1. : 0, 3. : 0, 4. : 0
jobclass      0         1             FALSE   2        1. : 1544, 2. : 1456
health        0         1             FALSE   2        2. : 2142, 1. : 858
health_ins    0         1             FALSE   2        1. : 2083, 2. : 917

Variable type: numeric

skim_variable n_missing complete_rate mean    sd    p0      p25     p50     p75     p100    hist
year          0         1             2005.79 2.03  2003.00 2004.00 2006.00 2008.00 2009.00 ▇▃▃▃▆
age           0         1             42.41   11.54 18.00   33.75   42.00   51.00   80.00   ▃▇▇▃▁
logwage       0         1             4.65    0.35  3.00    4.45    4.65    4.86    5.76    ▁▁▇▇▁
wage          0         1             111.70  41.73 20.09   85.38   104.92  128.68  318.34  ▂▇▂▁▁
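The factor summary shows that `region` has a single observed level (level `2.`, Middle Atlantic, for all 3,000 rows), so it cannot explain any variation in wage. The sketch below confirms this with `table()` and computes mean wage within each level of `maritl` and `jobclass` as a first look at those relationships. The names `region_counts`, `n_used`, `mean_by_maritl`, `mean_by_jobclass` and `Wage2` are hypothetical.

```r
library(ISLR)  # Wage data

# region: how many of its levels actually occur? (skim reports n_unique = 1)
region_counts <- table(Wage$region)
print(region_counts)
n_used <- sum(region_counts > 0)

# Mean wage within each level of the two factors named in the question
mean_by_maritl   <- tapply(Wage$wage, Wage$maritl, mean)
mean_by_jobclass <- tapply(Wage$wage, Wage$jobclass, mean)
print(round(mean_by_maritl, 1))
print(round(mean_by_jobclass, 1))

# A factor with one observed level carries no information about wage,
# so drop it before fitting models
Wage2 <- subset(Wage, select = -region)
```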

# jobclass is a factor, so plot() draws side-by-side boxplots of wage
plot(jobclass, wage)
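The boxplot compares overall wage distributions for the two job classes but ignores age. A possible follow-up, offered as an assumption about what is worth checking rather than as the original analysis, is to ask whether the gap between job classes is constant over age or changes with it. To do that, compare an additive natural-spline model with one that lets each job class have its own age curve (an interaction). The names `fit_shift`, `fit_inter`, `cmp` and `cls` are hypothetical.

```r
library(ISLR)     # Wage data
library(splines)  # ns(): natural cubic spline basis

# Additive: jobclass shifts a common age curve up or down by a constant
fit_shift <- lm(wage ~ ns(age, df = 4) + jobclass, data = Wage)
# Interaction: a separately shaped age curve for each job class
fit_inter <- lm(wage ~ ns(age, df = 4) * jobclass, data = Wage)

# F-test: does the age profile differ in shape between job classes?
cmp <- anova(fit_shift, fit_inter)
print(cmp)

# Plot the two fitted interaction curves over the observed age range
ages <- seq(min(Wage$age), max(Wage$age), length.out = 100)
cls <- levels(Wage$jobclass)
plot(Wage$age, Wage$wage, col = "grey", pch = 20, xlab = "age", ylab = "wage")
for (i in seq_along(cls)) {
  nd <- data.frame(age = ages, jobclass = factor(cls[i], levels = cls))
  lines(ages, predict(fit_inter, newdata = nd), col = i + 1, lwd = 2)
}
legend("topright", legend = cls, col = seq_along(cls) + 1, lwd = 2)
```

A large p-value in `cmp` would suggest the simpler constant-shift description is adequate.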