
Question

Consider the USArrests data. We will now perform hierarchical clustering on the states.

  1. Using hierarchical clustering with complete linkage and Euclidean distance, cluster the states.

  2. Cut the dendrogram at a height that results in three distinct clusters. Which states belong to which clusters?

  3. Hierarchically cluster the states using complete linkage and Euclidean distance, after scaling the variables to have standard deviation one.

  4. What effect does scaling the variables have on the hierarchical clustering obtained? In your opinion, should the variables be scaled before the inter-observation dissimilarities are computed? Provide a justification for your answer.


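library(ISLR)     # ISLR book package; USArrests itself ships with base R's datasets package
library(ggdendro) # ggplot2-based dendrograms via ggdendrogram()
library(ggplot2)  # plotting backend used by ggdendro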

9a Fitting a hierarchical clustering

hc.complete = hclust(
  dist(USArrests),
  method="complete"
)
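As a sanity check on what `dist()` feeds into `hclust()`, the sketch below recomputes one pairwise distance by hand. The names `d`, `x`, `manual`, and `from_dist` are illustrative and not part of the original solution. The choice of Alabama and Alaska is arbitrary.

```r
# dist() defaults to method = "euclidean", matching part 1 of the question.
d <- dist(USArrests)

# Recompute the distance between two states directly from the raw data.
x <- as.matrix(USArrests)
manual <- sqrt(sum((x["Alabama", ] - x["Alaska", ])^2))

# Look up the same pair in the distance object.
from_dist <- as.matrix(d)["Alabama", "Alaska"]

# Should be TRUE if dist() is computing plain Euclidean distance.
all.equal(manual, from_dist)
```

Because no scaling is applied here, each variable contributes to the distance in its original units.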

Summary

summary(hc.complete)
##             Length Class  Mode     
## merge       98     -none- numeric  
## height      49     -none- numeric  
## order       50     -none- numeric  
## labels      50     -none- character
## method       1     -none- character
## call         3     -none- call     
## dist.method  1     -none- character

hc.complete
## 
## Call:
## hclust(d = dist(USArrests), method = "complete")
## 
## Cluster method   : complete 
## Distance         : euclidean 
## Number of objects: 50
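The lengths reported by `summary()` follow from the 50 observations: there are 49 merge steps, each recorded as a row of two entries in `merge` (hence length 98) with one entry in `height`. A minimal sketch for inspecting these components, refitting the model so the snippet runs on its own:

```r
hc.complete <- hclust(dist(USArrests), method = "complete")

# One row per merge step: n - 1 rows, 2 columns.
dim(hc.complete$merge)

# Negative entries are single observations (row indices of USArrests);
# positive entries refer to clusters formed at an earlier merge step.
head(hc.complete$merge, 3)

# Height (complete-linkage dissimilarity) at which each merge happens.
head(hc.complete$height, 3)

# State names in the left-to-right order used by the dendrogram.
hc.complete$labels[hc.complete$order][1:5]
```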

Dendrogram

plot(hc.complete)

Dendrogram with ggdendro

ggdendrogram(hc.complete)