Use the similarity matrix in Table 8.1 to perform single and complete link hierarchical clustering. Show your results by drawing a dendrogram. The dendrogram should clearly show the order in which the points are merged.



(b) Do both sets of centroids represent stable solutions; i.e., if the K-means algorithm was run on this set of points using the given centroids as the starting centroids, would there be any change in the clusters generated?

(c) What are the two clusters produced by single link?

(d) Which technique, K-means or single link, seems to produce the “most natural” clustering in this situation? (For K-means, take the clustering with the lowest squared error.)

(e) What definition(s) of clustering does this natural clustering correspond to? (Well-separated, center-based, contiguous, or density.)

(f) What well-known characteristic of the K-means algorithm explains the previous behavior?


The solutions are shown in Figures 8.6(a) and 8.6(b).
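Since Table 8.1 and the figures are not reproduced here, the following is only a minimal sketch of how the merge order for single and complete link can be derived from a similarity matrix. The matrix `TOY_SIM` is a made-up 4-point example, not Table 8.1, and the function name `agglomerate` is hypothetical.

```python
def agglomerate(sim, linkage):
    """Return the merge order for a symmetric similarity matrix.

    linkage="single"   -> cluster similarity is the MAX pairwise similarity (MIN distance)
    linkage="complete" -> cluster similarity is the MIN pairwise similarity (MAX distance)
    At each step the two most similar clusters are merged.
    """
    pick = max if linkage == "single" else min
    clusters = [(i,) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = pick(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merges.append((clusters[a], clusters[b], s))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical similarity matrix for illustration only (NOT Table 8.1).
TOY_SIM = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.6, 0.3],
    [0.2, 0.6, 1.0, 0.5],
    [0.1, 0.3, 0.5, 1.0],
]

for linkage in ("single", "complete"):
    print(linkage, agglomerate(TOY_SIM, linkage))
```

Each tuple in the returned list records the two clusters merged and the similarity at which they merged, which is the information a dendrogram encodes: the merge order from bottom to top and the height of each join.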





(a) ... total squared error for each set of two clusters. Show both the clusters and the total squared error for each set of centroids.

i. {18, 45}

First cluster is 6, 12, 18, 24, 30.

Error = 360.

Second cluster is 42, 48.

Error = 18.

Total Error = 378.

ii. {15, 40}

First cluster is 6, 12, 18, 24.

Error = 180.

Second cluster is 30, 42, 48.

Error = 168.

Total Error = 348.
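The arithmetic above can be reproduced with a short script. This is a minimal sketch that assumes the data set is the seven points appearing in the listed clusters (6, 12, 18, 24, 30, 42, 48). The helper name `assign_and_score` is hypothetical.

```python
# Points inferred from the clusters listed in the answer (an assumption).
POINTS = [6, 12, 18, 24, 30, 42, 48]


def assign_and_score(points, centroids):
    """Assign each point to its nearest centroid; return (clusters, squared errors)."""
    clusters = {c: [] for c in centroids}
    for p in points:
        nearest = min(centroids, key=lambda c: abs(p - c))
        clusters[nearest].append(p)
    # Squared error of a cluster: sum of squared distances to its centroid.
    errors = {c: sum((p - c) ** 2 for p in members) for c, members in clusters.items()}
    return clusters, errors


for centroids in ([18, 45], [15, 40]):
    clusters, errors = assign_and_score(POINTS, centroids)
    print(centroids, clusters, errors, "total =", sum(errors.values()))
```

For the centroids {18, 45} the per-cluster errors are 144 + 36 + 0 + 36 + 144 = 360 and 9 + 9 = 18. For {15, 40} they are 81 + 9 + 9 + 81 = 180 and 100 + 4 + 64 = 168.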

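(b) Yes, both sets of centroids are stable solutions: each centroid equals the mean of the points assigned to it (18 and 45 for the first set, 15 and 40 for the second), so rerunning K-means leaves the clusters unchanged.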

(c) The two clusters are {6, 12, 18, 24, 30} and {42, 48}.

(d) MIN (single link) produces the most natural clustering.

(e) MIN produces contiguous clusters. However, density is also an acceptable answer. Even center-based is acceptable, since one set of centers gives the desired clusters.

(f) K-means is not good at finding clusters of different sizes, at least when they are not well separated. The reason for this is that the objective of m...
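The answers to parts (b) and (c) can be checked numerically. The sketch below assumes the data set is the seven points that appear in the listed clusters. The function names `kmeans_step`, `is_stable`, and `single_link` are hypothetical helpers, and the single-link routine is a naive version meant only for a handful of 1-D points.

```python
# Points inferred from the clusters listed in the answers (an assumption).
POINTS = [6, 12, 18, 24, 30, 42, 48]


def kmeans_step(points, centroids):
    """One K-means iteration: assign to nearest centroid, then recompute the means.

    Assumes no cluster ends up empty, which holds for the centroids used here.
    """
    clusters = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[idx].append(p)
    return [sum(c) / len(c) for c in clusters]


def is_stable(points, centroids):
    """Centroids are stable if one iteration leaves them unchanged."""
    return kmeans_step(points, centroids) == [float(c) for c in centroids]


def single_link(points, k):
    """Merge the two closest clusters (minimum pairwise distance) until k remain."""
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


print(is_stable(POINTS, [18, 45]), is_stable(POINTS, [15, 40]))
print(single_link(POINTS, 2))
```

Every gap between neighbouring points is 6 except the gap of 12 between 30 and 42, so single link performs all the distance-6 merges first and the last split left standing is the one at that larger gap.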
