Use the K-means algorithm to cluster the data from Exercise 28.20. We can use a value of 3 for K and can assume that the records with RIDs 1, 3, and 5 are used for the initial cluster centroids (means).

What will be an ideal response?


We start by specifying the centroid for each of the 3 clusters.
C1's centroid is (8,4), i.e., the record with RID = 1
C2's centroid is (2,4), i.e., the record with RID = 3
C3's centroid is (2,8), i.e., the record with RID = 5

We now place the remaining records in the cluster whose centroid is closest.

The distance between record 2, i.e., point (5,4), and centroid for C1 is
SQROOT( |8-5|^2 + |4-4|^2 ) = 3

The distance between record 2 and centroid for C2 is
SQROOT( |2-5|^2 + |4-4|^2 ) = 3

The distance between record 2 and centroid for C3 is
SQROOT( |2-5|^2 + |8-4|^2 ) = 5

Record 2 can be placed in either C1 or C2 since its distances to their
respective centroids are the same. Let's choose to place the record
in C1.
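
The distance check for record 2 can be reproduced with a short sketch. This is only an illustration: the dictionary and variable names are assumptions, while the coordinates are the ones used in the worked steps above.

```python
import math

# Initial centroids from the worked example (records with RIDs 1, 3, and 5).
centroids = {"C1": (8, 4), "C2": (2, 4), "C3": (2, 8)}
record_2 = (5, 4)

# math.dist returns the Euclidean distance between two points.
distances = {name: math.dist(record_2, c) for name, c in centroids.items()}
print(distances)  # {'C1': 3.0, 'C2': 3.0, 'C3': 5.0}
```

C1 and C2 tie at distance 3, so the choice between them is arbitrary.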


The distance between record 4, i.e., point (2,6), and centroid for C1 is
SQROOT( |8-2|^2 + |4-6|^2 ) = 6.32

The distance between record 4 and centroid for C2 is
SQROOT( |2-2|^2 + |4-6|^2 ) = 2

The distance between record 4 and centroid for C3 is
SQROOT( |2-2|^2 + |8-6|^2 ) = 2

Record 4 can be placed in either C2 or C3 since its distances to their
respective centroids are the same. Let's choose to place the record
in C2.


The distance between record 6, i.e., point (8,6), and centroid for C1 is
SQROOT( |8-8|^2 + |4-6|^2 ) = 2

The distance between record 6 and centroid for C2 is
SQROOT( |2-8|^2 + |4-6|^2 ) = 6.32

The distance between record 6 and centroid for C3 is
SQROOT( |2-8|^2 + |8-6|^2 ) = 6.32

Record 6 is closest to centroid of cluster C1 and is placed there.
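
The placement of records 2, 4, and 6 can be sketched as one assignment pass. The helper name `nearest_cluster` is hypothetical, and the tie-breaking rule (the lowest-numbered cluster wins) is an assumption that happens to match the choices made above.

```python
import math

centroids = [(8, 4), (2, 4), (2, 8)]           # C1, C2, C3 (RIDs 1, 3, 5)
remaining = {2: (5, 4), 4: (2, 6), 6: (8, 6)}  # RID -> point

def nearest_cluster(point, centroids):
    """Return the 0-based index of the closest centroid.

    min() keeps the first minimum it sees, so a tie goes to the
    lowest-numbered cluster.
    """
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

# Convert to 1-based cluster numbers to match the labels C1, C2, C3.
assignment = {rid: nearest_cluster(p, centroids) + 1 for rid, p in remaining.items()}
print(assignment)  # {2: 1, 4: 2, 6: 1}
```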


We now recalculate the cluster centroids:

C1 contains records {1,2,6} with a centroid of
( (8+5+8)/3, (4+4+6)/3) = (7, 4.67)

C2 contains records {3,4} with a centroid of
( (2+2)/2, (4+6)/2) = (2, 5)

C3 contains record {5} with a centroid of (2, 8)
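
The centroid update is a per-coordinate mean over each cluster's members. A minimal sketch, assuming the cluster membership listed above (the names `points`, `clusters`, and `centroid` are illustrative):

```python
points = {1: (8, 4), 2: (5, 4), 3: (2, 4), 4: (2, 6), 5: (2, 8), 6: (8, 6)}
clusters = {"C1": [1, 2, 6], "C2": [3, 4], "C3": [5]}

def centroid(rids):
    """Mean of the x values and mean of the y values of the given records."""
    xs = [points[r][0] for r in rids]
    ys = [points[r][1] for r in rids]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

new_centroids = {name: centroid(rids) for name, rids in clusters.items()}
for name, (x, y) in new_centroids.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
# C1: (7.00, 4.67)
# C2: (2.00, 5.00)
# C3: (2.00, 8.00)
```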

We now make a second iteration over the records, comparing the
distance of each record with the new centroids and possibly
moving records to new clusters. All records stay in their prior
clusters. For example, record 2 is now 2.11 from C1's centroid
but 3.16 from C2's, and record 4 is 1 from C2's centroid but 2
from C3's, so neither earlier tie recurs. Since no assignment
changed, the algorithm terminates.
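
The whole procedure can be sketched as a loop that alternates assignment and centroid update until nothing changes. This is a sketch under stated assumptions, not a definitive implementation: the function name `kmeans` and the `max_iter` cap are hypothetical, every record (including the three seeds) is reassigned on each pass, and ties go to the lowest-numbered cluster.

```python
import math

points = {1: (8, 4), 2: (5, 4), 3: (2, 4), 4: (2, 6), 5: (2, 8), 6: (8, 6)}

def kmeans(points, initial_rids, max_iter=100):
    """Batch K-means seeded with the records named in initial_rids."""
    centroids = [points[r] for r in initial_rids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest centroid.
        new_assignment = {
            rid: min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for rid, p in points.items()
        }
        if new_assignment == assignment:
            break  # no record moved, so the algorithm has converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [points[r] for r, c in assignment.items() if c == i]
            if members:  # leave the centroid of an empty cluster unchanged
                centroids[i] = tuple(sum(v) / len(members) for v in zip(*members))
    # Report 1-based cluster numbers to match C1, C2, C3.
    return {rid: c + 1 for rid, c in assignment.items()}, centroids

assignment, centroids = kmeans(points, initial_rids=[1, 3, 5])
print(assignment)  # {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 1}
```

With these six records the second pass reproduces the first assignment, so the loop stops with C1 = {1,2,6}, C2 = {3,4}, and C3 = {5}.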

Computer Science & Information Technology
