The leader algorithm (Hartigan [4]) represents each cluster using a point, known as a leader, and assigns each point to the cluster corresponding to the closest leader, unless this distance is above a user-specified threshold. In that case, the point becomes the leader of a new cluster. Note that the algorithm described here is not quite the leader algorithm described in Hartigan, which assigns a point to the first leader that is within the threshold distance. The answers apply to the algorithm as stated in the problem.
(a) What are the advantages and disadvantages of the leader algorithm as
compared to K-means?
(b) Suggest ways in which the leader algorithm might be improved.
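To make the variant stated in the problem concrete, it could be sketched as follows. This is only a minimal sketch in Python, assuming Euclidean distance and points given as tuples of floats; the function name `leader_cluster` and the toy data are illustrative and do not come from Hartigan [4].

```python
import math

def leader_cluster(points, threshold):
    """Single-pass leader clustering (closest-leader variant from the problem)."""
    leaders = []   # one representative point per cluster
    labels = []    # cluster index assigned to each input point
    for p in points:
        # Find the closest existing leader, if any.
        best, best_dist = None, math.inf
        for i, leader in enumerate(leaders):
            d = math.dist(p, leader)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > threshold:
            # Too far from every leader: p becomes the leader of a new cluster.
            leaders.append(p)
            labels.append(len(leaders) - 1)
        else:
            labels.append(best)
    return leaders, labels

# Same three points, two orderings (toy data chosen for illustration).
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(leader_cluster(pts, threshold=1.0))
print(leader_cluster([pts[1], pts[0], pts[2]], threshold=1.0))
```

With a threshold of 1.0, the first ordering yields two clusters and the second yields one, which illustrates the order dependence of the algorithm.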
(a) The leader algorithm requires only a single scan of the data and is thus
more computationally efficient than K-means, which makes repeated passes:
each object is compared to the leaders that exist when it is processed,
and never again. Although the leader algorithm is order dependent, for a
fixed ordering of the objects it always produces the same set of clusters,
whereas K-means with random initialization can produce different results
from run to run. However, unlike K-means, the number of resulting clusters
cannot be set directly for the leader algorithm; it can only be influenced
indirectly through the threshold. Also, the K-means algorithm almost always
produces better quality clusters as measured by SSE.
(b) Use a sample to determine the distribution of distances between the
points. The knowledge gained from this process can be used to set the
value of the threshold more intelligently.
The leader algorithm could also be modified to cluster for several
thresholds during a single pass, so that the resulting clusterings can be
compared afterward.
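Both suggestions in (b) could be sketched as follows. This is a rough illustration in Python, not a prescribed method: the helper names `sample_distance_quantiles` and `leader_multi_threshold`, the use of Euclidean distance, the nearest-rank quantile lookup, and the synthetic data are all assumptions made for the example. The sample must contain at least two points.

```python
import itertools
import math
import random

def sample_distance_quantiles(points, sample_size, quantiles, seed=0):
    """Estimate quantiles of pairwise distances from a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    dists = sorted(math.dist(a, b) for a, b in itertools.combinations(sample, 2))
    # Nearest-rank style lookup; adequate for picking a rough threshold.
    return [dists[min(int(q * len(dists)), len(dists) - 1)] for q in quantiles]

def leader_multi_threshold(points, thresholds):
    """One pass over the data, keeping a separate leader list per threshold."""
    leaders = {t: [] for t in thresholds}
    for p in points:
        for t, ls in leaders.items():
            # p starts a new cluster if it is farther than t from every leader.
            if not ls or min(math.dist(p, l) for l in ls) > t:
                ls.append(p)
    return leaders

# Synthetic data: two tight groups centered near x=0 and x=5 (assumed for the demo).
rng = random.Random(1)
data = [(rng.gauss(cx, 0.1), rng.gauss(0, 0.1)) for cx in (0, 5) for _ in range(50)]
candidates = sample_distance_quantiles(data, 30, [0.1, 0.25, 0.5])
for t, ls in leader_multi_threshold(data, candidates).items():
    print(f"threshold={t:.2f} -> {len(ls)} clusters")
```

Comparing the cluster counts across the candidate thresholds gives an indirect way to steer the number of clusters without rescanning the data.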