Given a set of points in Euclidean space, which are being clustered using the K-means algorithm with Euclidean distance, the triangle inequality can be used in the assignment step to avoid calculating all the distances of each point to each cluster centroid. Provide a general discussion of how this might work.
Charles Elkan presented the following theorem in his keynote speech at the
Workshop on Clustering High-Dimensional Data at SIAM 2004.
Lemma 1: Let x be a point, and let b and c be centers.
If d(b, c) ≥ 2d(x, b) then d(x, c) ≥ d(x, b).
Proof:
By the triangle inequality, d(b, c) ≤ d(b, x) + d(x, c).
So d(b, c) − d(x, b) ≤ d(x, c).
Now d(b, c) − d(x, b) ≥ 2d(x, b) − d(x, b) = d(x, b).
So d(x, b) ≤ d(x, c).
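This lemma can be used to eliminate a large number of unnecessary distance
calculations. In the assignment step, suppose b is the closest center found
so far for a point x. For any other center c with d(b, c) ≥ 2d(x, b), the
lemma guarantees that c is no closer to x than b, so d(x, c) never has to be
computed. The distances between centers are computed once per iteration and
reused for every point, which is cheap when there are far fewer centers than
points.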
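
As a rough illustration, the pruning rule from Lemma 1 might be applied in a single assignment step as sketched below. This is a minimal sketch, not Elkan's full algorithm (which also carries distance bounds from one iteration to the next). The function name assign_with_pruning and the sample data are hypothetical, and math.dist assumes Python 3.8 or later.

```python
import math

def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distances
    that Lemma 1 rules out.

    Returns (labels, n_computed), where n_computed counts the
    point-to-center distances actually evaluated.
    """
    k = len(centers)
    # Center-to-center distances: computed once, shared by every point.
    cc = [[math.dist(centers[i], centers[j]) for j in range(k)]
          for i in range(k)]

    labels = []
    n_computed = 0
    for x in points:
        # Start with center 0 as the current best center b.
        best = 0
        best_d = math.dist(x, centers[0])
        n_computed += 1
        for c in range(1, k):
            # Lemma 1: if d(b, c) >= 2 d(x, b) then d(x, c) >= d(x, b),
            # so c cannot beat b and d(x, c) is not needed.
            if cc[best][c] >= 2 * best_d:
                continue
            d = math.dist(x, centers[c])
            n_computed += 1
            if d < best_d:
                best, best_d = c, d
        labels.append(best)
    return labels, n_computed

# Hypothetical data: two well-separated groups and two centers.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.2, 9.9)]
centers = [(0.0, 0.0), (10.0, 10.0)]
labels, n_computed = assign_with_pruning(points, centers)
print(labels, n_computed)  # [0, 0, 1, 1] 6
```

Here 6 point-to-center distances are evaluated instead of the 8 a brute-force assignment would need. The saving depends on how good the first candidate center is; starting from each point's center in the previous iteration, rather than center 0, would typically prune more.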