Show that 1 minus the Jaccard similarity is a distance measure between two data objects, x and y, that satisfies the metric axioms given on page 70. Specifically, d(x, y)=1 ? J(x, y).

What will be an ideal response?


1(a). Because J(x, y) ? 1, d(x, y) ? 0.
1(b). Because J(x, x)=1, d(x, x)=0
2. Because J(x, y) = J(y, x), d(x, y) = d(y, x)
3. (Proof due to Jeffrey Ullman)
minhash(x) is the index of first nonzero entry of x

prob(minhash(x) = k) is the probability tha minhash(x) = k when x is ran-
domly permuted.

Note that prob(minhash(x) = minhash(y)) = J(x, y) (minhash lemma)
Therefore, d(x, y)=1?prob(minhash(x) = minhash(y)) = prob(minhash(x) =
minhash(y))
We have to show that,
prob(minhash(x) = minhash(z)) ? prob(minhash(x) = minhash(y)) +
prob(minhash(y) = minhash(z)
However, note that whenever minhash(x) = minhash(z), then at least one of
minhash(x) = minhash(y) and minhash(y) = minhash(z) must be true.

Computer Science & Information Technology

You might also like to view...

Give examples of schedules that would be accepted at

a. SNAPSHOT isolation but not REPEATABLE READ b. SERIALIZABLE but not SNAPSHOT isolation

Computer Science & Information Technology

When the button is highlighted with yellow, it means that the Track Changes feature is turned off

Indicate whether the statement is true or false

Computer Science & Information Technology

To determine the network adapter in a Windows device, you use the Control Panel ________ utility

Fill in the blank(s) with correct word

Computer Science & Information Technology

Computer systems that fall into the _________ category have a single machine instruction that controls the simultaneous execution of a number of processing elements on a lockstep basis.

Fill in the blank(s) with the appropriate word(s).

Computer Science & Information Technology