Consider hash-based evaluation of the projection operator. Assume that all buckets are about the same size but do not ?t in main memory. Let N be the size of the hash table measured in memory pages, F be the size of the original relation measured in pages, and ?<1 be the reduction factor due to projection. Estimate the number of page transfers to and from the disk needed to compute the

projection.

What will be an ideal response?


In the ?rst phase, the original relation is scanned. During the scan, we chop o? the tuple components that are to be projected out, and the rest of the tuple is hashed on the remaining attributes. Whenever a page of the hash table becomes full, it is ?ushed to the corresponding bucket on disk. The cost of this operation is of the order of (1 + ?)F, where F is the number of blocks in the relation.
Clearly, duplicate tuples are always hashed into the same bucket, so we can eliminate duplicates in each bucket separately. This elimination is done in the second phase of the algorithm. Because the individual buckets do not ?t in the main memory, they must be sorted using external sorting. There are N buckets and for each bucket, the cost is: 2 x log(M -1) B /M , where B is the average size of each bucket measured in memory pages, and M is the number of main memory pages available to sort the buckets. Since all buckets are approximately the same in size, B = F /N . The total cost of this step is: 2?N × log(M ?1)|F /(MN )|).

Computer Science & Information Technology

You might also like to view...

Each connection point on a network is referred to as a network ____________________.

Fill in the blank(s) with the appropriate word(s).

Computer Science & Information Technology

The Column Layout feature of ________ reflows the document to fit the size of the device on which you are reading

Fill in the blank(s) with correct word

Computer Science & Information Technology

Compared to the Ribbon, the Font dialog box offers

A) the ability to make several changes at the same time. B) instant previews of how changes will appear in the document. C) a simpler way to change very basic settings. D) the same number of options.

Computer Science & Information Technology

Which of the following WAN technologies allows for the fastest connection?

A. ADSL B. SONET C. T1 D. OC-12

Computer Science & Information Technology