Consider a relation r over the attributes A, B, C with the following characteristics:
5,000 tuples with 5 tuples per page
Attribute A is a candidate key
Unclustered hash index on attribute A
Clustered B+ tree index on attribute B
Attribute B has 1,000 distinct values in r
Attribute C has 500 distinct values in r and an unclustered 3-level B+ tree index
Estimate the cost of computing each of the following:
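(a) an equality selection on A, using the hash index
(b) an equality selection on B, using the clustered B+ tree index
(c) an equality selection on C, using the unclustered B+ tree index
(d) a projection that keeps the candidate key A and one other attribute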
(a) Since the hash index is on the candidate key, the selection returns at most one tuple. Therefore, the cost is 1.2 (searching the index) + 1 (retrieving the data) = 2.2. If the index is integrated, the cost is just 1.2.
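The arithmetic in (a) can be written out as a minimal sketch. The function name and its parameters are hypothetical, chosen only for illustration. The 1.2 figure is the average index search cost used in the answer above.

```python
def hash_selection_cost(index_search_ios=1.2, integrated=False):
    """Estimated page I/Os for an equality selection on a key via a hash index.

    index_search_ios: average cost of searching the hash index (1.2 in the text).
    integrated: if True, the index holds the data records themselves,
                so no extra data page fetch is needed.
    """
    data_ios = 0 if integrated else 1  # at most one matching tuple on a key
    return index_search_ios + data_ios


if __name__ == "__main__":
    print(hash_selection_cost())                 # separate index and data file
    print(hash_selection_cost(integrated=True))  # integrated index
```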
(b) Since B has 1,000 distinct values in r, there are about 5 tuples per value (5,000/1,000). Therefore, the selection is likely to retrieve 5 tuples. Because the index is clustered and there are 5 tuples per page, the result fits in 1 page.
Therefore, the cost is the depth of the tree + 1.
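As a rough sketch of the estimate in (b), the same reasoning can be parameterized. The function name is hypothetical, and the tree depth is left as an argument because the problem statement does not give the depth of the index on B. The depth of 3 in the usage line is an assumption for illustration only.

```python
import math


def clustered_btree_selection_cost(num_tuples, distinct_values,
                                   tuples_per_page, tree_depth):
    """Estimated page I/Os for an equality selection via a clustered B+ tree."""
    # Assuming a uniform distribution, each value matches this many tuples.
    matching = num_tuples / distinct_values
    # Clustered index: matching tuples are stored contiguously in the data file.
    data_pages = math.ceil(matching / tuples_per_page)
    return tree_depth + data_pages


if __name__ == "__main__":
    # tree_depth=3 is an assumed value, not given in the problem.
    print(clustered_btree_selection_cost(5000, 1000, 5, tree_depth=3))
```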
(c) Since C has 500 distinct values, the selection is likely to produce 10 tuples (5,000/500). Pointers to these tuples will be in the same or adjacent leaf pages of the B+ tree. We conservatively estimate that these pointers will occupy 2 leaves (index entries are typically much smaller than the data file records). Thus, the cost of retrieving all the pointers is 3 (to search the B+ tree for the first page of pointers in the index) + 1 (to retrieve the second page of the index) = 4.
Since the index is unclustered, each of the 10 tuples in the result can be in a separate page of the data file, and its retrieval may require a separate I/O. Thus, the total cost is 4 + 10 = 14.
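The estimate in (c) can be sketched the same way. The function name is hypothetical. The two leaf pages are the conservative assumption made in the answer, and one data page I/O per matching tuple is the worst case for an unclustered index.

```python
def unclustered_btree_selection_cost(num_tuples, distinct_values,
                                     tree_depth, leaf_pages):
    """Worst-case page I/Os for an equality selection via an unclustered B+ tree."""
    # Assuming a uniform distribution of values.
    matching = num_tuples // distinct_values
    # Descend to the first leaf, then read any additional leaf pages of pointers.
    index_ios = tree_depth + (leaf_pages - 1)
    # Unclustered: each matching tuple may live on a different data page.
    data_ios = matching
    return index_ios + data_ios


if __name__ == "__main__":
    print(unclustered_btree_selection_cost(5000, 500, tree_depth=3, leaf_pages=2))
```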
(d) Since we do not project out the candidate key, the projection will have the same number of tuples as the original. In particular, there will be no duplicates and no sorting will be required.
The output will be about 2/3 of the original size, assuming that all attributes contribute equally to the tuple size. Since the original file has 5,000/5 = 1,000 blocks, the cost of the operation is 1,000 (scan of the file) + 2/3 * 1,000 (cost of writing out the result), or about 1,667.
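A short sketch of the projection estimate in (d) follows. The function name is hypothetical. It assumes, as the answer does, that every attribute contributes equally to tuple size and that no duplicate elimination is needed because the candidate key is kept.

```python
def projection_scan_cost(num_tuples, tuples_per_page, kept_attrs, total_attrs):
    """Estimated page I/Os for a duplicate-free projection: one scan plus output."""
    input_pages = num_tuples / tuples_per_page
    # Output shrinks in proportion to the fraction of attributes kept.
    output_pages = input_pages * kept_attrs / total_attrs
    return input_pages + output_pages


if __name__ == "__main__":
    cost = projection_scan_cost(5000, 5, kept_attrs=2, total_attrs=3)
    print(round(cost))
```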