Pages in a database have P bytes. A table T has R rows and n integer attributes, A1, A2, .An. Each attribute is r bytes long. Assume that r is much greater than the size of a pointer, so the pointer size can be ignored when computing the size of an entry in an index. Assume that the domain of A1 is the integers between 1 and 100 and that the values of this attribute are randomly distributed in
the domain. Consider the following statement:
SELECT A2
FROM T
WHERE A1 <= 50 AND A1 > 25
Evaluate the cost, in terms of the number of I/O operations, of executing the statement under the following conditions. In order to get credit you must explain your work clearly.
(a) The table is stored as a heap (no indexes).
(b) There is a non-integrated (separate index file) clustered index with search key A1.
(c) There is an unclustered index with search key (A2, A1).
(a) The table is stored as a heap (no indexes).
Solution:
scan entire table
size of table = n ? r ? R
number pages in table = (n ? r ? R)/P = cost
(b) There is a non-integrated (separate index file) clustered index with search key A1.
Solution:
Search down the tree using A1 = 25, follow pointer to first qualifying row in data file, scan data file and return all rows until row with A1>50 is found.
fan-out of a page in the index = P/r (or P/2r for a worst case analysis) num. of levels in tree = logP/r R = cost of tree search
cost of file scan = .25((n r R)/P) since approximately a quarter of the rows have to be
returned (values of A1 are randomly distributed) total cost = logP/r R + 1 + .25((n ? r ? R)/P)
(c) There is an unclustered index with search key (A2, A1).
Solution:
Scan leaf level of tree (index covering). An index entry has 2r bytes, hence the (mini- mum) number of leaf level pages in the index = R/(P/2r) (worst case analysis doubles the cost).
You might also like to view...
The maximum speed of a 10BaseFL network is __________________.
Fill in the blank(s) with the appropriate word(s).
A surge is a short transient in the voltage that can be due to a short circuit or power outage
Indicate whether the statement is true or false
Find the mean of the set of data. 2, 4, 6, 5, 2, 4, 9, 4, 8, 12
A. 5.1 B. 4.5 C. 5.6 D. 10 E. 4
Why would a manager prefer a decision tree instead of a decision table?
What will be an ideal response?