Consider the following query:

```
SELECT DISTINCT E.Ename
FROM Employee E
WHERE E.Title = ’Programmer’ AND E.Dept = ’Production’
```

Assume that
1. 10% of employees are programmers
2. 5% of employees are programmers who work for the production department
3. There are 10 departments
4. The Employee relation has 1000 pages with 10 tuples per page
5. There is a 51-page bu?er that can be used to process the query
Find the best query execution plan for each of the following cases:
a. The only index is on Title, and it is a clustered 2-level B+ tree.
b. The only index is on the attribute sequence Dept, Title, Ename; it is clustered and has two levels.
c. The only index is on Dept, Ename, Title; it is a clustered 3-level B+ tree.
d. There is an unclustered hash index on Dept and a 2-level clustered tree index on Ename.

a. Because the only index is a clustered index on Title, we can use it to ?nd the 1000*0.1=100 pages that contain employees who are programmers. Let us assume that the index is integrated. Then we spend one I/O to get the top page of the B+ tree and then it takes 1 disk access per page to get the right pages. We select on E.Dept = Production on the ?y, which cuts the size down to 50 pages (since 5% of the employees are production programmers). This can ?t in main memory. We can now project on Ename and sort in main memory to eliminate duplicates. Thus, the cost is 1 + 100 = 101 page transfers.

b. Because we have a clustered index on Dept, Title, Ename, we can use an index only strategy. We get the top-level page of the index and from there access the index pages that contain the relevant tuples. There are 50 data pages that contain
the relevant tuples, since 5% of the employees are production programmers. So, the cost of the access is at most 1 + 50 = 51 page I/Os. If the index is integrated, then this is the number of I/Os that is going to happen. However, if the index is not integrated, then the the index pages that contain the relevant tuples are likely to be smaller, and there will be less I/O. Note that since Ename is part of the key, all the production programmers will be sorted on Ename, and we will not need to do it even in main memory.

c. We cannot use the index to search on Title, but we can use it to search on Dept. This will yield 1000/10=100 pages. We can also use an index-only strategy here. Search the index to ?nd all the tuples for the production department and
do projection and selection on the ?y. No sorting is needed, because within each department the tuples are already sorted on Ename. The cost is 2 (for the upper two levels of the index) + the number of index pages that refer to the tuples in the production department. There are 100 pages of the corresponding data tuples, but the index is expected to be much smaller.

d. Searching the hash index on Dept will give us the bucket of pointers to the production employees (about 10% of the total number of tuples, 10,000). But each such pointer will result in a page fetch, since the index in unclustered. So, the cost is 1.2 I/Os to ?nd the bucket plus 1,000 to ?nd the relevant tuples. The above is actually slightly worse than the direct scan, which takes only 1,000 pages. The selection and projection is done on the ?y and the result (50 pages) is
placed in the bu?er, where it is projected and then sorted to eliminate duplicates. Using a clustered index on Ename would require us to take a scan anyway. But since the tuples in the result are already sorted on Ename, we do not need to do any in-memory sorting, so this method is slightly preferable to the other two.

Computer Science & Information Technology

You might also like to view...

List each line in /etc/services that contains the word send. Precede each line by its line number in the file.

What will be an ideal response?

View

Computer Science & Information Technology

The text that appears alongside a JCheckBox is referred to as the _____________.

a) JCheckBox value b) JCheckBox name c) JCheckBox text d) JCheckBox data

View

Computer Science & Information Technology

By creating ________, you can help improve the efficiency of the data entry process because it can display a user-friendly list that is either linked to another field in a related table or a value list

A) lookup fields B) calculated fields C) attachment fields D) hyperlink fields

View

Computer Science & Information Technology

Empowerment makes an IT department less productive because it must spend more time responding to the daily concerns of users and less time on high-impact systems development projects that support strategic business goals.

Answer the following statement true (T) or false (F)

View

Computer Science & Information Technology