Answer the following questions (and explain how you arrived at your solutions).

Consider the following relations that represent part of a real estate database:
Agent(Id, AgentName)
House(Address, OwnerId, AgentId)
Amenity(Address, Feature)

The Agent relation keeps information on real estate agents, the House relation has information
on who is selling the house and the agent involved, and the Amenity relation provides
information on the features of each house. Each relation has its keys underlined.

Consider the following query:
SELECT H.OwnerId, A.AgentName
FROM House H, Agent A, Amenity Y
WHERE H.Address=Y.Address AND A.Id = H.AgentId
AND Y.Feature = '5BR' AND H.AgentId = '007'

Assume that the buffer space available for this query has 5 pages and that the following statistics and indices are available:

- Amenity:
10,000 records on 1,000 houses, 5 records/page
Clustered 2-level B+ tree index on Address
Unclustered hash index on Feature, 50 features

- Agent:
200 agents with 10 tuples/page
Unclustered hash index on Id

- House:
1,000 houses with 4 records/page
Unclustered hash index on AgentId
Clustered 2-level B+ tree index on Address

(a) Draw a fully pushed query tree corresponding to the above query.
(b) Find the best query plan to evaluate the above query and estimate its cost.
(c) Find the next-best plan and estimate its cost.

(a)

(b) We could join House with Agent or Amenity, but in any case it is clear that we should first select House on AgentId, because of the large reduction in size: There are 200 agents, 1000 houses, so agent 007 must be handling about 5 houses. At 4 records per page, this would occupy 2 pages.

Because the index on AgentId is unclustered, it would take 1.2 I/Os to nd the bucket and 5 I/Os to fetch the relevant pages: 6.2 I/Os in total.

Next we can join with Agent, which would take 1.2 page I/Os to search the index, since Agent has an unclustered index on Id, plus 1 I/O to fetch the page ? 2.2 I/Os in total. This will still result in 5 tuples, but the tuples will be about 50% larger (Agent has 10 tuples/page, while House only 4). However, we can project out Id and AgentId, which will bring the tuple size to about the size of the tuples in House. So, the join will still occupy a little over a page. We keep the result of the join in the main memory buffer.

Finally, we join the result with Amenity. Since the statistics indicate that there are about 10 amenities per house, it doesn't make much sense to select Amenity on Feature: the size will go down by the factor of 10 at a very high price (unclustered hash or scan of 2,000 blocks of the Amenity relation), and we will loose the index on Address ? the attribute used in the join.

So, the best way to do the join is to use index-nested loops join using the clustered index on the attribute Address of Amenity. It would take 2 I/Os to search the B+ tree index for each of the 5 tuples in the result of the previous join (i.e., 2*5; if we cache the top level of the B+ tree then this search would take 2+4=6 I/Os). The number of tuples retrieved would be 50 (10 features per house * 5 houses), which occupies 10 pages. Therefore, the total needed to retrieve the matching tuples in Amenity is 16 I/Os.

Note that we still have enough room in the bu er. The expected size of the join is 50 tuples (5*10), which is too large for our 5 page bu er. However, we can also select on Feature='5BR' on the y, reducing the size to about 5 tuples, each about twice the size of the tuples in House. We can also project )on the y) on OwnerId and AgentName further reducing the size. Thus, we will need 2 pages in the bu er for the result of the join of House and Agent, one page for the input bu er that is needed for reading the matching tuples of Amenity, and two to store the nal result. This ts in the available 5 bu er pages.

In total, thus, the query will take 6.2 + 2.2 + 16 = 24.4 I/Os.

(c) Similar, except that what is left of House is rst joined with Amenity. The number of I/Os is going to be the same, so this is also a best plan.

The next plan after that would be to do something silly, like joining House and Agent using nested loops. Since Agent occupies 20 blocks, this can be done in 20 I/Os. Then we could proceed to join with Amenity.