Estimate the number of page transfers needed to compute r  s using a sort-merge join, assuming the following: A=B

1. The size of r is 1000 pages, 10 tuples per page; the size of s is 500 pages, 20 tuples per page.
2. The size of the main memory bu?er for this join computation is 10 pages.
3. The Cartesian product of matching tuples in r and s (see Figure 10.7) is computed using a block-nested loops join.
4. r.A has 100 distinct values and s.B has 50 distinct values. These values are spread around the ?les more or less evenly, so the size of ?A=c(r), where c r.A, does not vary much with c.


1. External sort of relation r
2 × 1, 000(log9|1, 000|) = 8, 000 pages

2. External sort of relation s
2 × 500(log9|500|) = 3, 000 pages

3. The merge step
First, clearly, we need to scan both r and s, which costs us 1,500 page transfers.
For each value of the attribute A, the number of pages of r that contain tuples that match that value is 1,000/100=10. Similarly, for each value of B, the number of pages of s that contain tuples that match that value is 500/50=10. Thus, each time we have a match, we compute a Cartesian product of two 10-page relations. Using block-nested loops with 10 bu?er pages requires that we scan the r-part once and the s-part twice for a total of 30 pages. However, one scan of r and s needed to compute the produce is accounted for in the aforesaid initial scan (the one that takes 1,500 page transfers). So, the extra cost is only 10 pages needed to scan the s-part the second time.
The output has 10 10 = 100 pages, so for each match computing the Cartesian product takes 110 page transfers. The question is how many matches we can have. Since A has 100 values and B has 50, we can have at most 50 matches, so computing and outputting the joined tuples will thus take at most 50 × 110 = 5, 500 page transfers. Together we get:
1, 000 + 500 + 5, 500 = 7, 000 pages

4. Thus, the total cost is
6, 000 + 2, 000 + 7, 000 = 15, 000 pages

Computer Science & Information Technology

You might also like to view...

A(n) ____________ is a heap in which the value of each node is greater than or equal to the values of its children (if it has any children)

A. binary heap B. maxheap C. ordered heap D. complete heap

Computer Science & Information Technology

Before linking to a spreadsheet, you must first:

A) make sure the data is organized. B) make sure there are no blank fields. C) make sure the names of the tables are the same. D) all the above are true

Computer Science & Information Technology

Addition and this mathematical operator are considered to be on the same precedence level

A) Subtraction B) Division C) Multiplication

Computer Science & Information Technology

Linux offers memory protection based on the type of information stored in each region belonging to the address space of a user request.

Answer the following statement true (T) or false (F)

Computer Science & Information Technology