Design a query evaluation algorithm for the CUBE operator that uses the results of the previously computed aggregations to compute newaggregations. (Hint: Organize the GROUP BY clauses used in the computation of a data cube into a lattice, that is, a partial order with the least upper bound and the greatest lower bound for each pair of elements. Here is an example of such a partial order: GROUP BY A > GROUP BY A,B > GROUP BY A,B,C and GROUP BY B > GROUP BY B,C > GROUP BY A,B,C. Describe howaggregates computed for the lower parts of the lattice can be used in the computation of the upper parts.)

What will be an ideal response?

The idea is basically the same as in the case of ROLLUP: to compute



SELECT A1,..., An aggr(B)

FROM SomeRelation

WHERE ...

GROUP BY A1,..., An

where aggr is min, max, sum, or count, we can first compute the relation
SomeRelation1,...,n,n+1



SELECT A1,..., An, An+1, aggr(B)

FROM SomeRelation

WHERE ...

GROUP BY A1,..., An, An+1

and then compute



SELECT A1,..., An, aggr(B)

FROM SomeRelation1,...,n,n+1

WHERE ...

GROUP BY A1,..., An

In this way, we are reusing the previous aggregation of aggr(B) over the attributes
of SomeRelation minus A1,..., An, An+1, B and nowonly need to aggregate the
values of B in SomeRelation1,...,n,n+1 over the attribute An+1.
If aggr is avg, we need to compute sum and count using the above optimization
and then divide one by the other.
We can thus start computing



GROUP BY CUBE (A1,..., An)

with an aggregation that groups by all the attributes, A1,..., An, then move to
the next layer in the lattice, where we aggregate over all attributes but one etc. For
instance, the second level of aggregation will perform the following groupings:



GROUP BY A2, A3, ..., An

GROUP BY A1, A3,..., An

... ... ...

GROUP BY A1, A2, ..., An?1

Thus, the second layer will have N aggregations. The first GROUP BY above aggregates B in SomeRelation1,...,n over the attribute A1, the second aggregates it over A2, etc.
The third layer will have N (N ? 1)/2! aggregations, etc. The important point is that at layer 2 aggregation is done over the relation computed at layer 1, SomeRelation1,...,n,
and at layer 3 we aggregate over the appropriate relations computed at layer 2. We
continue in this way until we reach the top layer where we have N aggregations, each over a single attribute.