For the data set with the attributes given below, describe how you would convert it into a binary transaction data set appropriate for association analysis. Specifically, indicate for each attribute in the original data set
(a) How many binary attributes it would correspond to in the transaction
data set,
(b) How the values of the original attribute would be mapped to values of
the binary attributes, and
(c) If there is any hierarchical structure in the data values of an attribute
that could be useful for grouping the data into fewer binary attributes.
The following is a list of attributes for the data set along with their possible
values. Assume that all attributes are collected on a per-student basis:
Year : Freshman, Sophomore, Junior, Senior, Graduate:Masters,
Graduate:PhD, Professional
Zip code : zip code for the home address of a U.S. student, zip code
for the local address of a non-U.S. student
College : Agriculture, Architecture, Continuing Education, Education,
Liberal Arts, Engineering, Natural Sciences, Business, Law, Medical,
Dentistry, Pharmacy, Nursing, Veterinary Medicine
On Campus : 1 if the student lives on campus, 0 otherwise
Each of the following is a separate attribute that has a value of 1 if the
student speaks the language and a value of 0 otherwise.
– Arabic
– Bengali
– Chinese Mandarin
– English
– Portuguese
– Russian
– Spanish
Year:
(a) Each attribute value can be represented using an asymmetric binary
attribute. Therefore, there are altogether 7 binary attributes.
(b) There is a one-to-one mapping between the original attribute values
and the asymmetric binary attributes.
(c) We have a hierarchical structure involving the following high-level
concepts: Undergraduate, Graduate, Professional.
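The mapping for Year can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the item label format `Year=...` are hypothetical, and the assignment of the four undergraduate years and the two graduate values to the high-level concepts is inferred from the value names.

```python
# The 7 possible values of Year, in the order listed in the exercise.
YEAR_VALUES = [
    "Freshman", "Sophomore", "Junior", "Senior",
    "Graduate:Masters", "Graduate:PhD", "Professional",
]

# Assumed roll-up of each value to one of the three high-level concepts.
YEAR_GROUP = {
    "Freshman": "Undergraduate",
    "Sophomore": "Undergraduate",
    "Junior": "Undergraduate",
    "Senior": "Undergraduate",
    "Graduate:Masters": "Graduate",
    "Graduate:PhD": "Graduate",
    "Professional": "Professional",
}


def year_binary_row(year):
    """One-to-one mapping: exactly one of the 7 binary attributes is 1."""
    if year not in YEAR_GROUP:
        raise ValueError(f"unknown Year value: {year!r}")
    return [1 if year == value else 0 for value in YEAR_VALUES]


def year_item(year, use_hierarchy=False):
    """Return the single item present in the student's transaction."""
    if year not in YEAR_GROUP:
        raise ValueError(f"unknown Year value: {year!r}")
    label = YEAR_GROUP[year] if use_hierarchy else year
    return f"Year={label}"
```

With `use_hierarchy=True` the 7 binary attributes collapse to 3, one per high-level concept.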
Zip code:
(a) Each attribute value is represented by an asymmetric binary
attribute. Therefore, we have as many asymmetric binary attributes
as the number of distinct zip codes.
(b) There is a one-to-one mapping between the original attribute values
and the asymmetric binary attributes.
(c) We can have a hierarchical structure based on geographical regions
(e.g., zip codes can be grouped according to their corresponding
states).
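As a rough sketch of how grouping reduces the number of binary attributes for zip codes, the Python below uses a leading-digit prefix as a stand-in for a geographic region. This is an assumption made for illustration: grouping by state, as suggested above, would need a zip-to-state lookup table that is not shown here. The function names and item labels are hypothetical.

```python
def zip_item(zip_code, prefix_len=None):
    """Return the item for one zip code, optionally rolled up to a prefix.

    zip_code should be passed as a string so leading zeros are preserved.
    """
    code = str(zip_code).strip()
    if prefix_len is None:
        return f"Zip={code}"
    return f"Zip{prefix_len}={code[:prefix_len]}"


def count_zip_attributes(zip_codes, prefix_len=None):
    """Number of binary attributes needed: one per distinct item."""
    return len({zip_item(z, prefix_len) for z in zip_codes})
```

Without grouping, the count equals the number of distinct zip codes in the data; with a shorter prefix, zip codes that share the prefix fall under the same binary attribute.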
College:
(a) Each attribute value is represented by an asymmetric binary
attribute. Therefore, we have as many asymmetric binary attributes
as the number of distinct colleges (14 for the list above).
(b) There is a one-to-one mapping between the original attribute values
and the asymmetric binary attributes.
(c) We can have a hierarchical structure based on the type of school.
For example, colleges such as Medical and Dentistry might be grouped
together as a Medical school, while Engineering and Natural Sciences
might be grouped together into the same school.
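To see the effect of such a grouping on the attribute count, here is a minimal Python sketch. The particular grouping is hypothetical: placing Dentistry with Medical and naming the combined school "Science and Engineering" are assumptions for illustration, and any college not listed in the grouping keeps its own binary attribute.

```python
COLLEGES = [
    "Agriculture", "Architecture", "Continuing Education", "Education",
    "Liberal Arts", "Engineering", "Natural Sciences", "Business", "Law",
    "Medical", "Dentistry", "Pharmacy", "Nursing", "Veterinary Medicine",
]

# Hypothetical roll-up; colleges not listed here are left ungrouped.
COLLEGE_GROUP = {
    "Medical": "Medical school",
    "Dentistry": "Medical school",
    "Engineering": "Science and Engineering",
    "Natural Sciences": "Science and Engineering",
}


def college_item(college, use_hierarchy=False):
    """Return the item for one college, optionally rolled up to its group."""
    if college not in COLLEGES:
        raise ValueError(f"unknown College value: {college!r}")
    label = COLLEGE_GROUP.get(college, college) if use_hierarchy else college
    return f"College={label}"


# Distinct binary attributes before and after the roll-up.
flat_attributes = {college_item(c) for c in COLLEGES}
grouped_attributes = {college_item(c, use_hierarchy=True) for c in COLLEGES}
```

Under this particular grouping the 14 college attributes reduce to 12; a coarser grouping would reduce the count further.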
On Campus:
(a) This attribute can be mapped to one binary attribute.
(b) The original values 1 and 0 carry over unchanged to the binary
attribute.
(c) There is no hierarchical structure.
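Languages (Arabic, Bengali, Chinese Mandarin, English, Portuguese,
Russian, Spanish):
(a) Each language attribute can be mapped to one binary attribute,
giving 7 binary attributes in total.
(b) The original values 1 and 0 carry over unchanged to the binary
attributes.
(c) There is no hierarchical structure.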
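Putting the pieces together, one student's record can be turned into a transaction, that is, the set of binary attributes whose value is 1. The Python below is a minimal sketch: the record layout (a dict with keys `year`, `zip`, `college`, `on_campus`, `speaks`), the item labels, and the sample student are all hypothetical. Because the binary attributes are asymmetric, a value of 0 contributes no item.

```python
LANGUAGES = [
    "Arabic", "Bengali", "Chinese Mandarin", "English",
    "Portuguese", "Russian", "Spanish",
]


def student_to_transaction(student):
    """Convert one student record into a set of items (attributes set to 1)."""
    items = {
        f"Year={student['year']}",
        f"Zip={student['zip']}",
        f"College={student['college']}",
    }
    # On Campus is already binary; only the value 1 produces an item.
    if student.get("on_campus") == 1:
        items.add("OnCampus")
    # Each language is its own binary attribute.
    speaks = student.get("speaks", {})
    for language in LANGUAGES:
        if speaks.get(language) == 1:
            items.add(f"Speaks={language}")
    return items


# Hypothetical sample record, for illustration only.
example = {
    "year": "Junior",
    "zip": "55455",
    "college": "Engineering",
    "on_campus": 0,
    "speaks": {"English": 1, "Spanish": 1, "Russian": 0},
}
transaction = student_to_transaction(example)
```

For this sample student the transaction holds five items: the year, zip code, and college items plus one item for each of the two languages spoken.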