Given the data sets shown in Figure 5.6, explain how the decision tree, naïve Bayes (NB), and k-nearest neighbor (k-NN) classifiers would perform on these data sets.
(a) Both the decision tree and NB will do well on this data set because the
distinguishing attributes have better discriminating power than the noise
attributes, in terms of both information (entropy) gain and conditional
probability. k-NN will not do as well because of the relatively large number
of noise attributes, all of which contribute to the distance computation.
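The effect of noise attributes on k-NN can be sketched with a deliberately contrived 1-NN example. The attribute values below are hypothetical and chosen so that the noise attributes line up against the query as badly as possible; real noise is random, but the mechanism is the same: every attribute contributes equally to the Euclidean distance, so many noise attributes can outweigh a single distinguishing attribute.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_label(query, training):
    """1-NN: return the label of the training point closest to the query."""
    _, label = min((euclidean(query, x), y) for x, y in training)
    return label

def make_example(num_noise):
    """Hypothetical data: attribute 0 separates the classes, the rest are noise."""
    query = [0.0] + [0.0] * num_noise        # the query truly belongs to class "A"
    same_class = [0.0] + [1.0] * num_noise   # agrees on attribute 0, differs on every noise attribute
    other_class = [1.0] + [0.0] * num_noise  # disagrees on attribute 0, agrees on all noise attributes
    return query, [(same_class, "A"), (other_class, "B")]

for num_noise in (0, 4):
    query, training = make_example(num_noise)
    print(num_noise, nearest_label(query, training))
```

With no noise attributes the query is labeled "A"; with four noise attributes the distance to the same-class point grows to 2 while the distance to the other-class point stays 1, so the query is labeled "B".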
(b) NB will not work at all on this data set because of attribute dependency.
The other schemes will do better than NB.
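A minimal sketch of why attribute dependency defeats NB, assuming XOR-style data as a hypothetical stand-in for the figure: each attribute on its own is equally likely to be 0 or 1 in both classes, so every conditional probability is 1/2 and NB assigns the same score to both classes for every record. A decision tree, by contrast, can split on the first attribute and then on the second to separate the classes exactly.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate class counts and per-attribute value counts for each class."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (class, attribute index) -> counts of attribute values
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(y, i)][v] += 1
    return priors, cond

def nb_scores(row, priors, cond):
    """Unnormalised posterior P(y) * prod_i P(x_i | y) for each class y."""
    n = sum(priors.values())
    scores = {}
    for y, count in priors.items():
        score = count / n
        for i, v in enumerate(row):
            score *= cond[(y, i)][v] / count
        scores[y] = score
    return scores

# Hypothetical XOR-style data: the class depends on both attributes jointly.
rows = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = ["+", "+", "-", "-"]
priors, cond = train_naive_bayes(rows, labels)
for row in rows:
    print(row, nb_scores(row, priors, cond))
```

Every record receives a score of 0.125 for both classes, so NB has no basis for choosing between them.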
(c) NB will do very well on this data set, because each discriminating
attribute has a higher conditional probability in one class than in the
other, and the overall classification is done by multiplying these individual
conditional probabilities. The decision tree will not do as well because of
the relatively large number of distinguishing attributes; it will have an
overfitting problem. k-NN will do reasonably well.
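How NB accumulates evidence from many weakly discriminating attributes can be shown with a small calculation. The probabilities are hypothetical: each binary attribute is assumed to satisfy P(x_i = 1 | A) = 0.6 and P(x_i = 1 | B) = 0.4, with equal class priors. Summing log-probabilities is equivalent to multiplying the conditional probabilities but avoids numerical underflow when there are many attributes.

```python
import math

def posterior_class_a(x, p_a, p_b, prior_a=0.5):
    """Naive Bayes posterior P(A | x) for binary attributes.

    p_a[i] = P(x_i = 1 | A) and p_b[i] = P(x_i = 1 | B) (hypothetical values).
    """
    log_a = math.log(prior_a)
    log_b = math.log(1.0 - prior_a)
    for xi, pa, pb in zip(x, p_a, p_b):
        # Add the log of each conditional probability (i.e., multiply the probabilities).
        log_a += math.log(pa if xi else 1.0 - pa)
        log_b += math.log(pb if xi else 1.0 - pb)
    # Normalise over the two classes: P(A | x) = 1 / (1 + exp(log_b - log_a)).
    return 1.0 / (1.0 + math.exp(log_b - log_a))

for n in (1, 5, 20):
    x = [1] * n
    print(n, round(posterior_class_a(x, [0.6] * n, [0.4] * n), 4))
```

For a record with all attributes equal to 1, P(A | x) is 0.6 with one attribute, 243/275 (about 0.8836) with five, and above 0.999 with twenty, even though no single attribute is strongly discriminating.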
(d) k-NN will do well on this data set. Decision trees will also work, but
will result in a fairly large decision tree. The first few splits will be
quite random, because the algorithm may not find a good initial split.
NB will not perform as well because of the attribute dependency.
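Why the first splits can be nearly random is easiest to see by computing information gain on checkerboard-like data, used here as a hypothetical stand-in for the figure: splitting on either attribute alone leaves every partition as mixed as the parent, so every candidate split has zero gain and the choice among them is arbitrary.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain of a multiway split on attribute `attr` (one branch per value)."""
    n = len(labels)
    gain = entropy(labels)
    for v in {r[attr] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[attr] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Hypothetical 4x4 checkerboard: the class alternates with the parity of i + j.
rows = [(i, j) for i in range(4) for j in range(4)]
labels = ["+" if (i + j) % 2 == 0 else "-" for i, j in rows]
for attr in (0, 1):
    print(attr, information_gain(rows, labels, attr))
```

Both attributes have zero gain. The same holds for any binary threshold split on this grid, because every row and column contains equal numbers of "+" and "-" records.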
(e) k-NN will do well on this data set. Decision trees will also work, but
will result in a large decision tree. If the decision tree uses oblique
splits instead of only vertical and horizontal (axis-parallel) splits, the
resulting tree will be more compact and highly accurate. NB will not perform
as well because of attribute dependency.
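The advantage of an oblique split can be sketched on hypothetical grid data whose class boundary is the diagonal x = y. The code finds the best single axis-parallel split (a decision stump) by brute force and compares it with the oblique test x - y > 0, which is chosen by hand here rather than learned.

```python
from itertools import product

def majority(labels):
    """Most common label; ties go to the first label in sorted order."""
    return max(sorted(set(labels)), key=labels.count)

def best_axis_parallel_stump(points, labels, feature):
    """Best accuracy of one split 'point[feature] <= t', each side predicting its majority label."""
    values = sorted({p[feature] for p in points})
    best = 0.0
    for t in values[:-1]:  # a threshold at the largest value would leave the right side empty
        left = [y for p, y in zip(points, labels) if p[feature] <= t]
        right = [y for p, y in zip(points, labels) if p[feature] > t]
        correct = left.count(majority(left)) + right.count(majority(right))
        best = max(best, correct / len(points))
    return best

# Hypothetical data: 5x5 integer grid, class "+" strictly below the diagonal x = y.
points = list(product(range(5), repeat=2))
labels = ["+" if x > y else "-" for x, y in points]

axis_parallel = max(best_axis_parallel_stump(points, labels, f) for f in (0, 1))
oblique = sum(("+" if x - y > 0 else "-") == lab
              for (x, y), lab in zip(points, labels)) / len(points)
print(axis_parallel, oblique)
```

On this grid the best axis-parallel stump classifies 19 of 25 points correctly (0.76), while the single oblique split is perfectly accurate; an axis-parallel tree would need a staircase of further splits to approximate the diagonal.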
(f) k-NN works best. NB does not work well for this data set because of
attribute dependency. A decision tree will need to be large in order to
capture the circular decision boundaries with axis-parallel splits.
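A minimal k-NN sketch, assuming hypothetical training points on an integer grid labeled by whether they fall inside a circle of radius 3. Because each prediction depends only on nearby training points, the curved boundary is followed locally, without the many axis-parallel splits a decision tree would need. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(query, training, k=3):
    """Majority vote among the k training points nearest to the query (Euclidean distance)."""
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical training data: integer grid points, labeled "in" inside a circle of radius 3.
training = [((x, y), "in" if x * x + y * y <= 9 else "out")
            for x in range(-5, 6) for y in range(-5, 6)]

for query in [(0.4, 0.3), (4.6, 4.2), (0.2, 2.6), (3.4, 0.3)]:
    print(query, knn_predict(query, training))
```

All four queries are labeled according to their true position, including (0.2, 2.6), just inside the circle, and (3.4, 0.3), just outside it.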