Apply the classification algorithm to the following set of data records. The class attribute is Repeat Customer.


We start by computing the entropy for the entire set. We have 7 positive
samples and 3 negative samples.
The entropy, I(7,3), is -(7/10 * log(7/10) + 3/10 * log(3/10)) = 0.88, where log is the base-2 logarithm throughout.
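
The entropy figure above can be checked with a few lines of Python. This is a minimal sketch; the function name `entropy` is only illustrative, and it takes raw class counts rather than the data records themselves.

```python
from math import log2

def entropy(*counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # log2(total / c) equals -log2(c / total); empty classes contribute 0.
    return sum(c / total * log2(total / c) for c in counts if c > 0)

# 7 positive and 3 negative samples, as in the text.
print(round(entropy(7, 3), 2))  # 0.88
```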

We consider the first attribute AGE. There are 4 values for age
20..30 appears 5 times
I(s11, s21) = -(4/5 * log(4/5) + 1/5 * log(1/5)) = 0.72
31..40 appears 2 times
I(s12, s22) = -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1
41..50 appears 2 times
I(s13, s23) = -(2/2 * log(2/2)) = 0
51..60 appears 1 time
I(s14, s24) = -(1/1 * log(1/1)) = 0

E(AGE) = 5/10 * 0.72 + 2/10 * 1 + 2/10 * 0 + 1/10 * 0 = 0.56
GAIN(AGE) = 0.88 - 0.56 = 0.32
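
The weighted entropy and gain for AGE can be reproduced as follows. This is a sketch under assumptions: the per-value class counts are read off the fractions in the text, the (yes, no) order inside each pair is inferred and does not affect the entropy, and the helper names are illustrative.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of one partition, given its class counts."""
    total = sum(counts)
    return sum(c / total * log2(total / c) for c in counts if c > 0)

def expected_entropy(partitions):
    """Entropy of each partition, weighted by its share of the records."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

def gain(parent, partitions):
    """Information gain: parent entropy minus the weighted entropy."""
    return entropy(parent) - expected_entropy(partitions)

# (yes, no) counts for 20..30, 31..40, 41..50, 51..60 (assumed ordering).
age_partitions = [(4, 1), (1, 1), (2, 0), (0, 1)]

print(round(expected_entropy(age_partitions), 2))  # 0.56
print(round(gain((7, 3), age_partitions), 2))      # 0.32
```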

We consider the second attribute CITY. There are 3 values for city
LA occurs 2 times
I(s11, s21) = -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1
NY occurs 7 times
I(s12, s22) = -(2/7 * log(2/7) + 5/7 * log(5/7)) = 0.86
SF occurs 1 time
I(s13, s23) = -(1/1 * log(1/1)) = 0

E(CITY) = 2/10 * 1 + 7/10 * 0.86 + 1/10 * 0 = 0.80
GAIN(CITY) = 0.88 - 0.80 = 0.08

We consider the third attribute GENDER. There are 2 values
F occurs 7 times
I(s11, s21) = -(2/7 * log(2/7) + 5/7 * log(5/7)) = 0.86
M occurs 3 times
I(s12, s22) = -(1/3 * log(1/3) + 2/3 * log(2/3)) = 0.92

E(GENDER) = 7/10 * 0.86 + 3/10 * 0.92 = 0.88
GAIN(GENDER) = 0.88 - 0.88 = 0

We consider the fourth attribute EDUCATION. There are 3 values
HS occurs 2 times
I(s11, s21) = -(2/2 * log(2/2)) = 0
COLLEGE occurs 6 times
I(s12, s22) = -(1/6 * log(1/6) + 5/6 * log(5/6)) = 0.65
GRAD occurs 2 times
I(s13, s23) = -(2/2 * log(2/2)) = 0

E(EDUCATION) = 2/10 * 0 + 6/10 * 0.65 + 2/10 * 0 = 0.39
GAIN(EDUCATION) = 0.88 - 0.39 = 0.49
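
The four gains computed above can be compared in one pass. The sketch below assumes the per-value class counts implied by the fractions in the text (the data table itself is not reproduced here), with an inferred (yes, no) order; the dictionary layout and function names are illustrative only.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return sum(c / total * log2(total / c) for c in counts if c > 0)

def gain(parent, partitions):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    n = sum(parent)
    weighted = sum(sum(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - weighted

# Assumed (yes, no) counts per attribute value, taken from the fractions above.
splits = {
    "AGE": [(4, 1), (1, 1), (2, 0), (0, 1)],
    "CITY": [(1, 1), (5, 2), (1, 0)],
    "GENDER": [(5, 2), (2, 1)],
    "EDUCATION": [(0, 2), (5, 1), (2, 0)],
}

gains = {name: gain((7, 3), parts) for name, parts in splits.items()}
best = max(gains, key=gains.get)  # attribute with the largest gain

for name, value in gains.items():
    print(name, round(value, 2))
print("split on:", best)
```

Running it selects EDUCATION, in agreement with the hand computation.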


The greatest gain is for the EDUCATION attribute.
The tree at this point would look like the following:

                  -------------------
                  |    EDUCATION    |
                  -------------------
                 /         |         \
           HS  /  COLLEGE  |           \ GRAD
             /             |             \

RIDS:  {105,109}     {101,103,104,     {102,107}
same class: NO        106,108,110}     same class: YES

Only the middle (COLLEGE) node is not a leaf node, so we continue with
those 6 records and consider only the remaining attributes.
The subset has 5 positive samples and 1 negative sample.
The entropy, I(5,1), is -(5/6 * log(5/6) + 1/6 * log(1/6)) = 0.65

We consider the first attribute AGE. There are 4 values for age
20..30 appears 3 times
I(s11, s21) = -(3/3 * log(3/3)) = 0
31..40 appears 1 time
I(s12, s22) = -(1/1 * log(1/1)) = 0
41..50 appears 1 time
I(s13, s23) = -(1/1 * log(1/1)) = 0
51..60 appears 1 time
I(s14, s24) = -(1/1 * log(1/1)) = 0

E(AGE) = 3/6 * 0 + 1/6 * 0 + 1/6 * 0 + 1/6 * 0 = 0
GAIN(AGE) = 0.65 - 0 = 0.65

We consider the second attribute CITY. There are 2 values for city
SF occurs 1 time
I(s11, s21) = -(1/1 * log(1/1)) = 0
NY occurs 5 times
I(s12, s22) = -(1/5 * log(1/5) + 4/5 * log(4/5)) = 0.72

E(CITY) = 1/6 * 0 + 5/6 * 0.72 = 0.60
GAIN(CITY) = 0.65 - 0.60 = 0.05

We consider the third attribute GENDER. There are 2 values
F occurs 5 times
I(s11, s21) = -(1/5 * log(1/5) + 4/5 * log(4/5)) = 0.72
M occurs 1 time
I(s12, s22) = -(1/1 * log(1/1)) = 0

E(GENDER) = 5/6 * 0.72 + 1/6 * 0 = 0.60
GAIN(GENDER) = 0.65 - 0.60 = 0.05
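
The second pass over the 6 COLLEGE records can be checked the same way. This sketch again assumes class counts inferred from the fractions in the text, with illustrative names. It also shows why AGE wins: a split whose partitions are all pure has zero weighted entropy, so its gain equals the full entropy of the subset.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return sum(c / total * log2(total / c) for c in counts if c > 0)

def gain(parent, partitions):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    n = sum(parent)
    return entropy(parent) - sum(sum(p) / n * entropy(p) for p in partitions)

college = (5, 1)  # 5 positive, 1 negative in the COLLEGE subset

# Assumed (yes, no) counts per value within the subset.
subset_splits = {
    "AGE": [(3, 0), (1, 0), (1, 0), (0, 1)],  # every partition is pure
    "CITY": [(1, 0), (4, 1)],
    "GENDER": [(4, 1), (1, 0)],
}

subset_gains = {a: gain(college, p) for a, p in subset_splits.items()}
for name, value in subset_gains.items():
    print(name, round(value, 2))
```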

The greatest gain is for the AGE attribute.
Each AGE branch contains records of a single class, so every node
is now a leaf and we are finished. The final tree looks like the following:

                  ----------------------
                  |     EDUCATION      |
                  ----------------------
                 /          |           \
           HS  /   COLLEGE  |             \ GRAD
             /              |               \
                     ----------------
RIDS:  {105,109}     |     AGE      |      {102,107}
same class: NO       ----------------      same class: YES
                    /    /     |     \
                  /     /      |       \
         20..30 /      /31..40 |41..50   \ 51..60
     {101,108,110}  {103}    {106}       {104}
same class: YES      YES      YES         NO
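
One way to put the finished tree to use is to write it as nested dictionaries and walk it for a new record. This is only an illustrative encoding: the dictionary layout, the `classify` helper, and the sample record are assumptions, not part of the original exercise.

```python
# Internal nodes map an attribute name to its branches; leaves are class labels.
tree = {
    "EDUCATION": {
        "HS": "NO",
        "GRAD": "YES",
        "COLLEGE": {
            "AGE": {
                "20..30": "YES",
                "31..40": "YES",
                "41..50": "YES",
                "51..60": "NO",
            }
        },
    }
}

def classify(node, record):
    """Follow the branch matching the record's value until a leaf is reached."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[record[attribute]]
    return node

# Hypothetical record: a college-educated customer aged 51..60.
print(classify(tree, {"EDUCATION": "COLLEGE", "AGE": "51..60"}))  # NO
```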

Computer Science & Information Technology