Discuss why a document-term matrix is an example of a data set that has asymmetric discrete or asymmetric continuous features.

What will be an ideal response?


The (i, j)th entry of a document-term matrix is the number of times that term
j occurs in document i. Most documents contain only a small fraction of
all the possible terms, so most entries are zero, and those zero entries are not
very meaningful, either in describing a document or in comparing documents:
two documents are not made more similar by the fact that both lack some term.
Thus, a document-term matrix has asymmetric discrete features. If we apply
TF-IDF weighting to the terms and normalize each document to have an L2 norm
of 1, the result is a document-term matrix with continuous features. However,
the features are still asymmetric, because these transformations only rescale
existing entries and never turn an entry that was 0 into a non-zero one, so the
zero entries remain just as uninformative.
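
The following is a minimal sketch illustrating both points with a hypothetical three-document corpus. It assumes idf = log(m / df), one common TF-IDF variant; the helpers `smc`, `jaccard`, and `tfidf_l2`, as well as the 991 padding columns (standing in for vocabulary terms that none of these documents uses), are illustrative choices rather than part of the original answer. Padding drives the simple matching coefficient, which counts 0-0 agreements, toward 1, while the Jaccard coefficient, which ignores them, does not change. The last check confirms that TF-IDF weighting followed by L2 normalization leaves every zero entry at zero.

```python
import math
from collections import Counter

# Hypothetical toy corpus, chosen only for illustration.
docs = [
    "data mining finds patterns in data",
    "text mining uses term counts",
    "patterns in text data",
]

# Vocabulary: every distinct term in the corpus, in sorted order.
vocab = sorted({term for doc in docs for term in doc.split()})

# Document-term matrix: entry [i][j] is the number of times term j occurs in document i.
counts = [[Counter(doc.split())[term] for term in vocab] for doc in docs]


def smc(a, b):
    """Simple matching coefficient on presence/absence; 0-0 matches count as agreement."""
    return sum((x > 0) == (y > 0) for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """Jaccard coefficient on presence/absence; 0-0 matches are ignored."""
    pa = {j for j, x in enumerate(a) if x > 0}
    pb = {j for j, y in enumerate(b) if y > 0}
    return len(pa & pb) / len(pa | pb)


def tfidf_l2(matrix):
    """Scale counts by idf = log(m / df) (assumed variant), then L2-normalize each row."""
    m, n = len(matrix), len(matrix[0])
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n)]
    idf = [math.log(m / df[j]) for j in range(n)]  # 0 for a term found in every document
    result = []
    for row in matrix:
        weighted = [row[j] * idf[j] for j in range(n)]  # a zero count stays zero
        norm = math.sqrt(sum(w * w for w in weighted))
        result.append([w / norm if norm > 0 else 0.0 for w in weighted])
    return result


# Simulate a larger vocabulary: append 991 terms absent from both documents.
extra = 991
d0 = counts[0] + [0] * extra
d1 = counts[1] + [0] * extra
print(round(smc(d0, d1), 3), round(jaccard(d0, d1), 3))  # 0.992 0.111

# Continuous features after TF-IDF + L2 normalization, with the zero pattern kept.
weights = tfidf_l2(counts)
print(all(w == 0 for crow, wrow in zip(counts, weights)
          for c, w in zip(crow, wrow) if c == 0))  # True
```

The two documents share only the term "mining", yet the simple matching coefficient rates them as 99.2% alike, purely because of shared absences. This is why measures that ignore 0-0 matches (such as Jaccard or cosine) are generally preferred for asymmetric features.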
