Discuss why a document-term matrix is an example of a data set that has asymmetric discrete or asymmetric continuous features.
What will be an ideal response?
The ij-th entry of a document-term matrix is the number of times that term
j occurs in document i. Most documents contain only a small fraction of
all the possible terms, so most entries are zero, and a zero entry says little
when describing or comparing documents: two documents resemble each other
because of the terms they share, not the many terms that both lack. Thus, a
document-term matrix has asymmetric discrete features. If we apply a TF-IDF
normalization to the terms and normalize each document to have an L2 norm
of 1, the result is a document-term matrix with continuous features. However,
the features are still asymmetric because these transformations do not create
non-zero entries for any entries that were previously 0, and thus, zero entries
are still not very meaningful.
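
The argument above can be checked on a toy example. The following is a minimal sketch, not a reference implementation: the three documents, the helper names `document_term_matrix` and `tfidf_l2`, and the particular weighting idf = log(N / df) are all assumptions chosen for illustration (other TF-IDF variants exist). It builds the count matrix, applies TF-IDF and L2 normalization, and leaves the resulting values available for inspection.

```python
import math

def document_term_matrix(docs, vocab):
    """Count how often each vocabulary term occurs in each document."""
    return [[doc.split().count(term) for term in vocab] for doc in docs]

def tfidf_l2(counts):
    """Weight counts by idf = log(N / df), then scale each row to unit L2 norm.

    This is one simple TF-IDF variant, assumed here for illustration.
    """
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Terms that never occur get weight 0 to avoid dividing by zero.
    idf = [math.log(n_docs / d) if d > 0 else 0.0 for d in df]
    weighted = [[row[j] * idf[j] for j in range(n_terms)] for row in counts]
    result = []
    for row in weighted:
        norm = math.sqrt(sum(v * v for v in row))
        # A zero count times any weight is still zero, and dividing by the
        # norm rescales values without turning a zero into a non-zero.
        result.append([v / norm if norm > 0 else 0.0 for v in row])
    return result

# Hypothetical toy corpus; each document uses only a few of the terms.
docs = [
    "data mining finds patterns in data",
    "text mining uses term counts",
    "sparse matrix storage saves memory",
]
vocab = sorted({word for doc in docs for word in doc.split()})

counts = document_term_matrix(docs, vocab)   # discrete features
weights = tfidf_l2(counts)                   # continuous features

zero_fraction = sum(v == 0 for row in counts for v in row) / (len(docs) * len(vocab))
```

Here `counts` holds integers and `weights` holds real values, yet every position that is 0 in `counts` is also 0 in `weights`, which is why the features remain asymmetric after the transformation. With this particular idf, a term that appears in every document would get weight 0, so the transformation can add zeros but never removes them.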