In what ways does the Web pose great challenges for effective and efficient knowledge discovery through data mining?
What will be an ideal response?
• The Web is too big for effective data mining. The Web is so large and growing so rapidly that it is difficult to even quantify its size. Because of the sheer size of the Web, it is not feasible to set up a data warehouse to replicate, store, and integrate all of the data on the Web, making data collection and integration a challenge.
• The Web is too complex. The complexity of a Web page is far greater than that of a page in a traditional text document collection. Web pages lack a unified structure. They contain far more variation in authoring style and content than any set of books, articles, or other traditional text-based documents.
• The Web is too dynamic. The Web is a highly dynamic information source. Not only does the Web grow rapidly, but its content is constantly being updated. Blogs, news stories, stock market results, weather reports, sports scores, prices, company advertisements, and numerous other types of information are updated regularly on the Web.
• The Web is not specific to a domain. The Web serves a broad diversity of communities and connects billions of workstations. Web users have very different backgrounds, interests, and usage purposes. Most users may not know the structure of the information network well and may not be aware of the heavy cost of a particular search they perform.
• The Web has everything. Only a small portion of the information on the Web is truly relevant or useful to someone (or some task). Finding the portion of the Web that is truly relevant to a person and the task being performed is a prominent issue in Web-related research.