RezwanAhmed & His Team || Software Engineer

Web Search Engines Characteristics

Characteristics of Web Information

•“Infinite” size (Surface vs. deep Web)
–Surface = static HTML pages
–Deep = dynamically generated HTML pages (DB)
•Semi-structured
–Structured = HTML tags, hyperlinks, etc.
–Unstructured = Text
•Different formats (PDF, Word, PS, …)
•Multimedia (text, audio, images, …)
•High variance in quality (much junk)
•“Universal” coverage (can be about any content)
•Information Access
–Search (Search engines, e.g., Google)
–Navigation (Browsers, e.g., IE)
–Filtering (Recommender systems, e.g., Amazon)
•Information Organization
–Categorization (Web directories, e.g., Yahoo!)
–Clustering (Organize search results, e.g., Vivisimo)
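
The two organization styles above differ in where the groups come from: categorization assigns items to categories fixed in advance, while clustering discovers groups from the items themselves. The Python sketch below illustrates that contrast only; it is not how any real directory or search engine works. The category keywords, the Jaccard similarity measure, the greedy single-pass strategy, the 0.3 threshold, and the sample snippets are all assumptions made for illustration.

```python
import re

# Hypothetical, hand-picked keywords standing in for a directory's fixed categories.
CATEGORIES = {
    "sports": {"football", "league", "match"},
    "cooking": {"recipe", "bake", "oven"},
}

def tokens(text):
    """Return the set of lowercase words in a snippet."""
    return set(re.findall(r"[a-z]+", text.lower()))

def categorize(text):
    """Categorization: pick the predefined category with the most keyword overlap."""
    words = tokens(text)
    best = max(CATEGORIES, key=lambda c: len(CATEGORIES[c] & words))
    return best if CATEGORIES[best] & words else None

def jaccard(a, b):
    """Jaccard similarity of two word sets (assumed measure, chosen for simplicity)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(snippets, threshold=0.3):
    """Clustering: greedy single pass; a snippet joins the first cluster
    whose seed snippet is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each entry is (seed_tokens, member_snippets)
    for s in snippets:
        t = tokens(s)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((t, [s]))
    return [members for _, members in clusters]

# Made-up search results for the ambiguous query "jaguar".
results = [
    "jaguar car dealer prices",
    "jaguar big cat habitat",
    "used jaguar car prices",
    "jaguar cat rainforest habitat",
]
groups = cluster(results)
print(categorize("easy bread recipe to bake"))
print(groups)
```

No category list is supplied to cluster(): the car and animal groups emerge from word overlap alone, whereas categorize() can only ever return a label that already exists in CATEGORIES.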