Academic Journals Database
Disseminating quality controlled scientific knowledge

A Framework for Hierarchical Clustering Based Indexing in Search Engines

Author(s): Parul Gupta | A.K. Sharma

Journal: BVICAM's International Journal of Information Technology
ISSN 0973-5658

Volume: 3;
Issue: 2;
Date: 2011;
VIEW PDF   PDF DOWNLOAD PDF   Download PDF Original page

Keywords: Inverted files | Index compression | Document Identifiers Assignment | Hierarchical Clustering

Granting efficient and fast accesses to the index is a key issuefor performances of Web Search Engines. In order to enhancememory utilization and favor fast query resolution, WSEs useInverted File (IF) indexes that consist of an array of theposting lists where each posting list is associated with a termand contains the term as well as the identifiers of the documentscontaining the term. Since the document identifiers are stored insorted order, they can be stored as the difference between thesuccessive documents so as to reduce the size of the index. Thispaper describes a clustering algorithm that aims atpartitioning the set of documents into ordered clusters so thatthe documents within the same cluster are similar and are beingassigned the closer document identifiers. Thus the averagevalue of the differences between the successive documents willbe minimized and hence storage space would be saved. Thepaper further presents the extension of this clustering algorithmto be applied for the hierarchical clustering in which similarclusters are clubbed to form a mega cluster and similar megaclusters are then combined to form super cluster. Thus thepaper describes the different levels of clustering whichoptimizes the search process by directing the searchto a specific path from higher levels of clustering to the lowerlevels i.e. from super clusters to mega clusters, then to clustersand finally to the individual documents so that the user gets thebest possible matching results in minimum possible time.

Tango Rapperswil
Tango Rapperswil

RPA Switzerland

Robotic Process Automation Switzerland