Implementation of a Mini Search Engine

Published on Feb 13, 2025

Abstract

In this project, we will design and implement a mini search engine that searches through a collection of documents.

The data structures used are files for storing, hash tables for indexing and trees for searching the documents.

The documents will be stored as files. Given a set of texts and a query, the search engine will locate all the documents that contain the keywords in that query.

The purpose of this project is to provide an overview of how a search engine works and to gain hands-on experience in using hash tables, files and trees.

Indexing

The documents stored as files will be indexed by their words (tokens) using hash functions. The index makes it possible to retrieve the documents that contain a given word without scanning every file.
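The indexing step can be sketched as follows. This is only a minimal illustration, not the project's prescribed design: it assumes Python's built-in dict (which is a hash table) as the index, a simple lowercase alphanumeric tokenizer, and an in-memory dict of texts standing in for the files on disk. The names tokenize, build_index and the sample file names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document names that contain it.

    `documents` is a dict of {name: text}. In the project these texts
    would be read from files; here they are kept in memory (assumption).
    """
    index = defaultdict(set)  # dict keys are hashed, so lookups are O(1) on average
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


# Hypothetical sample documents.
docs = {
    "doc1.txt": "Hash tables make indexing fast.",
    "doc2.txt": "Trees make searching fast.",
}
index = build_index(docs)
print(sorted(index["fast"]))   # ['doc1.txt', 'doc2.txt']
print(sorted(index["trees"]))  # ['doc2.txt']
```

Storing a set per token means a word that appears several times in one document is still recorded only once for that document.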

Searching

Searching will be done using trees. Depending on the efficiency and complexity of the algorithm, we will use AVL trees or another kind of balanced binary search tree.

To allow efficient searching, a list of the documents in which it occurs will be stored for every word.
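One possible way to combine the two ideas above is an AVL tree keyed by word, where each node carries the set of documents containing that word. The sketch below is an illustration under that assumption. The names Node, insert and search are hypothetical, and only insertion and lookup are shown (no deletion).

```python
class Node:
    def __init__(self, word, doc):
        self.word = word
        self.docs = {doc}   # documents in which this word occurs
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node, word, doc):
    """Record that `word` occurs in `doc`; return the new subtree root."""
    if node is None:
        return Node(word, doc)
    if word == node.word:
        node.docs.add(doc)
        return node
    if word < node.word:
        node.left = insert(node.left, word, doc)
    else:
        node.right = insert(node.right, word, doc)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:   # left subtree too tall
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)     # left-right case
        return rotate_right(node)
    if balance < -1:  # right subtree too tall
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def search(node, word):
    """Return the set of documents containing `word` (empty if absent)."""
    while node is not None:
        if word == node.word:
            return node.docs
        node = node.left if word < node.word else node.right
    return set()


root = None
for word, doc in [("hash", "doc1.txt"), ("tree", "doc2.txt"),
                  ("fast", "doc1.txt"), ("fast", "doc2.txt")]:
    root = insert(root, word, doc)

print(sorted(search(root, "fast")))  # ['doc1.txt', 'doc2.txt']
print(sorted(search(root, "slow")))  # []
```

Because the rotations keep the tree height logarithmic in the number of distinct words, a lookup takes O(log n) comparisons even when the words are inserted in sorted order.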

The queries may contain the simple Boolean operators AND and OR, which behave like the corresponding logical operators.

For each such query, the documents that satisfy it will be displayed.

For instance:

Keyword1 AND Keyword2 -- retrieves all documents that contain both keywords.

Keyword1 OR Keyword2 -- retrieves all documents that contain at least one of the two keywords.
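The two operators map directly onto set intersection and set union over the per-word document lists. The following sketch assumes an index that maps each word to a set of document names, and it evaluates a flat query strictly from left to right with no operator precedence or parentheses. The function name evaluate and the sample index are hypothetical.

```python
def evaluate(query, index):
    """Evaluate a flat query such as 'a AND b OR c', left to right.

    `index` maps each lowercase word to the set of documents containing it.
    A trailing operator with no keyword after it is ignored.
    """
    tokens = query.split()
    if not tokens:
        return set()
    # Copy the first set so the index itself is never modified.
    result = set(index.get(tokens[0].lower(), set()))
    for i in range(1, len(tokens) - 1, 2):
        op, word = tokens[i].upper(), tokens[i + 1].lower()
        docs = index.get(word, set())
        if op == "AND":
            result &= docs   # keep documents that contain both
        elif op == "OR":
            result |= docs   # keep documents that contain either
        else:
            raise ValueError(f"unknown operator: {tokens[i]}")
    return result


# Hypothetical sample index.
sample_index = {
    "hash": {"doc1.txt"},
    "tree": {"doc2.txt"},
    "fast": {"doc1.txt", "doc2.txt"},
}
print(sorted(evaluate("hash AND fast", sample_index)))  # ['doc1.txt']
print(sorted(evaluate("hash OR tree", sample_index)))   # ['doc1.txt', 'doc2.txt']
print(sorted(evaluate("hash AND tree", sample_index)))  # []
```

A fuller implementation would likely give AND higher precedence than OR and support parentheses; this sketch leaves that out to keep the mapping to set operations visible.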
