DWIR: Deep Web Indexing and Retrieval technologies

Funded from 2006 to 2007 by the Spanish Ministry of Industry, Tourism and Commerce (FIT-350400-2006-5).

The deep web is the set of pages on the WWW that are not indexed by mainstream search engines such as Google, Yahoo! or MSN, in contrast to the surface web, which is indexed and searchable by such engines. The deep web consists mainly of pages that are not linked from any page on the surface web, such as those that are dynamically generated on demand, or those that require some form of registration to be accessed. A study by BrightPlanet estimates that the deep web is on the order of 500 times bigger than the surface web, and is currently growing considerably faster than the latter. The difficulty of finding content in the deep web currently represents a significant obstacle to the development of the knowledge and information society. For instance, it is often a barrier for e-commerce, since many commercial portals are built on top of databases and dynamic web pages that are not accessible through the most popular search engines, which hinders the distribution and commercialization of products.

The main goal of this project is the development of a set of free-software solutions and technologies to help make the deep web accessible to mainstream search engines. The project aims to provide tools that let standard crawlers index deep web content, so that end users can find such content seamlessly through the usual search engines. Overcoming the current barriers to the deep web is not a simple problem. Progress in this direction would have a great impact on the development and spread of the knowledge society, by making available a huge amount of valuable, high-quality information that currently cannot be found through common web search technologies. In addition, steps towards this goal could bring considerable benefits to the development of e-commerce, by making it easier for vendors to disseminate and promote their products, and by helping consumers find the best products for their needs.