Database systems have been driving dynamic web sites since the early 1990s; nowadays, even seemingly static web sites employ a database back-end for personalization and advertising purposes. To keep up with the high demand fuelled by the rapid growth of the Internet, a number of caching and materialization techniques have been proposed for web databases over the years.
The main goal of these techniques is to improve the performance, scalability, and manageability of database-driven dynamic web sites without compromising the quality of the data. Although caching and materialization are well-understood concepts in the traditional database and networking/operating systems literature, the Web and web databases bring forth unique characteristics that warrant new techniques and approaches.
In this survey, the authors adopt a data management point of view to describe the system architectures of web databases, and analyze the research issues related to caching and materialization in such architectures. They also present the state of the art in caching and materialization for web databases and organize current approaches according to three fundamental questions: how to store, how to use, and how to maintain cached/materialized web data. Finally, they relate work in caching and materialization for web databases to similar techniques in other related areas, such as data warehousing, distributed systems, and distributed databases.