Glasgow, UK; DigiCULT Delphi, June 21, 2001) This makes clear that the preservation of
digital cultural resources as persistent and fixed "objects" is already conceptually
Different strategies for archiving the web
The idea of capturing and preserving dynamic web content, which will in the future
constitute a growing and ever more important part of our cultural heritage, is still in its
infancy, and there are no adequate solutions yet. The experts participating in the ERT
Technology in June 2001 gave examples of three different strategies for capturing and
preserving dynamic digital objects today:
selective capturing of web contents,
comprehensive harvesting of web contents, and
negotiating individual agreements with selected content providers.
The Australian National Library takes a selective approach to preserving born-digital
material. Since 1996, the PANDORA Archive has captured and preserved a limited number of
Australian web sites, including individual documents, collections, and e-journals. While
some web sites, such as that of the 2000 Olympic Games in Sydney, have been captured
every day, other sites are harvested much less often. Because such a selective approach
captures only a very small portion of what is actually published on the web, there is a great
need to work with others and facilitate collaboration: developing preservation standards
and practices, cataloguing web resources in a central database to avoid duplication of work
and facilitate online search and retrieval, and, finally, establishing a notification system to
register online resources that may qualify to be archived in PANDORA.
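The idea of capturing some sites daily and others much less often can be pictured as a simple capture schedule. The following is only a minimal sketch under stated assumptions, not a description of how PANDORA is implemented: the site names, the interval values, and the function `sites_due` are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical capture intervals in days; names and values are
# illustrative assumptions, not taken from PANDORA.
SCHEDULE = {
    "olympics-2000.example": 1,   # captured every day
    "e-journal.example": 90,      # captured much less often
}


def sites_due(last_captured, today, schedule=SCHEDULE):
    """Return the sites whose capture interval has elapsed.

    last_captured maps a site name to the date of its last capture.
    A site that has never been captured is always due.
    """
    due = []
    for site, interval_days in schedule.items():
        last = last_captured.get(site)
        if last is None or today - last >= timedelta(days=interval_days):
            due.append(site)
    return sorted(due)


if __name__ == "__main__":
    last = {
        "olympics-2000.example": date(2000, 9, 15),
        "e-journal.example": date(2000, 9, 1),
    }
    print(sites_due(last, date(2000, 9, 16)))
```

Keeping the interval per site in one table makes the selection policy explicit, so that the frequency of capture can be reviewed alongside the selection criteria.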
PANDORA is a combined archiving and preservation project that aims, on the one hand,
to secure long-term public access to web-based materials and, on the other hand, to
preserve these materials for the future. It is therefore actively involved in conducting trials
and tests with various long-term preservation strategies, including migration of data to new
formats and operating systems as well as emulation of obsolete technologies.
The Royal Library, Sweden
The Royal Library, the National Library of Sweden, takes a very pragmatic "snapshot"
approach to preserving dynamic digital objects. Twice a year, the library automatically
harvests all Swedish content that has been published on the web by running a robot on the
net. Although some of the information is lost, the losses are deemed acceptable and the
process is believed to deliver a fair sample of the Swedish Web.
Information that cannot be captured in this way includes database-managed sites where
users first need to log in. For such cases, the draft of a new deposit law for
electronic material proposes authorising the National Library to demand from selected
publishers access to those databases that cannot be harvested automatically. Because
97% of the harvested material consists of HTML pages or JPG and GIF files, there is a good
chance that these resources will remain accessible for a longer period. The remaining
3% of the material is problematic with regard to preservation and will demand special
attention through emulation.
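A split such as the 97% and 3% figures above can be obtained by tallying the formats of harvested files. The following is a minimal sketch under stated assumptions, not the library's actual procedure: it assumes each harvested file carries a MIME type, it treats only HTML, JPEG, and GIF as formats likely to remain accessible, and the names `STABLE_TYPES` and `at_risk_share` are hypothetical.

```python
from collections import Counter

# Assumption: formats regarded as likely to remain accessible, following
# the text (HTML pages, JPG and GIF images). The set is illustrative.
STABLE_TYPES = {"text/html", "image/jpeg", "image/gif"}


def at_risk_share(mime_types):
    """Count files per MIME type and return (counts, at-risk share).

    Parameters such as '; charset=utf-8' are stripped before counting.
    The share is the fraction of files whose type is not in STABLE_TYPES.
    """
    counts = Counter(t.split(";")[0].strip().lower() for t in mime_types)
    total = sum(counts.values())
    if total == 0:
        return counts, 0.0
    at_risk = sum(n for t, n in counts.items() if t not in STABLE_TYPES)
    return counts, at_risk / total


if __name__ == "__main__":
    sample = [
        "text/html; charset=utf-8",
        "image/gif",
        "image/jpeg",
        "application/pdf",
    ]
    counts, share = at_risk_share(sample)
    print(counts)
    print(f"share needing special attention: {share:.0%}")
```

The per-type counts also show which formats make up the problematic remainder, which helps decide where migration or emulation effort is needed first.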