The Web has become a singularly important communication medium for scholarship, social history, and cultural heritage. Unique and valuable information is threatened, however, because content on the Web is continually updated, replaced, or lost.
Examples of such at-risk materials include political candidates' campaign websites, which live only for the duration of the election season; the websites of grant-funded projects, whose value to scholarship persists well beyond the period of funding; dissident political speech subject to government censorship; online news related to fast-breaking events, which is quickly amended and submerged; the orphaned web presence of important, deceased personages; and more. Ensuring continued access to web content that has disappeared or been overwritten is imperative for such diverse aims as research, teaching, library collection building, institutional legacy, legal compliance, and government information stewardship.
Recognizing these needs, we are building a web archiving service to support collecting, preserving, and providing access to at-risk web content. This effort draws on our experience from ongoing web archiving projects and is informed by an internal 2011 survey of use cases, feedback from Stanford University stakeholders, and the evolving best practices of the national and international web archiving communities. It is based on standard data formats and open source technologies.
Subject specialists, faculty, researchers, and other Stanford University staff will identify valuable web content. We will store the web archives in the Stanford Digital Repository, provide discovery through SearchWorks, and enable browsing through a local instance of the PyWB web archive replay platform. We will explore additional access services and strategies for collected content, with a focus on research use.
Please contact us with feedback, questions, or recommendations concerning service development, pilot crawling activity, or consultative support for web archiving projects.