Web archiving has gained the attention of both the academic community and the broader public in the past few years, thanks to a series of highly publicized studies of the prevalence of link rot in web-based scholarly journals and to feature articles on web archiving in publications including the New Yorker and the Atlantic. Web archives are a new and complex documentary form, but a valuable one with an extensive range of research applications. Yet there is little consensus among cultural heritage institutions tasked with archiving web-based resources about best practices for describing and making available the archived websites in their collections. This paper explores the diversity of descriptive practices and discovery platforms currently in use, evaluates their impact on access to collections of web archives, and recommends which aspects of each model might best serve as standard practices going forward.
The core of the project is a set of case studies of the descriptive practices and discovery platforms implemented by three institutions with varying perspectives on, and needs from, their collections of web archives. The New York Art Resources Consortium, a trio of art libraries in New York City, describes its archived websites using MARC records conforming to the RDA content standard and provides access to them through a customized layer of its shared online public access catalog. The Tamiment Library and Robert F. Wagner Labor Archives at New York University, meanwhile, describes its archived websites using the DACS content standard and provides access to them in EAD finding aids in the New York University Libraries’ archival collections search portal. Finally, Archive-It, a web archiving service provided by the Internet Archive on a subscription basis, incorporates both full-text search and the Dublin Core metadata schema, and provides access to sites archived by its hundreds of subscribers through a search portal at Archive-It.org. Many, if not most, of Archive-It’s approximately 400 subscribers use these provided tools as their sole method of description and access for their collections. In that sense, Archive-It can stand in for any number of cultural heritage institutions that simply affix their logos to Archive-It’s standard features.
Each approach can be adapted to meet the unique needs of web archives and their users while maintaining consistency with institutional practices in the absence of broader professional standards. Libraries and archives should strive, as fully as resources permit, to integrate their web archives into shared discovery and access platforms alongside descriptive records for other formats of collection materials. Description of web archives should also take into account not only what is present in the archived websites, but also what is absent. Current methods of capturing and reproducing the contents of live websites are far from perfect, and researchers benefit greatly from knowing when streaming media, dynamic content, or the full contents of databases are not reproduced in a given archived website. Finally, information about the processes and curatorial decisions that led to the creation of an institution’s web archives should also be provided in some degree of detail. This information is indispensable to researchers interested in using web archives as datasets, as it allows them to identify and correct for the ways that curatorial choices and technical capabilities can skew the data found within the web archives.
Web archives have so far fallen short of transforming the information landscape in the way that the live web undoubtedly has. Yet they can solve many of the problems associated with the live web, from link rot to the ease with which users can erase and rewrite documentation of the past. Improved description and access models for web archives can help them accomplish these tasks.