The interconnected nature of web pages means that, when capturing the Web in finite-sized chunks using a tool such as Conifer, the picture will (almost) always be incomplete. One can choose to follow every link on a page, but at a certain point choices must be made about what to leave out from a capture. This is particularly true for contexts such as social media sites, where the default layout for a page or post might include dozens of links to extraneous content. The person capturing must decide which resources are relevant to their purposes, and which aren’t.

At the same time, resources can be left out of a capture for other reasons. Sometimes images and other media, or even entire pages, are already removed or unavailable on the live web at the time of capture.

By default, web archive replay systems, including Conifer, provide a “resource not found” page when someone browsing a collection clicks on a link to a resource that wasn’t captured. However, this can be confusing for the viewer, since there is no indication that a link points to a missing resource until it’s clicked.

The Conifer error message for web resources that are not contained in a collection.

Additionally, Conifer and other replay systems don’t provide a means for contextualizing missing resources: explaining why they were left out of a capture, clarifying that a resource was missing/unavailable at capture time, etc.

There are also cases where curators may want to highlight particular resources that are included in a collection, such as in collections where the vast majority of links point to out-of-boundary resources. In these instances, curators should be able to guide viewers along specific “paths” through a collection by pointing out the links and elements that lead to in-boundary resources.

Introducing Periphery

During my internship at Rhizome, I took on the challenge of working on a system to help address these problems, creating Periphery: a tool for collection owners to define how missing resources are expressed during the replay of archived content. Periphery will enable the systematic definition of specific interactions relating to uncaptured resources, down to the level of specific elements and links on pages. Through an interface embedded in Conifer, it will allow these interactions to be defined and saved so that they’ll be applied for anyone viewing the collection publicly. Some of the possible interactions include: disabling links to uncaptured resources, putting visual overlays on links, providing descriptions or contextual information about missing resources, and displaying popup information on missing resources.

How it works

Existing web archive replay systems, including Conifer, provide a means for injecting arbitrary code into replayed pages. This functionality is currently used by Wombat to rewrite page links to point to archived pages instead of the live web. Periphery modifies pages in a similar way, allowing for arbitrary styling and overlaying of elements on a page. These augmentations to a page can be defined so they are visible to users accessing the collection, but the changes only appear at the time of viewing; the underlying archived resources and web pages remain unchanged.
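To make the idea of view-time augmentation concrete, here is a minimal sketch in TypeScript. It is not Conifer's, Wombat's, or Periphery's actual code: the function name augmentForReplay, the data-injected-at-replay attribute, and the a.missing CSS class are all hypothetical, chosen only to illustrate that the stored capture is read and a modified copy is served.

```typescript
// Hypothetical helper, not part of Conifer, Wombat, or Periphery.
// Returns a copy of the archived HTML with a <style> block added for replay.
function augmentForReplay(archivedHtml: string, css: string): string {
  const styleTag = "<style data-injected-at-replay>" + css + "</style>";
  // Case-insensitive search for the closing head tag.
  const headClose = archivedHtml.search(/<\/head>/i);
  if (headClose === -1) {
    // No </head> to target: prepend so the styles still apply.
    return styleTag + archivedHtml;
  }
  return (
    archivedHtml.slice(0, headClose) + styleTag + archivedHtml.slice(headClose)
  );
}

// The stored capture is only read; the augmented copy is a separate string.
const archived =
  '<html><head><title>Capture</title></head>' +
  '<body><a class="missing" href="/about">About</a></body></html>';

// Assumed styling: outline the link and stop it from responding to clicks.
const replayed = augmentForReplay(
  archived,
  "a.missing { outline: 2px dashed red; pointer-events: none; }"
);
```

Because strings are immutable, archived is untouched after the call, which mirrors the principle that augmentations exist only in the replayed view. The same principle applies to any augmentation a curator defines for a collection.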

For instance, the following boundary expression (in YAML form):

boundaries:
 - selector:
     type: link-query
   type: on-load
   action: 
     type: disable
   overlays:
     - display: visible
       type: box
   resource: all
   description: Pages outside of the main site were not captured.

creates transparent overlays on top of missing links on a capture of https://rhizome.org/:

Screencapture of Periphery in action
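The boundary expression above can be read as a small rule: find links, check whether their targets were captured, and disable and overlay the ones that were not. The TypeScript sketch below is one possible reading of that rule, not Periphery's implementation. The Boundary interface simply mirrors the YAML keys; the interpretation of each field (for example, that resource: all means every uncaptured resource), the planLinkTreatments function, and the exact-string URL comparison are assumptions made for illustration.

```typescript
// Mirrors the keys of the YAML example; field meanings are assumptions.
interface Boundary {
  selector: { type: string };
  type: string; // assumed: when the rule fires, e.g. "on-load"
  action: { type: string }; // e.g. "disable"
  overlays: { display: string; type: string }[];
  resource: string; // assumed: "all" = every uncaptured resource
  description: string;
}

interface LinkTreatment {
  href: string;
  disabled: boolean;
  overlay: string | null;
  note: string;
}

// Hypothetical: decide how each uncaptured link should be treated at replay.
function planLinkTreatments(
  boundary: Boundary,
  hrefs: string[],
  capturedUrls: string[]
): LinkTreatment[] {
  if (boundary.selector.type !== "link-query") {
    return []; // this sketch only handles link selectors
  }
  const visible = boundary.overlays.filter((o) => o.display === "visible");
  const overlay = visible.length > 0 ? visible[0].type : null;
  return hrefs
    // Simplification: exact string match; a real system would normalize URLs.
    .filter((href) => capturedUrls.indexOf(href) === -1)
    .map((href) => ({
      href: href,
      disabled: boundary.action.type === "disable",
      overlay: overlay,
      note: boundary.description,
    }));
}

const rule: Boundary = {
  selector: { type: "link-query" },
  type: "on-load",
  action: { type: "disable" },
  overlays: [{ display: "visible", type: "box" }],
  resource: "all",
  description: "Pages outside of the main site were not captured.",
};

// Made-up link and capture lists for illustration.
const treatments = planLinkTreatments(
  rule,
  ["https://rhizome.org/about/", "https://example.com/elsewhere"],
  ["https://rhizome.org/", "https://rhizome.org/about/"]
);
```

With these made-up inputs, only the example.com link is uncaptured, so it is the only one that receives a treatment: disabled, covered by a box overlay, and annotated with the rule's description.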

Here is another example using Brian Droitcour, Yelp! reviews, 2012-2014, in which links inside the boundary (reviews and photos posted by the artist) and links outside the boundary (all the links that go to different locations on the site) are marked with different overlays:

Screencapture of Periphery in action

Future

These boundary expressions could become part of web archive collections in the future. They’re especially useful if a collection’s goal is centered around a specific artwork or publication.

Rhizome will start implementing this framework on the webenact web archive server and is looking into how to provide it for users of Conifer, too.

Currently, the project lives on GitHub!

Thanks

Many thanks to all the folks who contributed ideas and feedback throughout the course of this project: Lozana Rossenova, Lyndsey Moulds, Anisa Hawes, Sumitra Duncan, Nika Maltar, Pat Shiu, Mark Beasley, and Dragan Espenschied!



Matthew Brucker is a technologist, designer, and education researcher who attended Olin College of Engineering. In spring and summer 2020 he was a remote intern at Rhizome’s Preservation Program. Matthew’s internship was supported via the Sketch Model program.