Announcing Webrecorder API and WASAPI Support
Over the years, Webrecorder has been developed as a fully API-driven application, with a web archiving backend and React-based frontend. We’ve been working on a spec for the full API and initial documentation. The API documentation is now available at:
https://webrecorder.io/docs/api
The API includes all the functionality that is available on https://webrecorder.io/
WASAPI
The Webrecorder API also includes initial support for WASAPI, an API for bulk data transfer from web archives, developed by Archive-It and Stanford Libraries.
Webrecorder implements the core WASAPI specification for WARC downloads, allowing users to download all WARCs from their account or from a single collection.
WASAPI is best used with a client tool, and support for the Webrecorder implementation has been added to py-wasapi-client in the latest release.
To use it, run pip install py-wasapi-client or clone the repo.
Then, to download all Webrecorder WARCs for account <USERNAME>, run:
wasapi-client -u <USERNAME> -b https://webrecorder.io/api/v1/download/webdata
To download only the WARCs from a specific collection <COLLECTION>:
wasapi-client -u <USERNAME> --collection <COLLECTION> -b https://webrecorder.io/api/v1/download/webdata
Refer to the latest wasapi-client README for additional options.
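For readers who would rather script against the endpoint than use the client, here is a minimal sketch of how the listing URL for the two commands above might be built. It assumes the `collection` query parameter from the general WASAPI specification; the helper name `build_webdata_url` is made up for this example, and authentication is left out entirely.

```python
from urllib.parse import urlencode

# Endpoint taken from the wasapi-client commands above.
BASE_URL = "https://webrecorder.io/api/v1/download/webdata"


def build_webdata_url(base_url, collection=None):
    """Return the listing URL, optionally restricted to one collection.

    Assumption: the server accepts the `collection` query parameter
    described in the general WASAPI specification.
    """
    if collection is None:
        return base_url
    return base_url + "?" + urlencode({"collection": collection})


# All WARCs for the account, then only those in one (hypothetical) collection.
print(build_webdata_url(BASE_URL))
print(build_webdata_url(BASE_URL, "my-collection"))
```

The resulting URL would still need to be requested with the account's credentials, which is what wasapi-client handles when given the -u option.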
When downloading WARCs from https://webrecorder.io/ which are stored on S3, the API will provide links directly to S3. This should allow for much faster downloads.
The data returned from the API will be the most up-to-date available, including WARCs that are currently ‘open’ for writing (as part of an active recording session). These WARCs may have a .warc.gz.open extension and will eventually be replaced with final WARCs once the recording session is finished.
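If you process a listing yourself, you may want to set aside WARCs that are still being written. The sketch below shows one way to do that, assuming each entry carries a `filename` field as in the general WASAPI specification. The sample entries and the helper name `split_open_warcs` are invented for illustration and are not real Webrecorder output.

```python
def split_open_warcs(files):
    """Split WASAPI-style file entries into (finished, still_open) lists.

    An entry counts as still open when its filename ends in ".open",
    matching the .warc.gz.open extension described above.
    """
    finished, still_open = [], []
    for entry in files:
        if entry["filename"].endswith(".open"):
            still_open.append(entry)
        else:
            finished.append(entry)
    return finished, still_open


# Hypothetical entries shaped like the "files" array of a WASAPI response.
sample_files = [
    {"filename": "rec-example-1.warc.gz"},
    {"filename": "rec-example-2.warc.gz.open"},
]

finished, still_open = split_open_warcs(sample_files)
print("finished:", [f["filename"] for f in finished])
print("still open:", [f["filename"] for f in still_open])
```

Open WARCs could then be fetched again later, once the recording session has ended and the final file has replaced them.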
The WASAPI spec does not specify a way to include metadata or any other data besides WARCs, but additional Webrecorder-specific options may be added later. All Webrecorder metadata is available as part of the Webrecorder API (see the collections, users, lists and bookmarks sections of the API).
What other features would you like to see in the WASAPI implementation or the general Webrecorder APIs?
Let us know in the comments, or by reaching out to us directly via email, Twitter or GitHub!
And a big thanks to Lauren Ko from UNT Libraries for creating py-wasapi-client and helping implement Webrecorder support!