Reflections on the 2016 IIPC General Assembly and Web Archiving Conference

May 12, 2016

In keeping with shallow tradition, it's taken me a few weeks to collect my thoughts on the recently concluded IIPC General Assembly and Web Archiving Conference, hosted this year by the National and University Library of Iceland. In the wake of last year's meeting, I speculated on what developments in web archiving we might together effect in the year ahead (now behind). Nearly a year later, that conceit provides a convenient jumping-off point for reflecting on how it all went, where we might go from here, and the tremendous amount of work to do in our one remaining collective month before the anniversary of that post. :)

Before I start, it's worth noting that summary reports have already been posted by Emmanuelle Bermès (in French), Patrick Galligan, Michael Nelson, and Kristinn Sigurðsson; tweeted URLs and tweeters have been visualized by Ed Summers; presentations are gradually being posted to the online schedule; and the tweet streams for both #iipcga16 and #iipcwac16 are still relatively fresh and continue to see a trickle of updates.

So, what of last year's intentions?

Continue the "mainstreaming" of web archives as primary research materials

There was notable progress on this front, some of which was reported out at the conference. The Archives Unleashed web archive hackathon brought together more than thirty researchers with a range of backgrounds and experience using web archives to team up for two days on small, demonstrable projects. On the more traditional academic side was a conference dedicated to web archives as scholarly sources. Meanwhile, analysis platforms for web archives like Warcbase and ArchiveSpark matured, adding to and enhancing the web archive research toolbox.

Explore (and implement?) at least one core API

APIs have become an area of practical interest, both in the context of individual institutions' re-architecture plans and for community consideration. SUL partnered with Internet Archive, Rutgers University, and the University of North Texas on a grant from the Institute of Museum and Library Services to engage the community for development of web archive data transfer APIs. The OpenWayback development team is meanwhile facilitating a conversation about the API that advertises the contents of a Wayback-based web archive (i.e., CDX Server API). The UK Web Archive is looking to re-architect their web archiving system as a set of services interoperating according to APIs, and the LOCKSS re-architecture effort already underway seeks a similar outcome.

Standardize measurement of our web archives

Not much has happened on this front. It may be that APIs ultimately drive standardization that serves measurement, but measurement itself has not seemed to be a major priority.

Broaden the contributors to the OpenWayback project

The number of contributors to OpenWayback is actually down 37% from the previous year, which is not an encouraging indicator. However, the team delivered two incremental production releases and started serious planning for the next major version. The drop in contributors may be partly attributable to a diffusion of attention among other Wayback platforms, such as PyWb. The Portuguese Web Archive team compared the three existing Wayback variants; it will be interesting to see if that results in cross-pollination of features or affects the distribution of community adoption.

Generalize work on full-text search

The specific idea of a coded reference dataset for tuning full-text search relevance ranking continues to be discussed, though I'm unaware of any actual work. On another front, the UK Web Archive's excellent web archive discovery platform Shine was adopted for the Canadian Political Parties and Political Interest Groups Portal. There's been recent conversation about porting Shine over to Blacklight (Backlight, anyone?), which would align well with our own architecture and roadmap.

An incrementally (or radically?) more mission-supporting Consortium Agreement

It's early yet to have seen major results from the revised Consortium Agreement (PDF), though a number of key changes were implemented: a five-year term; unbundled and more-easily-amendable bylaws; opening of the possibility for interest and task groups; and establishment of "portfolio lead" roles for Tools Development, Membership Engagement, and Partnerships and Outreach. Tom Cramer has wasted no time in his role as Portfolio Lead for Tools Development in rallying folks to consider opportunities in this area.

Beyond the areas I explicitly called out a year ago, I think this year's conference reflected that it's been a good year for web archiving more broadly.

What stood out for me from this year's meeting:

  • Less worry about our challenges, more excitement about our opportunities.
  • Concrete work on APIs already underway in multiple quarters.
  • Signs of a critical mass for IIPC-sponsored, developer-centric events.
  • More, and more diverse, experimentation with models for researcher engagement.
  • Convergence of approaches (e.g., Social Feed Manager using WARC, multiple institutions integrating archiving proxies).
  • An ecosystem featuring promising new service models (e.g., Webrecorder).
  • Participation by Google, in the form of a keynote as well as joining IIPC.
  • A program (only one day of administrative business) and branding ("Web Archiving Conference" rather than "Open Days") gearing the event toward a broader audience.

A last takeaway, and a comment on so much great activity going on throughout the community: if in every area we could all do as well as the best of what any of us is doing, we'd be doing exceptionally well. I believe that IIPC's potential is to collaboratively foster and propagate these innovations. I look forward again to what we can get done in the coming year. Let's get to work!