Mapping archives to digitised objects

Tom Crane
Jan 31, 2020

Although our work at The UK National Archives isn’t particularly about digitised content, I’ve still been thinking about how records map to IIIF (the International Image Interoperability Framework), the model for presenting complex digital objects on the web.

This is how things are organised at The National Archives:

Archival hierarchy at The National Archives

There are over 400 Departments at the top level, representing for the most part various machineries of the state over the last 1000 or so years. For example, GUK is the Records of the GOV.UK website, which still contributes records to the archives; QAB is Records of the Office of Queen Anne’s Bounty, including some records inherited from the Office of First Fruits and Tenths, which has been fairly quiet lately.

In common with most archives, the Series is the point at which you might enter a particular group of records. Sometimes, one or more of the levels are left out, and sometimes there are additional, unnamed levels in between.

Out at the leaves of this tree are Piece and Item. Not every series is described down to these levels, but where it is, a Piece represents an individual orderable thing: something you could put in a request for, hold and look at. An Item is an individually-described part of that thing, such as a set of pages that comprise one catalogued document within a larger file, or a medal card, where six medal cards are stuck on each page of the Piece. You couldn’t ask for just one medal card, because it’s physically part of a Piece: the Piece is usually the orderable unit that can be summoned up from the shelves, and the Item is the individually-described medal card for Tom Jones, Private, of the Dragoons regiment D/26427, within that Piece¹.

This hierarchy of Piece => Item is the opposite way round from most other archives. For example, at Wellcome, the Item level is the parent of the Piece level. Perhaps it’s the difference between a physical way of looking at hierarchy (“this letter is a piece of this item”) and a descriptive view of hierarchy (“we have described some items within this piece of the archive”). It allows for a particularly satisfying application of the verb “itemise” as an archival activity: “we have now itemised this piece”, that is, catalogued it in more depth to make individual records for its logical parts. And it keeps the terminology abstract enough to cater for many different types of archival thing, without throwing people off with archival terms that sound like things in the real world, such as File.

Anyway, this Piece and Item terminology holds a clue about mapping to models for digital delivery. In IIIF, the Manifest is the unit of distribution: it lives on the web at a URL. A viewer or annotation tool or some other software loads the information at this URL and uses it to present a digital object. To me, it is most sensible to align this concept to the object in hand, the thing you could put in a request for, to look at. So at The National Archives, the Piece would be the Manifest.
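As a minimal sketch of the Piece-as-Manifest idea, the Python below builds a IIIF Presentation 3.0 Manifest for one Piece as a plain dictionary. The base URL, the slug `abc-1-2` and the page dimensions are hypothetical placeholders; a real Manifest would also carry metadata, rights statements and links back to the catalogue record, and would attach images to each Canvas.

```python
import json

# Hypothetical base URL for illustration; not a real National Archives endpoint.
BASE = "https://iiif.example.org/presentation"
CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def lang_map(text):
    """Presentation 3.0 labels are language maps: {"en": ["..."]}."""
    return {"en": [text]}


def piece_manifest(slug, title, view_sizes):
    """Build a Manifest for one orderable Piece.

    slug is a URL-safe identifier for the Piece (hypothetical), and
    view_sizes holds one (width, height) pair per view, i.e. per Canvas.
    Images would be attached to each Canvas as "painting" annotations;
    they are left out to keep the sketch short.
    """
    canvases = [
        {
            "id": f"{BASE}/{slug}/canvas/{n}",
            "type": "Canvas",
            "label": lang_map(f"View {n}"),
            "width": width,
            "height": height,
        }
        for n, (width, height) in enumerate(view_sizes, start=1)
    ]
    return {
        "@context": CONTEXT,
        "id": f"{BASE}/{slug}/manifest",
        "type": "Manifest",
        "label": lang_map(title),
        "items": canvases,
    }


manifest = piece_manifest("abc-1-2", "A hypothetical Piece of medal cards", [(2000, 3000)] * 3)
print(json.dumps(manifest, indent=2))
```

The Manifest’s URL is the thing a viewer loads, so in this sketch one orderable Piece corresponds to one fetchable document.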

The Item is a part of that object, which in IIIF is modelled with a Range — an extent within the object. This might be many pages of a document, or it might be just one part of one page, like an individual newspaper story or the medal card mentioned above. It can also be an extent of time: a segment from a film.
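Here is a rough sketch of Items as Ranges, again in Presentation 3.0 terms and with hypothetical identifiers. A Range can list whole Canvases (a document spanning several pages) or parts of a Canvas, addressed with a W3C Media Fragment: `xywh=` for a region of a page, `t=` for a time segment of a film. The 2 x 3 grid of medal cards is an assumption made for the example.

```python
# Hypothetical Manifest base URL, for illustration only.
MANIFEST_BASE = "https://iiif.example.org/presentation/abc-1-2"


def part_of_canvas(canvas_n, fragment):
    """Point at part of one Canvas with a W3C Media Fragment selector.

    fragment is e.g. "xywh=0,0,1000,1000" for a region of a page, or
    "t=120,180" for seconds 120 to 180 of a time-based Canvas (a film).
    """
    return {
        "type": "SpecificResource",
        "source": f"{MANIFEST_BASE}/canvas/{canvas_n}",
        "selector": {"type": "FragmentSelector", "value": fragment},
    }


def whole_canvas(canvas_n):
    """Reference a whole Canvas (one full page) by id."""
    return {"id": f"{MANIFEST_BASE}/canvas/{canvas_n}", "type": "Canvas"}


def item_range(item_slug, label, parts):
    """A Range modelling one catalogued Item as an extent of the Piece."""
    return {
        "id": f"{MANIFEST_BASE}/range/{item_slug}",
        "type": "Range",
        "label": {"en": [label]},
        "items": parts,
    }


# One medal card: the top-left card of six on a 2000 x 3000 page,
# assuming the cards sit in a regular 2 x 3 grid.
card = item_range("card-1", "Medal card 1", [part_of_canvas(1, "xywh=0,0,1000,1000")])
# A document spanning three whole pages of the file.
document = item_range("doc-1", "Document 1", [whole_canvas(n) for n in (4, 5, 6)])
# Ranges are listed in the Manifest's "structures" property.
structures = [card, document]
```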

If the Items map to IIIF Ranges, we can do interesting things with the IIIF Presentation and Image APIs. Our IIIF digital scissors can extract individual items for different UI purposes: presenting page-mounted cards as individuals again (e.g., in a rolodex-like user interface), focusing on just the one newspaper story alongside its catalogue record, or drawing attention to one logical unit of a few pages within a larger file. This approach already works well where IIIF is used to model newspaper articles, and the ability to rearrange the same content non-destructively, in several different ways, opens up many possibilities for engaging with archives.
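The “digital scissors” can be sketched with the IIIF Image API 3.0 URL pattern `{base}/{region}/{size}/{rotation}/{quality}.{format}`. The function below converts a region in Canvas coordinates into image pixels and builds a crop URL. It assumes the image is painted onto the whole Canvas, and the image service URL is hypothetical.

```python
def region_url(image_service, canvas_size, image_size, xywh, size="max"):
    """Build an Image API 3.0 URL that crops a Canvas region out of an image.

    image_service is the base URI of a IIIF Image API service (hypothetical
    here). Canvas coordinates are scaled to image pixels, assuming the image
    is painted onto the whole Canvas.
    URL pattern: {base}/{region}/{size}/{rotation}/{quality}.{format}
    """
    canvas_w, canvas_h = canvas_size
    image_w, image_h = image_size
    x, y, w, h = xywh
    sx, sy = image_w / canvas_w, image_h / canvas_h
    region = ",".join(str(round(v)) for v in (x * sx, y * sy, w * sx, h * sy))
    return f"{image_service}/{region}/{size}/0/default.jpg"


# Cut the second medal card (top-right) out of a 4000 x 6000 pixel photograph
# painted onto a 2000 x 3000 Canvas.
url = region_url(
    "https://images.example.org/iiif/abc-1-2-p1",
    canvas_size=(2000, 3000),
    image_size=(4000, 6000),
    xywh=(1000, 0, 1000, 1000),
)
print(url)
# https://images.example.org/iiif/abc-1-2-p1/2000,0,2000,2000/max/0/default.jpg
```

Nothing is cut out of the stored image itself: every crop is just another URL, which is what makes the rearrangement non-destructive. (Image API 2.x services use `full` rather than `max` for the size parameter.)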

Looking up instead of down, the levels above a Piece are all IIIF Collections; a IIIF browsing tool would open them like folders until it got to a viewable object in the form of a Manifest.
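A minimal sketch of the idealised case follows. Each level above Piece gets a Collection document that lists its children by reference, and a Piece is referenced as a Manifest. The node structure (`slug`, `level`, `title`, `children`) and the URLs are hypothetical; a real catalogue export would look different.

```python
CONTEXT = "http://iiif.io/api/presentation/3/context.json"
BASE = "https://iiif.example.org/presentation"  # hypothetical


def reference(node):
    """A short reference to a node: a Manifest for a Piece, else a Collection."""
    kind = "Manifest" if node["level"] == "Piece" else "Collection"
    return {
        "id": f"{BASE}/{node['slug']}/{kind.lower()}",
        "type": kind,
        "label": {"en": [node["title"]]},
    }


def collection_for(node):
    """The Collection document for a level above Piece.

    Like a folder, it lists its children by reference; a browser fetches
    each child's URL in turn until it reaches a Manifest.
    """
    return {
        "@context": CONTEXT,
        **reference(node),
        "items": [reference(child) for child in node.get("children", [])],
    }


# A hypothetical fragment of the hierarchy.
series = {
    "slug": "abc-1", "level": "Series", "title": "Series 1",
    "children": [
        {"slug": "abc-1-1", "level": "Piece", "title": "Piece 1"},
        {"slug": "abc-1-2", "level": "Piece", "title": "Piece 2"},
    ],
}
print(collection_for(series))
```

Keying the choice on the level name only works in this idealised case; where description stops higher up, the decision has to depend on what has actually been described.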

An idealised view of how the hierarchy maps to IIIF resources, when we have Itemised Pieces

And below the Manifest, further itemisation is represented as Ranges, to whatever degree is required. Additional information about extents of the content can also be provided as Annotations, which can target any extent of the object for any purpose, not just the navigational and structural information conveyed by Ranges.
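As an illustration of the difference, here is a sketch of a W3C Web Annotation with the `describing` motivation, targeting a region of a Canvas through an `xywh` fragment on the Canvas id. The identifiers and the description text are hypothetical.

```python
def describing_annotation(anno_id, canvas_id, xywh, text):
    """A W3C Web Annotation describing a region of a Canvas.

    Unlike a Range, which carries structure for navigation, an annotation
    like this can say anything about any extent of the object.
    """
    x, y, w, h = xywh
    return {
        "id": anno_id,
        "type": "Annotation",
        "motivation": "describing",
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
            "language": "en",
        },
        "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


canvas_id = "https://iiif.example.org/presentation/abc-1-2/canvas/1"  # hypothetical
note = describing_annotation(
    canvas_id + "/annotation/1", canvas_id, (0, 0, 1000, 1000),
    "A hypothetical description of the first medal card",
)
# Non-painting annotations are attached to a Canvas in an AnnotationPage
# listed under the Canvas's "annotations" property.
annotation_page = {
    "id": canvas_id + "/annotations/1",
    "type": "AnnotationPage",
    "items": [note],
}
```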

But what if catalogue description stops at the Subseries level, with no further detail below? In this case the material probably hasn’t been digitised anyway. But if it had been, and there was just a sequence of hundreds or thousands of images with no further means to organise them, then we’d have to put a Manifest at this level too, or perhaps several arbitrary Manifests to break the content up into more manageable chunks. It’s important to keep manifest sizes sane for the same reason it’s important to keep book sizes sane: it’s a question of usability. The size of our unit of distribution is determined by the level to which the material has been described, and by how easy those units are for software and humans to work with.
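Breaking a long, undescribed sequence into arbitrary Manifests could look something like this sketch. The batch size of 250 is an arbitrary usability choice, not a IIIF rule, and the URLs are hypothetical.

```python
BASE = "https://iiif.example.org/presentation"  # hypothetical


def split_into_manifests(slug, view_sizes, max_views=250):
    """Break one long, undescribed image sequence into arbitrary Manifests.

    view_sizes holds one (width, height) pair per image, in order.
    max_views is an arbitrary usability limit, not a IIIF rule.
    """
    manifests = []
    for part, start in enumerate(range(0, len(view_sizes), max_views), start=1):
        batch = view_sizes[start:start + max_views]
        first, last = start + 1, start + len(batch)
        manifests.append({
            "id": f"{BASE}/{slug}/part-{part}/manifest",
            "type": "Manifest",
            "label": {"en": [f"Part {part}: images {first} to {last}"]},
            "items": [
                {
                    # Canvas numbers run on across parts, so each image keeps
                    # its position in the overall sequence.
                    "id": f"{BASE}/{slug}/canvas/{first + i}",
                    "type": "Canvas",
                    "width": w,
                    "height": h,
                }
                for i, (w, h) in enumerate(batch)
            ],
        })
    return manifests


parts = split_into_manifests("abc-2", [(2000, 3000)] * 600)
print([m["label"]["en"][0] for m in parts])
# ['Part 1: images 1 to 250', 'Part 2: images 251 to 500', 'Part 3: images 501 to 600']
```

Numbering Canvases across the whole sequence, rather than restarting in each part, keeps identifiers stable if the parts are later regrouped.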

This does mean that with further cataloguing (perhaps encouraged by the material’s newly acquired status as an interoperable digital object), what was once a Manifest might become further Collections, Manifests and Ranges over time. So even within one institution, the mapping of archival level to digital object can’t follow a strict formula.
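To make this concrete, here is a hypothetical rule of thumb (not a crosswalk) that decides IIIF types from how deeply a subtree has been described rather than from level names. The `ref`, `orderable` and `children` fields are invented for the example. The same Subseries comes out as a Manifest before itemisation and as a Collection afterwards.

```python
def assign_roles(node, inside_manifest=False):
    """Map a catalogue subtree to IIIF resource types, {ref: type}.

    A hypothetical rule of thumb, not a crosswalk: the Manifest sits at the
    first node that is orderable or has no described children. Anything
    above it is a Collection; any described part inside it is a Range.
    """
    if inside_manifest:
        role = "Range"
    elif node.get("orderable") or not node.get("children"):
        role = "Manifest"
    else:
        role = "Collection"
    roles = {node["ref"]: role}
    for child in node.get("children", []):
        roles.update(assign_roles(child, inside_manifest=role != "Collection"))
    return roles


# Before cataloguing: a Subseries described no further.
before = {"ref": "ABC 1/1", "children": []}
print(assign_roles(before))  # {'ABC 1/1': 'Manifest'}

# After itemisation: the same Subseries now has Pieces, one with Items.
after = {
    "ref": "ABC 1/1",
    "children": [
        {"ref": "ABC 1/1/1", "orderable": True, "children": [
            {"ref": "ABC 1/1/1/1"}, {"ref": "ABC 1/1/1/2"},
        ]},
        {"ref": "ABC 1/1/2", "orderable": True},
    ],
}
print(assign_roles(after))
```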

This difficulty (or impossibility) of defining a crosswalk means that it’s not straightforward to write a recipe that archives can follow to publish their material as IIIF. The Newspaper community in IIIF has agreed on a set of common conventions, which providers of digitised newspapers and periodicals collections can implement. But with newspapers we are mapping human interaction concepts like volume, issue, section and article to IIIF, not archival hierarchy. For archives, I don’t think such a crosswalk can be formalised.

You can’t consistently map archival concepts of hierarchy across to particular resources (Series = Collection, Item = Manifest and so on). Instead, IIIF gives you a presentation toolbox, which you need to apply based on Presentation semantics: what kind of user experience you want to offer.

Filling in a digital mosaic

Visitors expect everything to be digitised when they arrive at The National Archives web site, but it isn’t, not by a long way. That doesn’t mean that people aren’t taking photographs of the material, though. Some of these photographs are taken by National Archives staff on demand, in a professional studio, when paying customers order arbitrary sections of records. Others are taken by researchers and other in-person visitors to the archives, on their phones, for later use.

This ad-hoc digitisation is tantalisingly almost reusable by others — but not quite. If I photograph a few pages from one manuscript or document, and the National Archives photograph a few more, it would be great to have the opportunity to slot these images into a shared abstract model of the physical object. IIIF can be this model, because you can have a Manifest representing a Piece, with some or even none of the Canvases within that Manifest yet populated with images (from whatever source — there could be many contributed photographs of the same thing). The initially empty canvases are gradually populated with images until you have images for the whole thing — group contribution populating unoccupied IIIF-space until you have the whole object. The Canvas abstraction — separating the notion of view from content such as photographs or text that populate that view — is what gives IIIF its power.
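Here is a minimal sketch of that gradual population, with hypothetical URLs throughout. An empty Manifest stakes out every view of a Piece, each contribution is added to one Canvas as a `painting` annotation, and `missing_views` reports which views still lack a photograph.

```python
BASE = "https://iiif.example.org/presentation"  # hypothetical


def empty_manifest(slug, view_count, provisional_size=(1000, 1000)):
    """A Manifest that stakes out every view of a Piece before any images exist."""
    width, height = provisional_size
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{BASE}/{slug}/manifest",
        "type": "Manifest",
        "label": {"en": [slug]},
        "items": [
            {"id": f"{BASE}/{slug}/canvas/{n}", "type": "Canvas",
             "width": width, "height": height, "items": []}
            for n in range(1, view_count + 1)
        ],
    }


def contribute(manifest, view_index, image_url, width, height):
    """Paint a contributed photograph onto one of the staked-out Canvases."""
    canvas = manifest["items"][view_index]
    if not canvas["items"]:
        canvas["items"].append(
            {"id": f"{canvas['id']}/page", "type": "AnnotationPage", "items": []})
    page = canvas["items"][0]
    page["items"].append({
        "id": f"{canvas['id']}/page/annotation/{len(page['items']) + 1}",
        "type": "Annotation",
        "motivation": "painting",
        "body": {"id": image_url, "type": "Image", "format": "image/jpeg",
                 "width": width, "height": height},
        "target": canvas["id"],  # painted onto the whole Canvas
    })


def missing_views(manifest):
    """Indexes of Canvases that nobody has photographed yet."""
    return [i for i, c in enumerate(manifest["items"]) if not c["items"]]


piece = empty_manifest("abc-1-2", view_count=4)
contribute(piece, 0, "https://photos.example.org/visitor/123.jpg", 3024, 4032)
contribute(piece, 2, "https://photos.example.org/studio/456.jpg", 5000, 6000)
print(missing_views(piece))  # [1, 3]
```

If several people photograph the same view, each photograph becomes another painting annotation on the same Canvas, and a viewer would need some way to pick between them; Presentation 3.0’s `Choice` body type is one possible way to express alternatives.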

There are problems with this. In order to lay out the space in the first place (the virtual, unpopulated Manifest), you need to know how many views there are within a Piece, and in what order. The material needs to be foliated: someone needs to number the pages (all the possible views) of the thing and, in this case, record them in some shared system. If The National Archives or a private citizen wishes to add their contributions to a IIIF representation, they need to know that the photographs they have taken correspond to particular identified views (pages, the fronts and backs of loose paper sheets, or other more diverse contents), so they can slot their photographs into the right places.
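One way to sketch the shared foliation is below, assuming a simple volume where every folio has a recto and a verso. Canvas ids are derived from the folio labels, so a contributor who knows they photographed “f. 2v” can find the matching Canvas. The id scheme and base URL are hypothetical.

```python
def foliation_views(folio_count):
    """Two views (recto and verso) per folio, in reading order."""
    return [f"f. {n}{side}" for n in range(1, folio_count + 1) for side in ("r", "v")]


def canvas_ids(base, views):
    """Stable Canvas ids derived from the foliation, e.g. .../canvas/f1r,
    so that a contributor who knows "f. 2v" can find the matching Canvas."""
    return {label: f"{base}/canvas/{label.replace('f. ', 'f')}" for label in views}


views = foliation_views(2)
print(views)  # ['f. 1r', 'f. 1v', 'f. 2r', 'f. 2v']
ids = canvas_ids("https://iiif.example.org/presentation/abc-1-2", views)  # hypothetical base
print(ids["f. 2v"])  # https://iiif.example.org/presentation/abc-1-2/canvas/f2v
```

Loose sheets, inserts or unfoliated material would need a richer scheme than this.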

And strictly, the empty Canvases still need width and height properties — to establish a coordinate system and aspect ratio to place content in. I think this is surmountable, though, if you accept that a Canvas’s dimensions could be provisional. It could have a default square shape into which any contributed content is best-fitted, and given more appropriate dimensions later.
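The provisional-square idea can be sketched with two small calculations. The first best-fits a contributed photograph into the Canvas (scale to fit, keep the aspect ratio, centre it), giving a box that could be used as an `xywh` fragment on the painting annotation’s target. The second gives the Canvas the photograph’s aspect ratio later. Both functions are hypothetical illustrations.

```python
def best_fit(canvas_w, canvas_h, image_w, image_h):
    """Fit an image inside a Canvas, keeping its aspect ratio, and centre it.

    Returns the (x, y, w, h) box in Canvas coordinates, which can be used as
    an xywh fragment on the painting annotation's target.
    """
    scale = min(canvas_w / image_w, canvas_h / image_h)
    w, h = round(image_w * scale), round(image_h * scale)
    return (canvas_w - w) // 2, (canvas_h - h) // 2, w, h


def settle_dimensions(canvas_w, image_w, image_h):
    """Later: give the Canvas the photograph's aspect ratio, keeping its width."""
    return canvas_w, round(canvas_w * image_h / image_w)


# A 3000 x 4000 portrait phone photo on a provisional 1000 x 1000 Canvas.
print(best_fit(1000, 1000, 3000, 4000))      # (125, 0, 750, 1000)
print(settle_dimensions(1000, 3000, 4000))   # (1000, 1333)
```

One consequence of provisional dimensions is that any annotations already targeting regions of the Canvas would need their coordinates rescaled when the Canvas is given its final size.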

The answer to “how do I map my archive to IIIF?” is “it depends”. But by keeping in mind that IIIF is for presentation to humans (and perhaps to machines analysing digital objects) rather than for description, you should be able to make appropriate use of the toolbox IIIF provides. There is also a vast unpopulated IIIF space waiting to be filled: IIIF can model the archive even if digitisation hasn’t yet happened, or is piecemeal. But staking out the space usually happens along with formal digitisation; working out how it could be staked out ahead of time, for sporadic population, needs more thought.

Thanks to Andrew Janes and Matt Hillyard for clarifying some Piece and Item subtleties and the reasons behind them.

1. Sometimes, you can order an Item… but most of the time, it’s the Piece that is fetched for you.
