Presenting ARTFL's high-resolution images with the International Image Interoperability Framework

Those familiar with the ARTFL Project and our work know that we specialize in handling digitized text. Our primary focus is to develop digitized text corpora (mostly in French) and software platforms that scholars and students can use to conduct research on those corpora. Images and image resources have been and always will be a secondary consideration for us. Nevertheless, we have many high-quality, high-resolution images that are remarkable objects of study in their own right and offer significant supplements to our text databases. These include the plate images from volumes 18 through 28 of the Encyclopédie; the Table analytique et raisonnée, also known as the “Arbre généalogique,” an etching that illustrates a taxonomy of the principal arts and sciences of the 18th century; and page images of the Bordeaux Exemplaire of Michel de Montaigne’s Essais.

Over the past year and a half, we have begun to take advantage of software packages and application programming interfaces developed as part of the International Image Interoperability Framework (IIIF) that have allowed us to present our images in their full zoomable glory. Supported by a consortium of universities, libraries, museums and other institutions since 2015, the IIIF is a set of “open standards for delivering high-quality, attributed digital objects online at scale.”

The fundamental unit for IIIF presentation is a JSON (JavaScript Object Notation) file called a manifest, which contains metadata about the digital object and instructions to a server about how to deliver the object (format, size, image portion, rotation angle, etc.). For our collections, we have created manifests for each individual image as well as manifests that draw together related images, such as plate groups in the Encyclopédie or chapters and entire books of the Montaigne Essais. Our manifests are publicly available, easily accessible, configured to be usable by anyone, and intended to serve as stable records for these images. The images they give access to are stored on the University of Chicago Library’s archive server for purposes of long-term accessibility.
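The delivery parameters mentioned above (image portion, size, rotation, format) correspond to segments of a IIIF Image API request URL. As a rough illustration, the following Python sketch assembles such a URL. The server address and image identifier are hypothetical placeholders, not ARTFL's actual endpoints, and the `max` size keyword assumes a server that implements Image API 2.1 or later.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Percent-encode the identifier so that any slashes inside it are not
    # mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"

# The whole image at the largest size the server will deliver.
full_image = iiif_image_url(BASE, "plates/v18/p1")

# A detail: the region x=1000, y=1500, w=800, h=600 (in pixels),
# scaled to 400 pixels wide and rotated 90 degrees.
detail = iiif_image_url(BASE, "plates/v18/p1",
                        region="1000,1500,800,600", size="400,",
                        rotation="90")

print(full_image)
print(detail)
```

A viewer issues many requests of this kind as the user pans and zooms, which is why a single large image can be explored without ever being downloaded whole.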

The other primary component of IIIF is the viewing platform, the interface required for working with manifests. We display our manifests in a platform called Mirador, and indeed, we have developed our manifests to take advantage of Mirador’s functionality. Because our manifests are IIIF-compliant, users can – in theory – study and compare any of our images in any IIIF viewer, as long as they have the manifest URLs.

To help users find and begin working with our images, we have created search interfaces for the Encyclopédie plates and the Montaigne page images. On those pages, users can search for text associated with the images or click the provided links to browse plate groups, essays, chapters, and books. The Arbre généalogique is a stand-alone resource.

For example, searching for the term “sillon” in the Encyclopédie interface will return links to 20 plates where that term can be found in image figure descriptions. These plates come from the domains of agriculture, anatomy, alphabets, botany, etc. Users can click links in the search results to see the individual plate image (Planche 1ere in “Agriculture et Economie Rustique | LABOURAGE”) or the entire plate group (“Agriculture et Economie Rustique | LABOURAGE”) in the Mirador viewer.

In this screenshot, note the figure description and the zoomed-in portion of the image, figure 5. Note also that we include links to the plate in the PhiloLogic instance of the Encyclopédie and to the manifest URL.

Screenshot of Planche Iere in "AGRICULTURE ET ECONOMIE RUSTIQUE | LABOURAGE." with figure description in Mirador viewer.

Likewise, in the Montaigne interface, a search for “Virgile” returns four instances of that author’s name (spelled in that manner) with links to the page images where the word can be found.

We have taken slightly different approaches toward structuring the IIIF manifests for each of these collections, resulting in slight differences in functionality and appearance.

For the Encyclopédie plates, we have included the figure description for each image in the manifest as a basic metadata value. We did so partly in order to replicate the TEI-XML that serves as the data for our official digital edition of the Encyclopédie running under PhiloLogic. The TEI-XML itself is a composite of separate printed editions that contain either the figure description or the plate images. The manifests, like the TEI, are unique digital objects that unite text and image. In practical terms, this means that the figure descriptions will always appear with the rest of the image metadata in the viewer sidebar by default, as shown in the screenshot above.
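To make the idea of a figure description stored as a basic metadata value more concrete, here is a minimal Python sketch that builds a single-image manifest. It assumes the IIIF Presentation API 2.1 structure; the post does not say which version the ARTFL manifests follow. All URLs, dimensions, the metadata label, and the description text are placeholders, and the function name `build_plate_manifest` is invented for this example.

```python
import json


def build_plate_manifest(manifest_id, label, figure_description,
                         canvas_id, image_service_id, width, height):
    """Return a minimal Presentation API 2.1-style manifest as a dict."""
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_id,
        "@type": "sc:Manifest",
        "label": label,
        # Viewers typically list these label/value pairs in an
        # information sidebar alongside the image.
        "metadata": [
            {"label": "Figure description", "value": figure_description},
        ],
        "sequences": [{
            "@type": "sc:Sequence",
            "canvases": [{
                "@id": canvas_id,
                "@type": "sc:Canvas",
                "label": label,
                "width": width,
                "height": height,
                "images": [{
                    "@type": "oa:Annotation",
                    "motivation": "sc:painting",
                    "on": canvas_id,
                    "resource": {
                        "@id": f"{image_service_id}/full/full/0/default.jpg",
                        "@type": "dctypes:Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                        "service": {
                            "@context": "http://iiif.io/api/image/2/context.json",
                            "@id": image_service_id,
                            "profile": "http://iiif.io/api/image/2/level1.json",
                        },
                    },
                }],
            }],
        }],
    }


# Placeholder values only; none of these are real ARTFL URLs or texts.
manifest = build_plate_manifest(
    manifest_id="https://example.org/manifests/plate-1.json",
    label="Example plate",
    figure_description="Fig. 5. (placeholder figure description)",
    canvas_id="https://example.org/canvas/plate-1",
    image_service_id="https://example.org/iiif/plate-1",
    width=4000,
    height=6000,
)
print(json.dumps(manifest, indent=2, ensure_ascii=False))
```

Because the description sits in the manifest's own `metadata` list, any viewer that shows manifest metadata should display it without additional requests.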

The Montaigne page images have two JSON files associated with them: a main manifest with bibliographic metadata, and an annotation file that contains transcriptions of Montaigne’s many hand edits. The main manifest calls the annotation file when loaded into the viewer, which then makes the transcriptions available for perusal in the sidebar. We have configured our Mirador viewer such that the annotations display automatically for each page. Storing the transcriptions as annotations makes reading them much easier, but there’s a drawback to constructing manifests in this way: currently, other viewing platforms, such as Universal Viewer, seem unable to display annotations out of the box. Researchers must therefore work with these manifests through a Mirador instance if they want to see the transcriptions.
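The relationship between the two files can be sketched as follows in Python, again assuming Presentation API 2.1 conventions, in which a canvas points to an external annotation list through its `otherContent` property. The URLs, the sample transcription strings, the `oa:commenting` motivation, and the helper names are assumptions made for illustration; the actual ARTFL files may be organized differently.

```python
def build_annotation_list(list_id, canvas_id, transcriptions):
    """Wrap transcription strings as annotations on a single canvas."""
    resources = []
    for i, text in enumerate(transcriptions, start=1):
        resources.append({
            "@id": f"{list_id}#anno-{i}",
            "@type": "oa:Annotation",
            # Assumed motivation; the source does not say which one is used.
            "motivation": "oa:commenting",
            "resource": {
                "@type": "dctypes:Text",
                "format": "text/plain",
                "chars": text,
            },
            "on": canvas_id,
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": list_id,
        "@type": "sc:AnnotationList",
        "resources": resources,
    }


def link_annotation_list(canvas, list_id):
    """Make a canvas in the main manifest point to the annotation file."""
    canvas.setdefault("otherContent", []).append(
        {"@id": list_id, "@type": "sc:AnnotationList"}
    )
    return canvas


# Hypothetical identifiers and placeholder transcriptions.
CANVAS_ID = "https://example.org/canvas/essais-page-1"
LIST_ID = "https://example.org/annotations/essais-page-1.json"

canvas = {"@id": CANVAS_ID, "@type": "sc:Canvas", "label": "Page 1"}
link_annotation_list(canvas, LIST_ID)
annotation_list = build_annotation_list(
    LIST_ID, CANVAS_ID,
    ["(placeholder transcription 1)", "(placeholder transcription 2)"],
)
print(canvas["otherContent"])
print(len(annotation_list["resources"]))
```

The main manifest stays small because it carries only a pointer; the viewer decides whether and when to fetch the annotation list, which is also why a viewer without annotation support simply shows nothing.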

Screenshot of Montaigne page image with transcription in Mirador viewer.

We have extended this two-file approach with the Arbre généalogique, creating annotation items for each of the leaves of the tree. The annotations include the name of the realm of knowledge on a given leaf and image coordinates for the leaf. Each item also has a “tagging” motivation so that when users click on or mouse over the name in the Mirador sidebar, the leaf is highlighted. This simple visual aid is quite handy when working with this dense, complex image. Moreover, we have enabled search functionality for the leaf names using the IIIF Content Search API so that users can find realms of knowledge more easily in the image. Mirador highlights the leaves in the image for all search results. Again, a few caveats apply. It seems we are able to take this approach only because of Mirador’s built-in capabilities. Other viewers we’ve tested cannot display or search the annotations, as far as we can tell. The currently supported version of Mirador (Mirador 3) is constrained in certain ways, too: search results display only if packaged following the specifications for Search API 1.0. The latest version of the API, Search API 2.0, does not work at this time.
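A rough Python sketch of this arrangement follows. Each leaf becomes an annotation whose target is the canvas URL plus an `#xywh=` fragment giving the leaf's rectangle, and a toy search function returns matching annotations in the simple response shape described by Content Search API 1.0 (an annotation list). The leaf names, coordinates, URLs, the `oa:Tag` resource type, and the function names are illustrative assumptions; this is not ARTFL's search service.

```python
from urllib.parse import quote

# Hypothetical canvas and search-service URLs.
CANVAS = "https://example.org/canvas/arbre"
SEARCH_URL = "https://example.org/search/arbre"


def leaf_annotation(anno_id, canvas_id, name, x, y, w, h):
    """Tag a rectangular region of the canvas with a leaf name."""
    return {
        "@id": anno_id,
        "@type": "oa:Annotation",
        "motivation": "oa:tagging",
        "resource": {"@type": "oa:Tag", "chars": name},
        # The xywh fragment gives the region in canvas coordinates.
        "on": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


def search_leaves(annotations, query, search_url):
    """Return matching annotations in a Search API 1.0-style response."""
    needle = query.casefold()
    hits = [a for a in annotations
            if needle in a["resource"]["chars"].casefold()]
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{search_url}?q={quote(query)}",
        "@type": "sc:AnnotationList",
        "resources": hits,
    }


# The manifest advertises the search service so a viewer can find it.
search_service = {
    "@context": "http://iiif.io/api/search/1/context.json",
    "@id": SEARCH_URL,
    "profile": "http://iiif.io/api/search/1/search",
}

# Illustrative sample data: names and coordinates are made up.
leaves = [
    leaf_annotation(f"{CANVAS}/anno/1", CANVAS, "Histoire naturelle", 100, 200, 300, 120),
    leaf_annotation(f"{CANVAS}/anno/2", CANVAS, "Histoire sacrée", 500, 220, 280, 110),
    leaf_annotation(f"{CANVAS}/anno/3", CANVAS, "Géométrie", 900, 640, 260, 100),
]

result = search_leaves(leaves, "histoire", SEARCH_URL)
for hit in result["resources"]:
    print(hit["resource"]["chars"], "->", hit["on"])
```

Since each hit carries its own `xywh` target, a viewer has everything it needs to draw a highlight box for every result.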

This screenshot shows search results for "histoire." The selected search result is highlighted in yellow; all other results are highlighted in blue.

Screenshot of Arbre with search for histoire and results highlighted in Mirador viewer.

In a perfect world, we would apply the method of annotation we used for the Arbre généalogique to pages from the Essais. Each transcription of Montaigne’s edits would be an annotation item with a tagging motivation so that users could simply click the transcription in the sidebar and highlight the edit in the image. Content search would be easy to implement for such annotations, as well. Unfortunately, there is no practical (automatable) way to get image coordinates for all of the thousands of Montaigne’s edits in all of the pages of the Essais. That work would need to be done by hand.

A simpler task would be to make complete books or texts searchable in a Mirador instance with search result highlighting. One can, in fact, find real-life examples of such resources (see numerous examples in https://mirador-dev.netlify.app/__tests__/integration/mirador/contentsearch.html). Presumably, the developers of those resources were able to get image coordinates for individual words by leveraging bounding boxes from OCR output or hOCR files of the high-resolution text images. Perhaps we will attempt such a feat down the road if we can obtain good quality page images of the right text.
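As a minimal sketch of how word coordinates might be pulled from hOCR, the following Python example reads the `bbox x0 y0 x1 y1` values that hOCR stores in the `title` attribute of `ocrx_word` elements and converts them to `xywh` fragments. The sample markup, the canvas URL, and the class and function names are invented for illustration, and the conversion assumes the canvas has the same pixel dimensions as the OCR'd image.

```python
import re
from html.parser import HTMLParser

# hOCR stores a word's box as "bbox x0 y0 x1 y1" in the title attribute.
BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (word, (x, y, w, h)) pairs from hOCR markup."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # box of the word element currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX.search(attrs.get("title") or "")
            if match:
                x0, y0, x1, y1 = map(int, match.groups())
                # Convert corner coordinates to x, y, width, height.
                self._bbox = (x0, y0, x1 - x0, y1 - y0)

    def handle_data(self, data):
        text = data.strip()
        if self._bbox and text:
            self.words.append((text, self._bbox))
            self._bbox = None


def to_fragment(canvas_id, bbox):
    """Turn a box into a canvas target with an xywh fragment."""
    x, y, w, h = bbox
    return f"{canvas_id}#xywh={x},{y},{w},{h}"


# Invented sample markup in the style of hOCR output.
SAMPLE = """
<span class='ocr_line' title='bbox 30 90 400 120'>
  <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Que</span>
  <span class='ocrx_word' title='bbox 104 92 180 116; x_wconf 91'>sais-je</span>
</span>
"""

parser = HocrWords()
parser.feed(SAMPLE)
CANVAS = "https://example.org/canvas/page-1"  # hypothetical canvas URL
for word, bbox in parser.words:
    print(word, to_fragment(CANVAS, bbox))
```

Each resulting fragment could serve as the target of an annotation, so that a search hit on a word maps directly to a highlightable rectangle on the page image.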

Without question, IIIF has transformed the ARTFL Project’s ability to display and make available high-resolution images. Being able to serve large images dynamically by means of a manifest is quite convenient for developers. We hope users find that this approach meets their needs for research and display. Bringing these resources to a state of completion, however, can be incredibly involved. Getting manifests into the correct structure, coordinating all of the components, and configuring the viewer is exacting work. As technologies around IIIF continue to mature, we hope that the aspects of IIIF that don’t work so well currently – enabling user-generated annotations, installing and configuring viewers, etc. – will become easier. And we hope that IIIF’s promised interoperability will in fact become standard.

ARTFL Project Research Blog
