Federated Search and PhiloLogic -- from works to (someday) words

Over the past several years, the ARTFL Project has been developing the code infrastructure for the Intertextual Hub, a reading environment that federates heterogeneous text collections. The Hub extracts data from individual PhiloLogic4 instances and exposes that data to text analysis algorithms, allowing users to navigate between individual texts and larger groups of texts related through shared themes, ideas, and passages.

We have now adapted components of this infrastructure to enable federated bibliographic searching across all of the text collections running under PhiloLogic. With the PhiloLogic Federated Bibliography Search database, we offer a simple yet flexible search system that allows users to search for texts across approximately 90 individual collections in nearly a dozen languages. We currently allow searches by author, title, and collection language, and searches can be further restricted by access type and by date range. For example, a search for titles containing the word “slavery” written in English between 1750 and 1800 yields 38 results from the American Archives Collection, ECCO-TCP, and the Evans Early American Imprint Collection:

https://artflsrv03.uchicago.edu/cgi-bin/federated_bibliography/federated_bib_search.py?author=&title=slavery&language=english&start_date=1750&end_date=1800&sort_by=

Search results contain links to work titles and collections. Each result also notes the access status of its collection: open, limited to subscribing institutions, or limited to users at the University of Chicago. The same search can be expanded across French and English collections by leaving the language field empty and entering “slavery OR esclavage”, with a Boolean “OR”, in the title field:

https://artflsrv03.uchicago.edu/cgi-bin/federated_bibliography/federated_bib_search.py?author=&title=slavery+OR+esclavage&language=&start_date=1750&end_date=1800&sort_by=

This search yields several titles in the open-access Newberry French Revolution Collection, one in the Frantext collection, and one -- a play entitled “L’Esclavage des Noirs, ou L’Heureux Naufrage, Drame” -- in the Théâtre Classique collection.
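The two example URLs above differ only in their query-string parameters, so they can be assembled programmatically. The following is a minimal sketch in Python that rebuilds them from the parameter names visible in those URLs (author, title, language, start_date, end_date, sort_by). The helper name build_search_url is our own invention for illustration; it is not part of PhiloLogic or of the search script, and the sketch assumes nothing about the script beyond what the URLs show.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URLs in this post.
BASE_URL = (
    "https://artflsrv03.uchicago.edu/cgi-bin/"
    "federated_bibliography/federated_bib_search.py"
)


def build_search_url(title="", author="", language="",
                     start_date="", end_date="", sort_by=""):
    """Hypothetical helper: assemble a federated bibliography search URL.

    Parameters are emitted in the same order as in the example URLs,
    and empty fields are kept as empty values (e.g. "author=").
    """
    params = [
        ("author", author),
        ("title", title),
        ("language", language),
        ("start_date", start_date),
        ("end_date", end_date),
        ("sort_by", sort_by),
    ]
    # urlencode encodes spaces as "+", so "slavery OR esclavage"
    # becomes "slavery+OR+esclavage".
    return f"{BASE_URL}?{urlencode(params)}"


# English-language titles containing "slavery", 1750-1800.
english_only = build_search_url(
    title="slavery", language="english", start_date=1750, end_date=1800
)

# Same date range, any language, with a Boolean OR in the title field.
both_languages = build_search_url(
    title="slavery OR esclavage", start_date=1750, end_date=1800
)

print(english_only)
print(both_languages)
```

Leaving the language parameter empty is what widens the second search to every collection, regardless of language.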

We envision this bibliographic search system as the first of many such tools that permit search across the entirety of our collections. In the Intertextual Hub, users can already conduct word or topic vector searches across all seven of the 18th-century French collections it includes, with results returned ranked by relevance. For example, see these results for a search using a topic vector that contains astronomical terms:

https://intertextual-hub.uchicago.edu/search?limit=100&stemmed=yes&words=soleil%20lune%20rayon%20etoile%20chaleur%20nuit%20montagne%20ciel%20astre%20lumiere&binding=OR

Taking inspiration from this federated search approach, we would like to create a mechanism that enables combined metadata and full-text queries across all PhiloLogic instances -- or at least a logically coherent subset thereof -- at once, in real time. Users would no longer be constrained to working inside single collections, but could conduct searches across multiple collections and potentially in multiple languages. For example, instead of searching for “slavery OR esclavage” only in titles, users could search for those terms in the full text of any number of collections running under PhiloLogic.

The technical details of such a search scheme remain to be hashed out, of course. But the great thing about PhiloLogic4 is that its fundamental architecture makes it possible to create standalone widgets or external apps that query database instances via an API and then repackage and render search results independently. For example, ARTFL’s PhiloReader apps for both Android and iOS work in exactly this way, and from the beginning were meant to be a demonstration of PhiloLogic’s server capabilities (the Encyclopédie reader apps are available for download on both platforms).

These screenshots illustrate a simple example of the Encyclopédie app interacting with the PhiloLogic4 API. In the left screenshot, the app gets metadata search suggestions dynamically, in this case "Astronomie | Géographie". Query results for articles with that classification appear in the right screenshot.

For a federated search system, a client would send queries to any number of PhiloLogic instances, gather and sort the query results (or links to them), and then present those results to the user. Again, we would first have to work out certain details before creating a search system like this: the exact nature of query results, whether and how to perform relevance ranking on results, and whether we would need to integrate certain kinds of reporting features into PhiloLogic as a parallel development activity, among others.
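The send, gather, sort, and present cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design for the eventual system: the Hit record, the federated_search function, and the fetch callable are all hypothetical names of our own, the per-instance request is deliberately left to the caller rather than modeled on any actual PhiloLogic4 API call, and the sample data is invented. Sorting by date stands in for whatever ranking scheme is eventually chosen.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical result record; the real result format is undecided."""
    collection: str  # which instance the result came from
    title: str
    year: int


def federated_search(instances, query, fetch, max_workers=8):
    """Send `query` to every instance in parallel, then merge and sort.

    `fetch(instance, query)` is a caller-supplied (hypothetical) function
    that returns a list of Hit objects for one instance.
    """
    def safe_fetch(instance):
        # One unreachable instance should not sink the whole search.
        try:
            return fetch(instance, query)
        except Exception:
            return []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_instance = list(pool.map(safe_fetch, instances))

    merged = [hit for hits in per_instance for hit in hits]
    # Placeholder ordering: chronological, then by collection and title.
    return sorted(merged, key=lambda hit: (hit.year, hit.collection, hit.title))


# Invented stand-in data so the sketch runs without any network access.
FAKE_INDEX = {
    "collection_a": [Hit("collection_a", "Example title two", 1790)],
    "collection_b": [Hit("collection_b", "Example title one", 1760)],
}


def fake_fetch(instance, query):
    """Pretend to query one instance; unknown instances are 'down'."""
    if instance not in FAKE_INDEX:
        raise ConnectionError(f"{instance} is unreachable")
    return [h for h in FAKE_INDEX[instance] if query.lower() in h.title.lower()]


results = federated_search(
    ["collection_a", "collection_b", "collection_down"], "example", fake_fetch
)
for hit in results:
    print(hit.year, hit.collection, hit.title)
```

Passing the fetch step in as a function keeps the open questions (result format, ranking, reporting features) isolated from the fan-out and merge logic, which is unlikely to change much however those questions are settled.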

However we proceed, the experience of building the Intertextual Hub has taught us that we can tap into the indexing, processing, and reporting capabilities of PhiloLogic to draw together many individual, heterogeneous text collections and create larger-scale research environments that allow users to engage in text analysis of an incredibly broad scope.

ARTFL Project Research Blog


Developed by ARTFL