While Clovis and I continue to document various aspects of PhiloLogic4's architecture and design, it may be helpful to keep in mind a top-level, "bird's-eye view" of the system as a whole. PhiloLogic does a great many different things at different times, and it can be very difficult to keep them all organized. My best attempt to convey it in a single diagram is below:
As with PhiloLogic3, the foundation of all PhiloLogic services is a set of C functions, which are now collected together in a library called "libphilo", contained in the main PhiloLogic4 GitHub repository. These provide the high-performance compression, indexing, and search algorithms that distinguish PhiloLogic from most other XML and database technologies.
This C library is the foundation on which all of PhiloLogic4's Python library classes are built. The two most important are:
- the Loader class, which controls the parsing and indexing of TEI XML files, and
- the DB class, which governs all access to a PhiloLogic database.
These classes in turn make use of other classes, most of which appear in the diagram above. It is extremely important to note that the Loader and the DB share almost no behaviors or components.
This separation is a point of departure from most other database systems: in PhiloLogic4, the set of components that produce a database is distinct from the set of components that query an existing database. We refer to the time when XML documents are ingested and indexed as load-time, and the time when a user queries the database as run-time or query-time.
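To make the load-time/query-time split concrete, here is a toy sketch. The classes `ToyLoader` and `ToyDB` are hypothetical and far simpler than PhiloLogic's Loader and DB: there is no C library, compression, or TEI parsing here, and the JSON index file is an assumption made purely for illustration. The point is only that the two sides share nothing except the on-disk format that one writes and the other reads.

```python
import json
import os
import re
import tempfile

# Hypothetical toy classes: NOT PhiloLogic's Loader/DB API.
# The only thing they have in common is the on-disk index format.


class ToyLoader:
    """Load-time: tokenize documents and write an inverted index to disk."""

    def __init__(self, token_regex=r"\w+"):
        self.token_regex = re.compile(token_regex)

    def build(self, documents, path):
        index = {}
        for doc_id, text in enumerate(documents):
            tokens = self.token_regex.finditer(text.lower())
            for position, match in enumerate(tokens):
                # Record (document, word position) for every occurrence.
                index.setdefault(match.group(), []).append([doc_id, position])
        with open(path, "w", encoding="utf-8") as f:
            json.dump(index, f)


class ToyDB:
    """Query-time: open an existing index and answer lookups."""

    def __init__(self, path):
        with open(path, encoding="utf-8") as f:
            self.index = json.load(f)

    def search(self, word):
        return self.index.get(word.lower(), [])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_index.json")
        ToyLoader().build(["Le chat dort.", "Le chien dort aussi."], path)
        db = ToyDB(path)
        print(db.search("dort"))  # [[0, 2], [1, 2]]
```

Because `ToyDB` never imports or calls anything from `ToyLoader`, either side could be rewritten independently as long as the file format stays the same.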
Although one of the original design goals of PhiloLogic4 was to develop a more generalized library for TEI processing, it eventually became clear that a set of general behaviors was not enough, and that pragmatic development required two additional components:
- a general-purpose document-ingesting script, capable of handling errors and ambiguity, and
- a ready-made web application, suitable for most purposes and customizable for others.
These components were built as applications on top of the standard library components, and they allow a PhiloLogic developer to specify all text- and language-specific features without modifying any shared functions.
The load_script has been described already in a previous post, but it is worth revisiting in this broader context. The load script is responsible for three fundamental tasks:
- taking command-line arguments from the user, and passing all the supplied files, along with additional parameters, to the Loader class
- storing all system-specific configuration parameters: hostname, filesystem locations, etc.
- storing all text-specific configuration parameters: XPaths, tokenization regexes, special filters, etc.
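A minimal sketch of those three responsibilities, using Python's standard `argparse` module. The configuration keys, paths, and the `build_load_job` helper are hypothetical; the real load script hands its arguments and settings to PhiloLogic's Loader class rather than returning a dict as this sketch does.

```python
import argparse

# Task 2 (hypothetical values): system-specific settings.
SYSTEM_CONFIG = {
    "web_root": "/var/www/html/philologic",
    "url_root": "http://localhost/philologic",
}

# Task 3 (hypothetical values): text-specific settings.
TEXT_CONFIG = {
    "xpaths": {"doc": ".", "div": ".//div", "para": ".//p"},
    "token_regex": r"\w+",
    "filters": ["normalize_unicode", "lowercase"],
}


def parse_args(argv=None):
    """Task 1: take command-line arguments from the user."""
    parser = argparse.ArgumentParser(description="Load TEI files (sketch).")
    parser.add_argument("dbname", help="name of the database to create")
    parser.add_argument("files", nargs="+", help="TEI XML files to load")
    parser.add_argument("--workers", type=int, default=1,
                        help="number of parallel parsing jobs")
    return parser.parse_args(argv)


def build_load_job(argv=None):
    """Bundle everything a Loader-like class would need; nothing is loaded."""
    args = parse_args(argv)
    return {
        "dbname": args.dbname,
        "files": list(args.files),
        "workers": args.workers,
        "destination": f"{SYSTEM_CONFIG['web_root']}/{args.dbname}",
        **TEXT_CONFIG,
    }


if __name__ == "__main__":
    # Explicit example arguments so the sketch runs without a real command line.
    print(build_load_job(["mycorpus", "a.xml", "b.xml", "--workers", "2"]))
```

Keeping the text-specific settings in one plain data structure is what lets a developer adapt loading to a new corpus by editing configuration rather than shared library code.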
When the load script has finished running, it moves the loaded database into an appropriate path in the web server's document tree and creates a web application around it. This is the very same web application described in Clovis's recent post. It is created by copying a set of files stored elsewhere, typically in the PhiloLogic4 install directory, although it is possible to specify another set of files to "clone" from. It is important to note that, by convention, we refer to the web application together with the database it accesses as a "database", since one almost never exists without the other; this is reflected in the diagram above.
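The installation step might look roughly like the following sketch, which uses `shutil.copytree` to clone the application files and `shutil.move` to place the loaded data inside the new application directory. The function name `install_database` and the `data` subdirectory layout are assumptions for illustration, not PhiloLogic4's actual directory structure.

```python
import os
import shutil
import tempfile


def install_database(db_dir, app_template_dir, web_root, dbname):
    """Create web_root/dbname from an app template plus freshly loaded data.

    Hypothetical layout: the loaded database lives in a "data" subdirectory.
    """
    target = os.path.join(web_root, dbname)
    # "Clone" the application files; raises FileExistsError if target exists.
    shutil.copytree(app_template_dir, target)
    # Move the loaded database next to the application code.
    shutil.move(db_dir, os.path.join(target, "data"))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        template = os.path.join(tmp, "app_template")
        os.makedirs(template)
        with open(os.path.join(template, "index.py"), "w") as f:
            f.write("# report dispatcher would live here\n")
        loaded = os.path.join(tmp, "build")
        os.makedirs(loaded)
        path = install_database(loaded, template, os.path.join(tmp, "www"), "mycorpus")
        print(sorted(os.listdir(path)))  # ['data', 'index.py']
```

Copying rather than linking the template means each database can later have its application customized without affecting any other.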
The behavior of such a database/application is just as Clovis described it: all queries go to one of several "report generators", which interpret query parameters and access the database accordingly. They produce a result object: a Python object that maps very closely onto a JSON object, that is, a single dictionary literal consisting of other literals, without functions, tuples, lambdas, objects, or other structures that cannot be expressed in JSON. This result object is then passed to a Mako template file, which transforms the result into HTML viewable in a web browser. That HTML is finally returned to the user, "finally" usually meaning under 100 milliseconds, of course.
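The JSON-compatibility constraint on result objects is easy to check mechanically. The sketch below uses a hypothetical `concordance_report` generator and an `is_json_like` validator (neither is part of PhiloLogic4). The validator accepts only dicts with string keys, lists, strings, numbers, booleans, and `None`, and rejects tuples, functions, and other objects, as the description above requires.

```python
import json


def is_json_like(value):
    """Return True if value uses only types that map directly onto JSON."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_json_like(v) for v in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_json_like(v)
                   for k, v in value.items())
    # Tuples, functions, lambdas, sets, and arbitrary objects are rejected.
    return False


def concordance_report(query, hits):
    """Hypothetical report generator: turn (doc_id, position) hits into a plain dict."""
    return {
        "query": query,
        "results_length": len(hits),
        "results": [{"doc_id": d, "position": p} for d, p in hits],
    }


if __name__ == "__main__":
    result = concordance_report("dort", [(0, 2), (1, 2)])
    assert is_json_like(result)
    print(json.dumps(result))
    print(is_json_like({"bad": (1, 2)}))  # False
```

One consequence of keeping results this simple is that the same dict can be handed to a Mako template or serialized directly with `json.dumps`, without any custom encoder.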
Over the coming months, Clovis and I will be describing many of these components in detail, and this post may be updated as this larger documentation project proceeds; but for now, I hope it serves as a helpful overview of PhiloLogic4.