Textual Re-use of Ancient Greek Texts

Textual Re-use of Ancient Greek Texts: A case study on Plato’s works

Marco Büchler & Annette Loos (eAqua Project, Leipzig)

Digital Classicist/ICS Work in Progress Seminar, Summer 2009

See the abstract of the workshop presentation. The approach appears to use n-grams with a mechanism to "relax word order", together with a kind of semantic association. Russ and I have talked a bit about both as future extensions to PhiloLine/PAIR: they should improve recall, though at the risk of lowering precision.
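To make the "relaxed word order" idea concrete, here is a minimal Python sketch of one possible reading of it: each n-gram is reduced to its sorted words, so two passages that use the same words in a different order still share a key. This is only an illustration under that assumption. It is not the method from the presentation, nor anything currently in PhiloLine/PAIR, and the function names (`tokenize`, `relaxed_ngrams`, `shared_ngrams`) are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()[]") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def relaxed_ngrams(tokens, n=3):
    """Yield (key, position) pairs, where key is the n-gram with its words sorted.

    Sorting discards word order inside the window, so "the soul is"
    and "is the soul" produce the same key.
    """
    for i in range(len(tokens) - n + 1):
        yield tuple(sorted(tokens[i:i + n])), i


def shared_ngrams(source, target, n=3):
    """Map each key found in both texts to (source positions, target positions)."""
    index = defaultdict(list)
    for key, pos in relaxed_ngrams(tokenize(source), n):
        index[key].append(pos)

    matches = {}
    for key, pos in relaxed_ngrams(tokenize(target), n):
        if key in index:
            # First hit for this key: record the source positions once,
            # then keep appending target positions.
            matches.setdefault(key, (index[key], []))[1].append(pos)
    return matches
```

Calling `shared_ngrams("the soul is immortal", "immortal is the soul")` finds one shared key even though no ordinary trigram is common to the two strings. That is the recall gain. The cost is also visible: any window containing the same three words matches, whatever their order, which is where precision can drop.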

PhiloLogic: Ubuntu 64 bit compilation failure

Damir Cavar reports:

After evaluations with various Linux distributions we came to the conclusion that PhiloLogic index generation (the C code) breaks on 64-bit systems (various versions) with a segmentation fault. We didn't manage to get it running in a 32-bit chroot environment on Ubuntu and Debian.

It works perfectly well on the newest release of the 32-bit Ubuntu server, and also on 32-bit Debian Lenny. By default, a 32-bit system most likely has a memory limitation of at most about 3.5 GB RAM, even if more RAM is physically available. To get large memory support (i.e. more than 3.5 or 4 GB RAM) you need a PAE-enabled kernel: on Ubuntu that is the "server kernel", on Debian the bigmem kernel. A 32-bit system is somewhat slower, and there are various other disadvantages (if one uses other code or software that makes use of advanced 64-bit CPU features), but we seem to have no other choice right now for a solution with PhiloLogic.

We have a version running now on Debian Lenny with the bigmem kernel, and we're putting the bits and pieces together, i.e. our Croatian localization, some scripts for statistics, etc. Once this is up, I'll place some more documentation, scripts, localizations and adaptations at the Croatian Language Corpus site: http://riznica.ihjj.hr/ (this is still the old system; we are just migrating the infrastructure to new servers, using Lenny).

More can soon be found on the pages of the Linguistics dept. at the University of Zadar: http://ling.unizd.hr/

Should somebody have a fix for a 64-bit Linux environment, hints would be very much appreciated.

ASV Toolbox project

ASV Toolbox is a modular collection of tools for the exploration of written language data. They work either on word lists or text and solve several linguistic classification and clustering tasks. The topics covered include language detection, POS-tagging, base form reduction, named entity recognition, and terminology extraction. On a more abstract level, the algorithms deal with various kinds of word similarity, using pattern based and statistical approaches. The collection can be used to work on large real world data sets as well as for studying the underlying algorithms. The ASV Toolbox can work on plain text files and connect to a MySQL database. While it is especially designed to work with corpora of the Leipzig Corpora Collection, it can easily be adapted to other sources.

Many of these appear to be described in recent papers by Biemann and his collaborators.

Thanks to Alain Guerreau for the pointer.
