In an earlier post, I mentioned that I'd try to to call the philologic C routines via ctypes, a Python Foreign Function Interface library. I did, and it worked awesomely well! Ctypes lets you call C functions from python without writing any glue at all in some cases, giving you access to high-performance C routines in a clean, modern programming language. We'd ultimately want a much more hand-crafted approach, but for prototyping interfaces, this is a very, very useful tool.
First, I had to compile the search engine as a shared library, rather than an executable:
gcc -dynamiclib -std=gnu99 search.o word.o retreive.o level.o gmap.o blockmap.o log.o out.o plugin/libindex.a db/db.o db/bitsvector.o db/unpack.o -lgdbm -o libphilo.dylib
All that refactoring certainly paid off. The search4 executable will now happily link against the shared library with no modification, and so can any other program that wants high-speed text object search:
#!/usr/bin/python
import sys,os
from ctypes import *
# First, we need to get the C standard library loaded in
# so that we can pass python's input on to the search engine.
stdlib=cdll.LoadLibrary("libc.dylib")
stdin = stdlib.fdopen(sys.stdin.fileno(),"r")
# Honestly, that's an architectural error.
# I'd prefer to pass in strings, not a file handle
# Now load in philologic from a shared library
libphilo = cdll.LoadLibrary("./libphilo.dylib")
# Give it a path to the database. The C routines parse the db definitions.
db = libphilo.init_dbh_folder("/var/lib/philologic/databases/mvotest5/")
# now initialize a new search object, with some reasonable defaults.
s = libphilo.new_search(db,"phrase",None,1,100000,0,None)
# Read words from standard input.
libphilo.process_input(s,stdin)
# Then dump the results to standard output.
libphilo.search_pass(s,0)
# Done.
That was pretty easy, right? Notice that there weren't any boilerplate classes. I could hold pointers to arbitrary data in regular variables, and pass them directly into the C subroutines as void pointers. Not safe, but very, very convenient.
Of course, this opens us up for quite a bit more work: the C library really needs a lot more ways to get data in and out than a pair of input/output file descriptors, I would say. In all likelihood, after some more experiments, we'll eventually settle on a set of standard interfaces, and generate lower-level bindings with SWIG, which would alow us to call philo natively from Perl or PHP or Ruby or Java or LISP or Lua or...anything, really.
Ctypes still has some advantages over automatically-generated wrappers, however. In particular, it lets you pass python functions back into C, allowing us to write search operators in python, rather than C--for example, a metadata join, or a custom optimizer for part-of-speech searching. Neat!
0 comments:
Post a Comment