EPUB to TEI Lite converter

This is just to let you know that we now have an EPUB to TEI converter. It can be found here:
http://artfl.googlecode.com/files/epub_parser.tar
As you'll notice, the archive contains three files. The first, epub_parser.sh, is the only one you need to edit: specify the paths (the directory containing the EPUB files and the directory where you want the TEI files written) without a trailing slash, then execute epub_parser.sh. The second, parser.pl, is called by epub_parser.sh. The third, entities.pl, handles HTML entities and is also called by epub_parser.sh. Before running the converter, make sure all three scripts are in the same directory.
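The Perl in entities.pl is not reproduced here, but the reason an entity step is needed can be illustrated. XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so HTML names such as &nbsp; or &eacute; are undefined in a TEI file unless a DTD declares them. The following Python sketch is not the code in entities.pl; the function name resolve_html_entities is hypothetical. It rewrites every other named entity as a numeric character reference, using the standard library's HTML5 entity table:

```python
import re
from html.entities import html5  # maps "eacute;" -> "é", "nbsp;" -> "\xa0", ...

# XML predefines only these five entities, so they can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Named entity references only; numeric ones like &#233; are already valid XML.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_html_entities(text: str) -> str:
    """Hypothetical helper: turn HTML named entities into numeric references."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")  # exact lookup, no prefix guessing
        if chars is None:
            # Unknown name: left unchanged (a real converter might log it).
            return match.group(0)
        # Some entities expand to more than one code point.
        return "".join(f"&#{ord(c)};" for c in chars)

    return ENTITY_RE.sub(repl, text)
```

For example, `resolve_html_entities("caf&eacute;&nbsp;")` yields `caf&#233;&#160;`, which any XML parser accepts without an HTML DTD.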
A sample PhiloLogic load can be found here:
http://artflx.uchicago.edu/philologic/epubtest.whizbang.form.html
Of course, this is just a proof of concept and will be used only for text search and machine learning purposes. Some things still need to be tuned. Note that I inserted a div1 every ten pages, since there is no way to recognize chapters in the original EPUB files.
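The post does not show how the ten-page division is done in parser.pl. A minimal Python sketch of the idea, assuming the converter already holds the book as a flat list of plain-text pages (the function name pages_to_tei_body and the type="section" and n attribute values are hypothetical choices, not taken from the archive):

```python
from xml.sax.saxutils import escape

PAGES_PER_DIV = 10  # the post groups every ten pages into one div1


def pages_to_tei_body(pages: list[str], pages_per_div: int = PAGES_PER_DIV) -> str:
    """Hypothetical helper: wrap a flat list of page texts in <div1> blocks.

    Each page gets a <pb n="..."/> milestone followed by its text in a <p>.
    Page text is assumed to be plain text, so &, < and > are escaped.
    """
    parts = ["<body>"]
    for start in range(0, len(pages), pages_per_div):
        div_number = start // pages_per_div + 1
        parts.append(f'<div1 type="section" n="{div_number}">')
        for offset, page in enumerate(pages[start:start + pages_per_div]):
            parts.append(f'<pb n="{start + offset + 1}"/>')  # 1-based page number
            parts.append(f"<p>{escape(page)}</p>")
        parts.append("</div1>")
    parts.append("</body>")
    return "\n".join(parts)
```

With 23 pages this produces three div1 elements (pages 1 to 10, 11 to 20, and 21 to 23). Because the split ignores content, a div1 boundary can fall in the middle of a chapter, which is the limitation described above.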
