NoraLucene
Everything is deployed to /logon/johanbev/wescience0. Everyone on ps.titan should have full access to it; this is also a potential security breach. Please help fix this.
This setup uses Sun's Java 1.6, which is stored under my root directory on ps. The required jars are in /logon/johanbev/jars.
We have built a "Hello World" app that extracts the text from a single PDF file and indexes it with Lucene. The app resides in /logon/johanbev/wescience0/Luctest, and its purpose is to check that the environment on ps works for our project. We have provided a run.sh script that invokes java with the right parameters to start the app.
This has not been discussed in great detail, but johanbev suggests decoupling the text extraction/correction/PDF handling from the indexing proper. This would give everyone clean interfaces to program against and make us less reliant on day-to-day communication with the Lucene team. For example, the extractor could build a directory tree of .txt files, each with the agreed-upon fields in its header. The Lucene indexer/scraper would later read this tree and build its index.
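A minimal sketch of what that hand-off format could look like, assuming (hypothetically, since the format has not been agreed on) that each .txt file starts with single-line `name: value` header fields, followed by one blank line and then the extracted text. The class name `ExtractedDoc`, its methods, and the example field names are all illustrative, not an existing API. The code sticks to Java 1.6-compatible constructs (no diamond operator, no try-with-resources) so it would run on the JVM used on ps.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extractor-to-indexer exchange format: header lines, blank line, body. */
public class ExtractedDoc {
    public final Map<String, String> fields; // header values must be single-line
    public final String body;                // extracted (and corrected) plain text

    public ExtractedDoc(Map<String, String> fields, String body) {
        this.fields = fields;
        this.body = body;
    }

    /** Extractor side: write one document as a .txt file, creating parent dirs. */
    public static void write(File file, ExtractedDoc doc) throws IOException {
        File dir = file.getParentFile();
        if (dir != null && !dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create directory " + dir);
        }
        Writer out = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
        try {
            for (Map.Entry<String, String> e : doc.fields.entrySet()) {
                out.write(e.getKey() + ": " + e.getValue() + "\n");
            }
            out.write("\n"); // a blank line ends the header
            out.write(doc.body);
        } finally {
            out.close();
        }
    }

    /** Indexer side: split a file back into its header fields and body. */
    public static ExtractedDoc read(File file) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"));
        try {
            Map<String, String> fields = new LinkedHashMap<String, String>();
            String line;
            // Header: everything up to the first empty line; key is text before the first ':'.
            while ((line = in.readLine()) != null && line.length() > 0) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
                }
            }
            // Body: the rest of the file, copied verbatim.
            StringBuilder body = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                body.append(buf, 0, n);
            }
            return new ExtractedDoc(fields, body.toString());
        } finally {
            in.close();
        }
    }

    /** Indexer side: recursively collect every .txt file under the extractor's output root. */
    public static List<File> listTxtFiles(File root) {
        List<File> result = new ArrayList<File>();
        File[] children = root.listFiles();
        if (children == null) {
            return result; // not a directory, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                result.addAll(listTxtFiles(child));
            } else if (child.getName().endsWith(".txt")) {
                result.add(child);
            }
        }
        return result;
    }
}
```

With a contract like this, the Lucene team would only need `listTxtFiles` and `read`, mapping each header entry to a field on their own document objects. Meanwhile, the extraction side can change its PDF tooling freely as long as it keeps writing files in the agreed shape.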