Script that converts Kraken JSON segmentation output into PageXML.
Put your scans in a folder and run:
for i in *.png
do kraken -i "$i" "${i/png/bin.png}" binarize
done
for i in *.bin.png
do kraken -i "$i" "${i/bin.png/json}" segment -bl
done
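The two loops above can also be driven from Python, which makes it easy to skip `*.bin.png` files left over from an earlier run. This is only a sketch: `kraken_commands` and `run_folder` are hypothetical helper names, not part of Kraken or of this repository, and the command lines are copied from the shell loops above.

```python
import subprocess
from pathlib import Path


def kraken_commands(png_name):
    """Return the binarize and segment command lines for one scan."""
    stem = png_name[: -len(".png")]
    bin_name = stem + ".bin.png"   # same name the shell loop produces
    json_name = stem + ".json"
    binarize = ["kraken", "-i", png_name, bin_name, "binarize"]
    segment = ["kraken", "-i", bin_name, json_name, "segment", "-bl"]
    return binarize, segment


def run_folder(folder="."):
    """Binarize and segment every scan in `folder` (requires kraken on PATH)."""
    for png in sorted(Path(folder).glob("*.png")):
        if png.name.endswith(".bin.png"):
            continue  # already-binarized output, not an original scan
        for cmd in kraken_commands(str(png)):
            subprocess.run(cmd, check=True)
```

Calling `run_folder(".")` in the scan folder should leave one `.bin.png` and one `.json` file next to each scan.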
Now run the script:
python kraken_to_pagexml.py *.json
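To give a rough idea of what such a conversion involves, here is a minimal sketch. It is not the code of kraken_to_pagexml.py. It assumes the Kraken JSON has a "lines" list whose entries carry "baseline" and "boundary" point lists, puts all lines into a single TextRegion, and omits the Metadata element, so the result is not guaranteed to validate against the PAGE schema. The function names and the chosen namespace version are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# PAGE namespace (2019-07-15 schema); the actual script may target another version.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def points(coords):
    """Format [[x, y], ...] as the PAGE 'x,y x,y' points string."""
    return " ".join(f"{int(x)},{int(y)}" for x, y in coords)


def segmentation_to_pagexml(seg, image_name, width, height):
    """Build a minimal PcGts element from a Kraken-style segmentation dict."""
    ET.register_namespace("", PAGE_NS)

    def tag(name):
        return f"{{{PAGE_NS}}}{name}"

    root = ET.Element(tag("PcGts"))
    page = ET.SubElement(root, tag("Page"), imageFilename=image_name,
                         imageWidth=str(width), imageHeight=str(height))
    region = ET.SubElement(page, tag("TextRegion"), id="r0")

    # Region outline: bounding box of all line polygons (whole page if empty).
    lines = seg.get("lines", [])
    xs = [x for line in lines for x, _ in line["boundary"]] or [0, width]
    ys = [y for line in lines for _, y in line["boundary"]] or [0, height]
    box = [(min(xs), min(ys)), (max(xs), min(ys)),
           (max(xs), max(ys)), (min(xs), max(ys))]
    ET.SubElement(region, tag("Coords"), points=points(box))

    # One TextLine per Kraken line, with its polygon and its baseline.
    for n, line in enumerate(lines):
        text_line = ET.SubElement(region, tag("TextLine"), id=f"r0_l{n}")
        ET.SubElement(text_line, tag("Coords"), points=points(line["boundary"]))
        ET.SubElement(text_line, tag("Baseline"), points=points(line["baseline"]))
    return root
```

The segmentation dict can be read with `json.load`, and the returned element written with `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)`. The image width and height are passed in explicitly here because this sketch does not assume they are stored in the JSON.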
The PageXML files can then be processed further with, for example, LAREX or nashi.