A C program that extracts anonymous edits from the Wikipedia database backup dumps: http://dumps.wikimedia.org/backup-index.html
It writes those edits to a PostgreSQL-readable file for a COPY .. WITH BINARY
command, so they can be loaded into a database quickly and searched.
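
As an illustration of the output format, here is a minimal sketch of writing a PostgreSQL binary COPY file in C. It is not the code this program actually uses. The function names (`copy_write_header`, `copy_write_row`, `copy_write_trailer`) and the two-column layout (`ip`, `title`, both `text`) are assumptions made up for the example. The byte layout follows the PostgreSQL documentation for the binary COPY format: an 11-byte signature, a 32-bit flags field, a 32-bit header extension length, then one tuple per row, then a 16-bit trailer of -1. All integers are in network byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write a 16-bit integer in network (big-endian) byte order. */
static int put_u16(FILE *out, uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)(v & 0xff) };
    return fwrite(b, 1, sizeof b, out) == sizeof b ? 0 : -1;
}

/* Write a 32-bit integer in network (big-endian) byte order. */
static int put_u32(FILE *out, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)((v >> 16) & 0xff),
        (unsigned char)((v >> 8) & 0xff), (unsigned char)(v & 0xff)
    };
    return fwrite(b, 1, sizeof b, out) == sizeof b ? 0 : -1;
}

/* File header: signature, flags (0 = no OIDs), header extension length (0). */
int copy_write_header(FILE *out)
{
    static const unsigned char sig[11] = {
        'P', 'G', 'C', 'O', 'P', 'Y', '\n', 0xff, '\r', '\n', '\0'
    };
    assert(out != NULL);
    if (fwrite(sig, 1, sizeof sig, out) != sizeof sig) return -1;
    if (put_u32(out, 0) != 0) return -1;
    return put_u32(out, 0);
}

/* One tuple with two text fields (hypothetical columns: ip, title).
 * A text value is sent as its raw bytes, without a terminating NUL.
 * A NULL field is encoded as length -1 with no data bytes. */
int copy_write_row(FILE *out, const char *ip, const char *title)
{
    const char *fields[2] = { ip, title };
    assert(out != NULL);
    if (put_u16(out, 2) != 0) return -1;          /* number of fields */
    for (int i = 0; i < 2; i++) {
        if (fields[i] == NULL) {
            if (put_u32(out, 0xFFFFFFFFu) != 0) return -1;
            continue;
        }
        size_t len = strlen(fields[i]);
        if (put_u32(out, (uint32_t)len) != 0) return -1;
        if (len > 0 && fwrite(fields[i], 1, len, out) != len) return -1;
    }
    return 0;
}

/* File trailer: a 16-bit field count of -1. */
int copy_write_trailer(FILE *out)
{
    assert(out != NULL);
    return put_u16(out, 0xFFFF);
}
```

A caller would write the header once, call `copy_write_row` for each anonymous edit, and finish with the trailer. Assuming a hypothetical table `anon_edits(ip text, title text)`, the resulting file could then be loaded with `COPY anon_edits FROM '/path/to/file' WITH BINARY` (newer PostgreSQL versions also accept `WITH (FORMAT binary)`).
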
The program depends on the libxml
library.
- WIP;
- Just an educational project;
- Still have no idea how to cook C properly.