vietnamese-statistics

Tools for computing statistics on a Vietnamese corpus. The corpus must be segmented and may be spread across multiple files.

words.py: print all words in a corpus
prefixes.py: print all prefixes in a corpus
suffixes.py: print all suffixes in a corpus
bind-prefixes.py: read prefixes from a file (format: ) and print them together with all words in a corpus
bind-suffixes.py: read suffixes from a file (format: ) and print them together with all words in a corpus
count-all.sh: generate count and bind files for a corpus; also serves as a usage example
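
To illustrate the kind of statistics listed above, here is a minimal sketch. It is not the code of the scripts in this repository. It assumes that words in the segmented corpus are separated by whitespace, that syllables within a word are joined by "_", and that a "prefix" is the first syllable and a "suffix" the last syllable of a multi-syllable word. The function names `iter_words`, `count_affixes` and `bind_prefixes` are hypothetical.

```python
from collections import Counter


def iter_words(lines):
    """Yield whitespace-separated tokens from an iterable of text lines."""
    for line in lines:
        yield from line.split()


def count_affixes(words):
    """Count words, first syllables (prefixes) and last syllables (suffixes).

    Assumption: syllables of a multi-syllable word are joined by "_".
    Single-syllable words contribute to the word count only.
    """
    word_counts, prefix_counts, suffix_counts = Counter(), Counter(), Counter()
    for word in words:
        word_counts[word] += 1
        syllables = word.split("_")
        if len(syllables) > 1:
            prefix_counts[syllables[0]] += 1
            suffix_counts[syllables[-1]] += 1
    return word_counts, prefix_counts, suffix_counts


def bind_prefixes(prefixes, words):
    """Map each given prefix to the sorted distinct words that start with it."""
    bound = {prefix: set() for prefix in prefixes}
    for word in words:
        syllables = word.split("_")
        if len(syllables) > 1 and syllables[0] in bound:
            bound[syllables[0]].add(word)
    return {prefix: sorted(found) for prefix, found in bound.items()}


# Tiny in-memory sample standing in for the lines of one or more corpus files.
sample = ["sinh_viên học_sinh đi học", "sinh_viên nhà_văn"]
tokens = list(iter_words(sample))
words, prefixes, suffixes = count_affixes(tokens)
bound = bind_prefixes(["sinh", "nhà"], tokens)

print(prefixes.most_common())
print(bound)
```

For a corpus spread over several files, the same functions could be fed the lines of each file in turn, for example through the standard `fileinput` module.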

Ready-made statistics for the RIDF-2013 corpus are provided in the *.cnt and *.bind files.
