[fix] scripts; input <--> output; README.md improved #1

Open. Wants to merge 7 commits into base: master
34 changes: 25 additions & 9 deletions README.md
@@ -1,18 +1,34 @@
# nom-or-what
Nom-or-what algorithm, designed to disambiguate case endings on nouns, adjectives, numerals etc. in Hungarian.
`Nom-or-what` algorithm, designed to disambiguate suffixless nominals in Hungarian.

Files in the bin folder:
## testing

nom-or-what.py: the nom-or-what module.
Run
```sh
python3 main.py
```
and see the output of the algorithm in `output_1000.txt`.

main.ipynb: a notebook to test nom-or-what.
`nomorwhat.py`: the nom-or-what module.

The input file has to contain one sentence / line. The tokens need to be annotated with emMorph (in a "/"-separated format, and with the tag set of emMorph).
input_1000.txt is an example file; it contains 1000 sentences nom-or-what has been evaluated on.
`main.py`: for testing nom-or-what.

The output file will be like output_1000.txt: Enumerated sentences with the parsing window of each suffixless nominal in them, and the proposed tag of each suffixless nominal (listed three times - to prepare the file for the manual annotation).
The input file has to contain one sentence / line.
The tokens need to be annotated with emMorph (in a "/"-separated format, and with the tag set of emMorph).
`input_1000.txt` is an example file; it contains 1000 sentences nom-or-what has been evaluated on.
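
As an illustration of the input format described above, here is a minimal sketch of reading one annotated line. It is not taken from `nomorwhat.py`: the `Token` type, the `parse_sentence` helper, the `form/lemma/tag` field order, the whitespace token separator and the sample line are all assumptions made for this example.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    # Hypothetical field layout; the real input files may order fields differently.
    form: str
    lemma: str
    tag: str


def parse_sentence(line: str) -> List[Token]:
    """Split one input line into tokens (assumed layout: form/lemma/tag)."""
    tokens = []
    for raw in line.split():
        # Split on the first two slashes only, because emMorph tags such as
        # "[/N][Nom]" contain "/" themselves.
        parts = raw.split("/", 2)
        if len(parts) != 3:
            raise ValueError(f"unexpected token layout: {raw!r}")
        tokens.append(Token(*parts))
    return tokens


# Illustrative line, not copied from input_1000.txt.
sentence = parse_sentence("A/a/[/Det|Art.Def] kutya/kutya/[/N][Nom]")
for token in sentence:
    print(token.form, token.lemma, token.tag)
```

Limiting the split to two slashes keeps the tag intact, but it would misparse a word form that itself contains "/"; check the real files before relying on this.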

macros.yml: config file for the macros used in nom-or-what.
The output file will be like `output_1000.txt`: each suffixless nominal is listed with a two-token parsing window and its proposed `Nom-or-what` tag (which appears three times for manual annotation purposes).

evaluate.ipynb: a notebook for the evaluation of files in the format of output_1000.txt and annotated_1000.txt.
`macros.yml`: config file for the macros used in nom-or-what.

## evaluation

Run
```sh
python3 evaluate.py
```
and the output matches what is shown in Tables 2.3, 2.5 and 2.8 of the thesis.

`evaluate.py`: for evaluating nom-or-what using `annotated_1000.txt`.

Binary file added __pycache__/nomorwhat.cpython-37.pyc
Binary file not shown.
File renamed without changes.
Binary file removed bin/__pycache__/nomorwhat.cpython-37.pyc
Binary file not shown.
Binary file removed bin/__pycache__/nomorwhat.cpython-38.pyc
Binary file not shown.