Word attributes missing in .vrt file #1
Corrected mapping of fields solves most of the issues with word attributes. However, dependencies are not present in the base |
To finish replicating the behaviour of the slides, we need to recover the |
This is excellent, as many of the lost features are now back! If this works for the A-W corpus, I can easily run the same analyses for the Bloomfield and Miscellaneous corpora, so that we can add them to our Cree text collection. Then I can recreate the |
Currently the order is |
My plan is that once the new machine is set up, I'd put |
Although it does not immediately have priority, we could address UAlbertaALTLab/korp-frontend#29 and UAlbertaALTLab/korp-frontend#24 as well. |
In the regular analysis process, the various fields are organized as follows, with tabs in between:
We could add RW or WN semantic classes, etc. One feature that would be good to sort out, though, is how spaces can be included in the fields without being mistaken for field delimiters, since only tabs are supposed to serve as delimiters. For now, I've replaced spaces with |
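A minimal sketch of that workaround in Python, in case it helps to pin down the behaviour: each field has its internal whitespace replaced by a placeholder before the fields are joined with tabs. The function name `make_vrt_line` and the `_` placeholder are assumptions for illustration only, not necessarily what the current scripts use.

```python
def make_vrt_line(fields, placeholder="_"):
    """Join token attributes with tabs, replacing whitespace inside each field.

    `placeholder` is a hypothetical choice; swap in whatever character the
    corpus conventions actually call for.
    """
    cleaned = []
    for field in fields:
        # Any run of spaces or tabs inside a value is collapsed into a single
        # placeholder, so it can no longer be read as a field delimiter.
        cleaned.append(placeholder.join(field.split()))
    return "\t".join(cleaned)


# Illustrative values only, not real corpus data.
line = make_vrt_line(["token", "lemma", "a gloss with spaces"])
```

With this, the only tabs left in the output line are the ones separating fields, so a reader that splits on tabs gets back exactly as many fields as went in.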
UAlbertaALTLab/korp-frontend#24 would be the higher priority. |
That order is ok!
I'm looking into that, as the frontend discusses multiple-value fields which would be what we want for fields like RW/WN/etc. |
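For reference, here is a rough sketch of how several values could be packed into one field. It assumes the pipe-delimited feature-set notation that CWB uses for set-valued attributes (values wrapped in and separated by `|`, with a lone `|` for the empty set). Whether our corpus config ends up using that notation is an assumption, and `to_feature_set` is a hypothetical helper name.

```python
def to_feature_set(values):
    """Pack several values into one pipe-delimited field, e.g. |a|b|.

    Assumes the CWB-style feature-set notation; an empty set is written "|".
    """
    # Drop empty entries and duplicates, and sort for a stable output.
    unique = sorted({v.strip() for v in values if v.strip()})
    if not unique:
        return "|"
    return "|" + "|".join(unique) + "|"


# Illustrative class labels only.
field = to_feature_set(["classB", "classA", "classA"])
```

Values that themselves contain `|` would need separate handling, which this sketch does not attempt.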
It should not take me too long to create the config files for a new corpus, but the |
Yeah, the point is that we have a skeleton *.vrt file with only the tokens and the structural metadata, on which all the layers of linguistic analysis can be added, and rerun when the analyzers improve. The scripts we've got should be able to accomplish the analyses quite quickly. The linguistic analysis fields would be exactly the same as for the Ahenakew-Wolfart corpus. What would differ potentially is the meta-structure of the texts, as the available metadata is different. |
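To make the skeleton idea concrete, here is a small Python sketch: structural lines (those starting with `<`) pass through untouched, and every token line gets its analysis columns regenerated from the token alone, so the whole file can be rerun whenever the analyzers improve. The names `annotate_vrt` and `analyze` are hypothetical, and the sketch assumes that a literal `<` in a token is already escaped in the skeleton.

```python
def annotate_vrt(lines, analyze):
    """Add analysis columns to a skeleton VRT file.

    `lines` is an iterable of VRT lines; `analyze` is any callable that maps
    a token string to a list of attribute strings (a stand-in for the real
    analyzers).
    """
    out = []
    for line in lines:
        stripped = line.rstrip("\n")
        # Structural metadata and blank lines are copied as they are.
        if not stripped or stripped.startswith("<"):
            out.append(stripped)
            continue
        # Keep only the first column (the token) and discard any older
        # analysis, so reruns never stack up stale columns.
        token = stripped.split("\t")[0]
        out.append("\t".join([token, *analyze(token)]))
    return out


# Toy skeleton and toy analyzer, for illustration only.
skeleton = ['<text title="t">', "<sentence>", "a", "b", "</sentence>", "</text>"]
annotated = annotate_vrt(skeleton, lambda tok: [tok.upper(), "X"])
```

Because the structural lines are never touched, the same function should work for corpora whose metadata differs, as long as the token column comes first.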
There's now a revised version of the analyzed VRT file for the A-W corpus, in:
This is done by the following sequence:
Some further conversion of special characters to HTML might be needed in the script: |
A first version of the Bloomfield corpus in VRT form is presented here: UAlbertaALTLab/korp-frontend#24 (comment) |
Word Attributes missing:
As seen in