Commit

Tag explanations
TinoDidriksen committed Jan 19, 2024
1 parent 83ad0d5 commit 43b3d31
Showing 12 changed files with 827 additions and 18 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@
/l10n-eng.php
/l10n-kal.php
/l10n.js
+/l10n-tags.js
/_vendor
*.sqlite
*.csv
1 change: 1 addition & 0 deletions _inc/lib.php
@@ -224,6 +224,7 @@ function page_header($title='SITE_TITLE') {
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/jquery.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"></script>
<script src="_static/l10n.js?<?=filemtime(__DIR__.'/../_static/l10n.js');?>"></script>
+<script src="_static/l10n-tags.js?<?=filemtime(__DIR__.'/../_static/l10n-tags.js');?>"></script>
<script src="_static/nutserut.js?<?=filemtime(__DIR__.'/../_static/nutserut.js');?>"></script>
</head>
<body class="d-flex flex-column">
6 changes: 3 additions & 3 deletions _pages/gloss.php
@@ -12,7 +12,7 @@

<div class="row my-3 justify-content-center">
<div class="col-lg-6 col-md-9 col-sm-12">
-<p><span data-l10n="TXT_GLOSS_010">This is a language learning tool designed to show you the source and target language side-by-side. This is <em>not translation</em>. While this tool does try to provide the correct target language translation based on the source language context, it leaves word order and conjugation/inflection as an exercise for the reader.</span> <a href="#info" data-l10n="LBL_READ_MORE">Read more…</a></p>
+<p><span data-l10n="LBL_GLOSS_010">This is a language learning tool designed to show you the source and target language side-by-side, in order to understand Greenlandic in its original context. This is <em>not translation</em>. While this tool does try to provide the correct target language translation based on the source language context, it leaves word order and conjugation/inflection as an exercise for the reader.</span> <a href="#info" data-l10n="LBL_READ_MORE">Read more…</a></p>
</div>
</div>

@@ -59,9 +59,9 @@
<div class="row my-5 justify-content-center">
<div class="col-lg-9 col-md-9 col-sm-12">
<h5 data-l10n="HDR_GLOSS_INFO" id="info">Uses of annotation</h5>
-<p data-l10n="TXT_GLOSS_020">Let's start with an example. If we take the Greenlandic prompt "<em>Kalaallisut oqaaseqatigiit nutserneqartussat</em>" and <a href="./n1k">annotate it</a>, we get a breakdown of each word's analysis and then a translation of the roots (lemmas), morphemes, inflexion, and cases therein.</p>
+<p data-l10n="TXT_GLOSS_020">For example, if we take the Greenlandic prompt "<em>Kalaallisut oqaaseqatigiit nutserneqartussat</em>" and <a href="./n1k">annotate it</a>, we get a breakdown of each word's analysis and then a translation of the roots (lemmas), morphemes, inflexion, and cases therein.</p>

-<p>Looking at the word "<em>nutserneqartussat</em>", we can see that the root "<em>nutser</em>" semantically has to do with turning into something (<code>Sem/turn_into</code>) and is translated as "<em>translate</em>". We don't have a translation for the morpheme <code>NIQAR</code> (it turns the construction passive), but <code>TUQ</code> means "<em>one who</em>", and <code>SSAQ</code> means "<em>future</em>".</p>
+<p data-l10n="TXT_GLOSS_030">Looking at the word "<em>nutserneqartussat</em>", we can see that the root "<em>nutser</em>" semantically has to do with turning into something (<code>Sem/turn_into</code>) and is translated as "<em>translate</em>". The morpheme <code>NIQAR</code> turns the construction passive, <code>TUQ</code> means "<em>one who</em>", <code>SSAQ</code> means "<em>future</em>", so the verbatim phrase is "future one who becomes translated", which we can write cleaner as "to be translated".</p>
</div>
</div>

5 changes: 3 additions & 2 deletions _pages/hybrid.php
@@ -83,8 +83,9 @@
<div class="row my-5 justify-content-center">
<div class="col-lg-9 col-md-9 col-sm-12">
<h5 data-l10n="HDR_HYBRID_INFO" id="info">Hybrid Rule-Based and Machine Learning</h5>
-<p data-l10n="TXT_HYBRID_010">This is a machine translation model trained from parallel human-authored texts that have been through parts of the rule-based Greenlandic language analysis engine. In this first phase, no effort has gone into cleaning up or verifying that the texts are truly parallel. When compared to the <a href="./machine">raw artificial intelligence</a> engine, it is evident that by providing linguistic expertise we can greatly improve the translation quality, even if the parallel texts are of dubious quality.</p>
-<p data-l10n="TXT_HYBRID_020">In 2024, we will work on verifying that the parallel corpora are of good quality, and on improving the Greenlandic analyser to provide better data, including better spelling- and grammar-checking.</p>
+<p data-l10n="TXT_HYBRID_010">This is a machine translation model trained from parallel human-authored texts that have been through parts of the rule-based Greenlandic language analysis engine. In this first phase, no effort has gone into cleaning up or verifying that the texts are truly parallel or even correct Greenlandic. When compared to the <a href="./machine">naive artificial intelligence</a> engine, it is evident that by providing linguistic expertise we can greatly improve the translation quality, even if the parallel texts are of dubious quality.</p>
+<p data-l10n="TXT_HYBRID_020">In 2024 and onwards, we will work on verifying that the parallel corpora are of good quality, and on improving the Greenlandic analyser to provide better data, including better spelling- and grammar-checking.</p>
+<p data-l10n="TXT_HYBRID_030">We guarantee that none of your data is stored, unless you explicitly share a translation. And we will make all the underlying algorithms and technology, and the public parts of our training data, available for everyone to build on.</p>
</div>
</div>

2 changes: 1 addition & 1 deletion _pages/machine.php
@@ -83,7 +83,7 @@
<div class="row my-5 justify-content-center">
<div class="col-lg-9 col-md-9 col-sm-12">
<h5 data-l10n="HDR_MACHINE_INFO" id="info">Machine Learning</h5>
-<p data-l10n="TXT_MACHINE_010">This is a machine translation model trained from raw parallel human-authored texts. This is about as good as you can get with zero linguistic expertise. In this first phase, no effort has gone into cleaning up or verifying that the texts are truly parallel. We made this model public to show what is possible from the bilingual texts one can find online, and as a comparison to the <a href="./hybrid">hybrid artificial intelligence</a> engine where we provide a little linguistic expertise.</p>
+<p data-l10n="TXT_MACHINE_010">This is a machine translation model trained from raw parallel human-authored texts. This is about as good as you can get with zero linguistic expertise. In this first phase, no effort has gone into cleaning up or verifying that the texts are truly parallel or even correct Greenlandic. We made this model public to show what is possible from the bilingual texts one can find online, and as a comparison to the <a href="./hybrid">hybrid artificial intelligence</a> engine where we provide a little linguistic expertise.</p>
</div>
</div>

8 changes: 4 additions & 4 deletions _pages/pre.php
@@ -23,8 +23,8 @@
<h3 data-l10n="HDR_ITIS">Nutserut is …</h3>
<ul>
<li data-l10n="TXT_IS_010">an advanced rule-based machine translation service, developed by Oqaasileriffik</li>
-<li data-l10n="TXT_IS_020">a good tool to read news or other texts</li>
-<li data-l10n="TXT_IS_030">a good helper if you're learning Greenlandic</li>
+<!-- li data-l10n="TXT_IS_020">a good tool to read news or other texts</li -->
+<!-- li data-l10n="TXT_IS_030">a good helper if you're learning Greenlandic</li -->
<li data-l10n="TXT_IS_040">the first release (alpha version) of the service</li>
</ul>

@@ -33,7 +33,7 @@
<ul>
<li data-l10n="TXT_ISNOT_010">a human; the service does not understand spelling or grammatical errors, and these will greatly impair the quality of the translation</li>
<li data-l10n="TXT_ISNOT_020">a dictionary; this service expects whole sentences, not fragments or single words</li>
-<li data-l10n="TXT_ISNOT_030">finished; there is still a lot to do, and we know how to progress from here</li>
+<!-- li data-l10n="TXT_ISNOT_030">finished; there is still a lot to do, and we know how to progress from here</li -->
</ul>
</div>
<div class="modal-footer text-center">
@@ -108,7 +108,7 @@
<div class="row my-5 justify-content-center">
<div class="col-lg-9 col-md-9 col-sm-12">
<h5 data-l10n="HDR_PRE2023_INFO" id="info">Rule-based Machine Translation</h5>
-<p data-l10n="TXT_PRE2023_010">Started in 2018, this project aimed at making a rule-based machine translation engine between Greenlandic and Danish. Until mid-2023, this was the only viable method for handling a polysynthetic language with sparse bilingual corpora, but advances in machine learning has since allowed for trained machine translation models that perform equally well.</p>
+<p data-l10n="TXT_PRE2023_010">Started in 2017, this project aimed at making a rule-based machine translation engine between Greenlandic and Danish. Until mid-2023, this was the only viable method for handling a polysynthetic language with sparse bilingual corpora, but advances in machine learning has since allowed for trained machine translation models that perform equally well.</p>
<p data-l10n="TXT_PRE2023_020">During the development of this project, we greatly improved the Greenlandic language analysis engine, which has many other uses - one of which is seen in the <a href="./hybrid">hybrid artificial intelligence</a> engine.</p>
</div>
</div>