fix: proper nj transliteration
noomorph committed Nov 23, 2023
1 parent 585959b commit a84447f
Showing 11 changed files with 1,794 additions and 59 deletions.
2 changes: 1 addition & 1 deletion .nvmrc
@@ -1 +1 @@
- lts/fermium
+ lts/hydrogen
4 changes: 3 additions & 1 deletion package.json
@@ -2,7 +2,9 @@
"name": "@interslavic/utils",
"version": "0.0.0",
"description": "Utilities for declension, conjugation, transliteration, etc.",
"type": "commonjs",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"files": [
"dist",
"src",
@@ -45,7 +47,7 @@
"devDependencies": {
"@commitlint/cli": "^11.0.0",
"@commitlint/config-conventional": "^11.0.0",
- "@types/jest": "^28.1.8",
+ "@types/jest": "^29.0.0",
"@types/js-yaml": "^4.0.5",
"@types/lodash": "^4.14.168",
"@typescript-eslint/eslint-plugin": "^5.59.8",
54 changes: 54 additions & 0 deletions scripts/generate-nj-suite.mjs
@@ -0,0 +1,54 @@
#!/usr/bin/env node

// Collects every fixture word ending in -nja, -njah, -njam, -njami, -nje,
// -njem or -nju, transliterates it to Latin Interslavic, and prints the
// sorted, de-duplicated list as reversed, %-delimited trie tokens.

import fs from 'node:fs';
// The package is CommonJS, so this default import resolves to module.exports.
// Build the package first so that dist/index.js exists.
import utils from '../dist/index.js';

// Yield every run of letters (\p{L}) and combining marks (\p{M}).
function* extractWords(str) {
  const regex = /([\p{L}\p{M}]+)/gu;
  let match;

  while ((match = regex.exec(str)) !== null) {
    yield match[1];
  }
}

function* extractWordsFromFile(filePath) {
  const raw = fs.readFileSync(filePath, 'utf8');
  yield* extractWords(raw);
}

// Paths are resolved against the current working directory,
// so run the script from the repository root.
function* allWords() {
  yield* extractWordsFromFile('src/adjective/testCases.json');
  yield* extractWordsFromFile('src/noun/__snapshots__/declensionNoun.test.ts.snap');
  yield* extractWordsFromFile('src/numeral/testCases.json');
  yield* extractWordsFromFile('src/pronoun/testCases.json');
  yield* extractWordsFromFile('src/verb/testCases.json');
}

function endsWithNj(word) {
  return word.endsWith('nja')
    || word.endsWith('njah')
    || word.endsWith('njam')
    || word.endsWith('njami')
    || word.endsWith('nje')
    || word.endsWith('njem')
    || word.endsWith('nju');
}

function buildExceptionList() {
  const set = new Set();
  for (const word of allWords()) {
    if (endsWithNj(word)) {
      set.add(utils.transliterate(word.toLowerCase(), 'art-Latn-x-interslv'));
    }
  }
  return [...set].sort();
}

// Reverse the word and wrap it in % sentinels, so a trie built from
// these tokens starts at the word's ending.
function toTrieToken(word) {
  return '%' + word.split('').reverse().join('') + '%';
}

console.log(buildExceptionList().map(toTrieToken).join(' '));
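The filtering and token steps of the script above can be tried without building the package. The sketch below is an illustration only, not part of this commit: it uses made-up sample words, folds endsWithNj() into one equivalent regular expression (NJ_ENDING and buildTokens are hypothetical names), and skips the utils.transliterate() call that the real script applies to each match before de-duplicating.

```javascript
// Hypothetical sample text; the real script reads words from test fixtures.
const sample = 'konja konjami konjev korenje revenja ovca';

// Equivalent to endsWithNj(): -nja, -njah, -njam, -njami, -nje, -njem, -nju.
const NJ_ENDING = /nj(?:a|ah|am|ami|e|em|u)$/u;

// Same pattern as extractWords(): runs of letters and combining marks.
function extractWords(str) {
  return str.match(/[\p{L}\p{M}]+/gu) ?? [];
}

// Same as toTrieToken(): reverse the word and wrap it in % sentinels.
function toTrieToken(word) {
  return '%' + word.split('').reverse().join('') + '%';
}

// Filter by ending, de-duplicate, sort, then convert to tokens.
// (No transliteration step here, unlike the real script.)
function buildTokens(text) {
  const unique = new Set(extractWords(text).filter((w) => NJ_ENDING.test(w)));
  return [...unique].sort().map(toTrieToken);
}

const tokens = buildTokens(sample);
console.log(tokens.join(' ')); // prints "%ajnok% %imajnok% %ejnerok% %ajnever%"
```

"konjev" is dropped because -njev is not in the suffix list, and "ovca" has no nj ending at all.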

84 changes: 42 additions & 42 deletions src/transliterate/__snapshots__/index.test.ts.snap

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions src/transliterate/index.test.ts
@@ -4,13 +4,13 @@ const latin = `\
Na vȯzvyšenosti ovca, ktora ne iměla vȯlnų, uviděla konjev. Pŕvy tęgal tęžky voz, vtory nosil veliko brěmę, tretji brzo vozil mųža.
Ovca rěkla konjam: «Boli mně sŕdce, kȯgda viđų, kako člověk vladaje konjami.»
Konji rěkli: «Slušaj, ovco, nam boli sŕdce, kȯgda vidimo ovo: mųž, gospodaŕ, bere tvojų vȯlnų, da by iměl dlja sebe teplo paĺto. A ovca jest bez vȯlny.»
- Uslyšavši to, ovca izběgla v råvninų. | Odjezd. T́ma.`;
+ Uslyšavši to, ovca izběgla v råvninų. | Odjezd. T́ma, i korenje revenja počęli råsteńje.`;

const cyrillic = `\
На возвышености овца, ктора не имѣла вълнѫ, увидѣла коњев. Прьвы тѧгал тѧжкы воз, вторы носил велико брємѧ, третји брзо возил мѫжа.
Овца рѣкла коням: «Боли мнє срьдце, къгда виџу, како чловѣк владаје коньами.»
Конји рѣкли: «Слушай, овцо, нам боли срьдце, къгда видимо ово: мѫж, господарь, бере твоѭ вълнѫ, да бы имѣл дља себе тепло пальто. А овца ѥсть без вълны.»
- Услышавши то, овца избѣгла в рӑвнинѹ. | Одјезд. Тьма.`;
+ Услышавши то, овца избѣгла в рӑвнинѹ. | Одјезд. Тьма, и корење ревења почѧли рӑстеньје.`;

describe('transliterate to', () => {
describe.each([
3 changes: 3 additions & 0 deletions src/transliterate/nje/Dict.ts
@@ -0,0 +1,3 @@
export interface Dict {
  [key: string]: Dict | number;
}
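Dict.ts only declares the node shape of a recursive, string-keyed trie whose leaves are numbers; the code that fills and queries it is not part of this diff. A minimal sketch of how reversed words could populate such a structure, assuming a hypothetical `$` terminal key (assumed never to occur in words) and hypothetical helper names buildReversedTrie() and hasWord():

```javascript
// Hypothetical helpers; the real code under src/transliterate/nje/ may differ.
// Node shape follows Dict.ts: each key maps to a child node or a number.
const END = '$'; // hypothetical terminal key, assumed absent from words

function buildReversedTrie(words) {
  const root = Object.create(null);
  for (const word of words) {
    let node = root;
    // Insert characters from the end of the word, as toTrieToken() reverses them.
    for (const ch of word.split('').reverse()) {
      if (node[ch] === undefined) node[ch] = Object.create(null);
      node = node[ch];
    }
    node[END] = 1; // number leaf marks a complete word
  }
  return root;
}

function hasWord(trie, word) {
  let node = trie;
  for (const ch of word.split('').reverse()) {
    node = node[ch];
    // Missing path, or a number leaf where a child node was expected.
    if (node === undefined || typeof node === 'number') return false;
  }
  return node[END] === 1;
}

const trie = buildReversedTrie(['konja', 'konjami', 'revenja']);
console.log(hasWord(trie, 'revenja')); // true
console.log(hasWord(trie, 'nja')); // false: path exists, but no terminal mark
```

Because words are inserted from their last letter, "konja" and "revenja" share the single path a → j → n before branching.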