L10N handler patch.
Changelog excerpt:
- Updated the L10N handler's language rules.
Maikuolan committed Feb 12, 2025
1 parent 16f390e commit d290cfe
Showing 3 changed files with 28 additions and 14 deletions.
4 changes: 3 additions & 1 deletion Changelog.txt
@@ -11,9 +11,11 @@ found at:

=== Changes made since last versioned release ===

- [2025.05.02; Maikuolan]: Added some limited support to the number formatter
- [2025.02.05; Maikuolan]: Added some limited support to the number formatter
class for unformatting a formatted number.

- [2025.02.12; Maikuolan]: Updated the L10N handler's language rules.

=== Version/Release 2.12.3 ===
PATCH RELEASE.

14 changes: 8 additions & 6 deletions _docs/L10N.md
@@ -230,19 +230,21 @@ Additionally, as you might've noticed in the above example, the fallback L10N ar

#### What rules to use for what language?

*The information listed in the table below is generally based upon [Unicode's CLDR page on Language Plural Rules](https://www.unicode.org/cldr/charts/46/supplemental/language_plural_rules.html) (which also serves as the general basis for the rules for [grammatical number](https://en.wikipedia.org/wiki/Grammatical_number) supported by the class). Information based upon other sources will be marked accordingly. If any of the listed information is wrong, erroneous, or incomplete, any corrections, additions, etc that you can think of would be invited and welcome (please create a pull request, or create an [issue](https://github.com/Maikuolan/Common/issues) if creating a pull request isn't possible). Please also be aware that I am NOT a professional linguist! If you ask me for the correct rules to use for a particular language, I'll only be able to answer if I'm able to find a reliable source somewhere online to help determine that information.*
*The information listed in the table below is GENERALLY based upon [Unicode's CLDR page on Language Plural Rules](https://www.unicode.org/cldr/charts/47/supplemental/language_plural_rules.html) (which also serves as the general basis for the rules for [grammatical number](https://en.wikipedia.org/wiki/Grammatical_number) supported by the class). Information based upon other sources will be marked accordingly. If any of the listed information is wrong, erroneous, or incomplete, any corrections, additions, or changes that you can think of would be invited and welcome (please create a pull request, or create an [issue](https://github.com/Maikuolan/Common/issues) if creating a pull request isn't possible). Please also be aware that I am NOT a professional linguist! If you ask me for the correct rules to use for a particular language, I'll only be able to answer if I'm able to find a reliable source somewhere online for that data.*

*†1: Language isn't listed on Unicode's CLDR page, but the required information for it can be found elsewhere (if a single, particular information source is the sole or primarily used information source, it will be linked next to the language, where this mark occurs).*
*†1: Unicode's CLDR page doesn't provide any data for the given language, but the relevant data can be found elsewhere (the source of that data will be linked or cited where possible).*

*†2: I (the author of this class) have found convincing evidence/data which contradicts the data provided by Unicode's CLDR page for the given language, and so, the data listed here will differ from that provided by Unicode's CLDR page.*

Language | `IntegerRule` | `FractionRule` | Notes
:--|:--|:--|:--
`********************************` | `********` | `********` | `********`
Afrikaans<br />Albanian (Shqipe)<br />Aragonese<br />Asturian (Asturianu)<br />Asu<br />Azerbaijani (Azərbaycan)<br />Balochi (بلۏچی)<br />Basque (Euskara)<br />Bemba<br />Bena<br />Bodo (बड़ो)<br />Bulgarian (Български)<br />Catalan (Català)<br />Chechen<br />Cherokee (ᏣᎳᎩ)<br />Chiga<br />Divehi<br />Dutch (Nederlandse)<br />English<br />Esperanto<br />Estonian (Eesti keel)<br />European Portuguese (Português)<br />Ewe (Eʋegbe)<br />Faroese (Føroyskt)<br />Finnish (Suomi)<br />Friulian<br />Galician (Galego)<br />Ganda (LùGáànda)<br />Georgian (ქართული)<br />German (Deutsch)<br />Greek (Ελληνικά)<br />Greenlandic (Kalaallisut)<br />Hausa (حَوْسَ)<br />Hawaiian (ʻōlelo Hawaiʻi)<br />Hungarian (Magyar)<br />Ido<br />Interlingua<br />Italian (Italiano)<br />Jju<br />Kako<br />Kashmiri (कॉशुर, كٲشُر)<br />Kazakh (Қазақ тілі)<br />Kurdish (Kurdî)<br />Kyrgyz (Кыргыз тили)<br />Ladin<br />Ligurian<br />Luxembourgish (Lëtzebuergesch)<br />Machame<br />Malayalam (മലയാളം)<br />Marathi (मराठी)<br />Masai<br />Maori (Māori) *[†1](https://en.wikipedia.org/wiki/M%C4%81ori_language)*<br />Metaʼ<br />Mongolian (Монгол)<br />Nahuatl (Nāhuatl)<br />Ndebele<br />Nepali (नेपाली)<br />Ngiemboon<br />Ngomba<br />Norwegian (Norsk)<br />Norwegian Bokmål<br />Norwegian Nynorsk<br />Nyanja<br />Nyankole<br />Odia (ଓଡ଼ିଆ)<br />Oromo (ኦሮሞ፞)<br />Ossetic<br />Papiamento (Papiamentu)<br />Pashto (پښتو)<br />Romansh (Rumantsch)<br />Rombo<br />Rwa<br />Saho<br />Samburu<br />Samoan<br />Sardinian (Limba Sarda)<br />Scots *[†1](http://www.scots-online.org/grammar/numbers.asp)*<br />Sena<br />Shambala<br />Shona<br />Sicilian (Sicilianu)<br />Sindarin *[†1](https://en.wikipedia.org/wiki/Sindarin)*<br />Sindhi (سنڌي)<br />Soga<br />Somali (Soomaaliga)<br />Southern Sotho (Sesotho)<br />Spanish (Español)<br />Swahili (Kiswahili)<br />Swati<br />Swedish (Svenska)<br />Swiss German<br />Syriac (ܠܫܢܐ ܣܘܪܝܝܐ)<br />Tamil (தமிழ்)<br />Telugu (తెలుగు)<br />Teso<br />Tigre (ትግረ, ትግሬ)<br />Tsonga (xiTsonga)<br />Tswana (Setswana)<br />Turkish (Türkçe)<br />Turkmen (Түркmенче)<br />Tyap<br />Urdu (‏اردو‏)<br />Uyghur (ئۇيغۇرچە, Уйғурчә)<br />Uzbek (O'zbek)<br />Venda (tshiVenḓa)<br />Volapük<br />Vunjo<br />Walser<br />Western Frisian (Frysk)<br />Xhosa (isiXhosa)<br />Yiddish (ייִדיש) | `int2Type4` | `int1`
Afrikaans<br />Albanian (Shqipe)<br />Aragonese<br />Asturian (Asturianu)<br />Asu<br />Azerbaijani (Azərbaycan)<br />Balochi (بلۏچی)<br />Basque (Euskara)<br />Bemba<br />Bena<br />Bodo (बड़ो)<br />Bulgarian (Български)<br />Catalan (Català)<br />Chechen<br />Cherokee (ᏣᎳᎩ)<br />Chiga<br />Divehi<br />Dutch (Nederlandse)<br />English<br />Esperanto<br />Estonian (Eesti keel)<br />European Portuguese (Português)<br />Ewe (Eʋegbe)<br />Faroese (Føroyskt)<br />Finnish (Suomi)<br />Friulian<br />Galician (Galego)<br />Ganda (LùGáànda)<br />Georgian (ქართული)<br />German (Deutsch)<br />Greek (Ελληνικά)<br />Greenlandic (Kalaallisut)<br />Hausa (حَوْسَ)<br />Hawaiian (ʻōlelo Hawaiʻi)<br />Hungarian (Magyar)<br />Ido<br />Interlingua<br />Italian (Italiano)<br />Jju<br />Kako<br />Kashmiri (कॉशुर, كٲشُر)<br />Kazakh (Қазақ тілі)<br />Kituba *[†1](https://en.wikipedia.org/wiki/Kituba_language)*<br />Kongo/Kikongo *†1*<br />Kurdish (Kurdî)<br />Kyrgyz (Кыргыз тили)<br />Ladin<br />Latgalian (Latgalīšu) *†1*<br />Latvian (Latviešu) *†2*<br />Ligurian (Ligure)<br />Luxembourgish (Lëtzebuergesch)<br />Machame<br />Malayalam (മലയാളം)<br />Marathi (मराठी)<br />Masai<br />Maori (Māori) *[†1](https://en.wikipedia.org/wiki/M%C4%81ori_language)*<br />Metaʼ<br />Mongolian (Монгол)<br />Nahuatl (Nāhuatl)<br />Ndebele<br />Nepali (नेपाली)<br />Ngiemboon<br />Ngomba<br />Norwegian (Norsk)<br />Norwegian Bokmål<br />Norwegian Nynorsk<br />Nyanja<br />Nyankole<br />Odia (ଓଡ଼ିଆ)<br />Oromo (ኦሮሞ፞)<br />Ossetic<br />Papiamento (Papiamentu)<br />Pashto (پښتو)<br />Romansh (Rumantsch)<br />Rombo<br />Rwa<br />Saho<br />Samburu<br />Samoan<br />Sardinian (Limba Sarda)<br />Scots *[†1](http://www.scots-online.org/grammar/numbers.asp)*<br />Sena<br />Shambala<br />Shona<br />Sicilian (Sicilianu)<br />Sindarin *[†1](https://en.wikipedia.org/wiki/Sindarin)*<br />Sindhi (سنڌي)<br />Soga<br />Somali (Soomaaliga)<br />Southern Sotho (Sesotho)<br />Spanish (Español)<br />Swahili (Kiswahili)<br />Swati<br />Swedish (Svenska)<br />Swiss German<br />Syriac (ܠܫܢܐ ܣܘܪܝܝܐ)<br />Tamil (தமிழ்)<br />Telugu (తెలుగు)<br />Teso<br />Tigre (ትግረ, ትግሬ)<br />Tsonga (xiTsonga)<br />Tswana (Setswana)<br />Turkish (Türkçe)<br />Turkmen (Түркmенче)<br />Tyap<br />Urdu (‏اردو‏)<br />Uyghur (ئۇيغۇرچە, Уйғурчә)<br />Uzbek (O'zbek)<br />Venda (tshiVenḓa)<br />Volapük<br />Vunjo<br />Walser<br />Western Frisian (Frysk)<br />Xhosa (isiXhosa)<br />Yiddish (ייִדיש) | `int2Type4` | `int1`
Akan<br />Bihari<br />Gun<br />Klingon (tlhIngan Hol,  ) *[†1](https://en.wikibooks.org/wiki/Klingon/Grammar/Plurals)*<br />Lingala (Lingála)<br />Malagasy<br />Northern Sotho (Sesotho)<br />Punjabi (ਪੰਜਾਬੀ) *‡1*<br />Sinhala (සිංහල)<br />Tigrinya (ትግርኛ)<br />Walloon (Walon) | `int2Type3` | `int1` | *‡1: Classification includes (groups together with): Changvi, Chenavari, Dhani, Doabi, Hindko, Jafri, Jangli, Jhangochi, Khetrani, Lahnda, Majhi, Malwai, Pahari-Potowari, Panjistani, Pothohari, Puadhi, Rachnavi, Saraiki, Shahpuri.*
Amharic (አማርኛ)<br />Assamese (অসমীয়া)<br />Bangla/Bengali (বাংলা)<br />Dogri (𑠖𑠵𑠌𑠤𑠮)<br />Gujarati (ગુજરાતી)<br />Hindi (हिंदी)<br />Kannada (ಕನ್ನಡ)<br />Nigerian Pidgin<br />Persian/Farsi (فارسی)<br />Zulu (isiZulu) | `int2Type3` | `fraction2Type2`
Arabic (<code dir="rtl">العربية</code>) *‡1* | `int6Type1` | `int1` | *‡1: CLDR's information suggests 6 distinct grammatical numbers used, but I haven't been able to successfully replicate this via online translators or dictionaries in most cases, so I'm not entirely sure about it.*
Armenian (հայերեն)<br />Bhojpuri (भोजपुरी)<br />Brazilian Portuguese (Portugues do Brasil)<br />French (Français)<br />Fulah<br />Kabyle (ثاقبايليث) | `int2Type3` | `fraction2Type1`
Bambara<br />Bhutanese/Dzongkha (རྫོང་ཁ)<br />Burmese (ျမန္မာဘာသာ)<br />Chinese (中文) *‡1*<br />Hmong Njua<br />Igbo<br />Indonesian (Bahasa Indonesia)<br />Japanese (日本語)<br />Javanese (Jawa)<br />Kabuverdianu<br />Khmer (ភាសាខ្មែរ)<br />Korean (한국어)<br />Koyraboro Senni<br />Lakota (Lakȟótiyapi)<br />Lao (ພາສາລາວ)<br />Lojban<br />Makonde<br />Malay (Bahasa Melayu)<br />N’Ko (ߒߞߏ)<br />Osage<br />Sakha<br />Sango<br />Sichuan Yi (ꆈꌠꉙ)<br />Thai (ไทย)<br />Tibetan (བོད་སྐད)<br />Toki Pona *[†1](http://tokipona.net/tp/janpije/originallessons-tp3.php)*<br />Tongan (Faka-Tonga)<br />Vietnamese (Tiếng Việt)<br />Wolof (Wollof)<br />Yoruba (Yorùbá) | `int1` | `int1` | Although `int1`+`int1` could *imply* that there aren't plural forms for a particular language, it should be noted that in most cases, plurality can be inferred by context, indicated by [specificity](https://en.wikipedia.org/wiki/Specificity_(linguistics)), [reduplication](https://en.wikipedia.org/wiki/Reduplication), or otherwise determined by some other means. It doesn't mean that there aren't plurals. Rather, it simply means that for those languages, it doesn't mean anything for this particular class.<br /><br />*‡1: Whether simplified (傳統) or traditional (简体), Cantonese (广东话) or Mandarin (普通话), or whatever else, pluralisation rules are the same (AFAICT).*
Bambara<br />Bhutanese/Dzongkha (རྫོང་ཁ)<br />Burmese (ျမန္မာဘာသာ)<br />Chinese (中文) *‡1*<br />Hmong Njua<br />Igbo<br />Indonesian (Bahasa Indonesia)<br />Japanese (日本語)<br />Javanese (Jawa)<br />Kabuverdianu<br />Khmer (ភាសាខ្មែរ)<br />Korean (한국어)<br />Koyraboro Senni<br />Lakota (Lakȟótiyapi)<br />Lao (ພາສາລາວ)<br />Lojban<br />Makonde<br />Malay (Bahasa Melayu)<br />N’Ko (ߒߞߏ)<br />Osage<br />Sakha<br />Sango<br />Sichuan Yi (ꆈꌠꉙ)<br />Thai (ไทย)<br />Tibetan (བོད་སྐད)<br />Toki Pona *[†1](http://tokipona.net/tp/janpije/originallessons-tp3.php)*<br />Tongan (Faka-Tonga)<br />Vietnamese (Tiếng Việt)<br />Wolof (Wollof)<br />Yoruba (Yorùbá) | `int1` | `int1` | Although `int1`+`int1` could *imply* that there aren't plural forms for a particular language, it should be noted that in most cases, plurality can be inferred by context, indicated by [specificity](https://en.wikipedia.org/wiki/Specificity_(linguistics)), [reduplication](https://en.wikipedia.org/wiki/Reduplication), or otherwise determined by some other means. It doesn't mean that there aren't plurals; just that it wouldn't affect how the class should be used.<br /><br />*‡1: Whether simplified (傳統) or traditional (简体), Cantonese (广东话) or Mandarin (普通话), or whatever else, pluralisation rules are the same (AFAICT).*
Belarusian (Беларуская мова)<br />Bosnian (Bosanski)<br />Croatian (Hrvatski)<br />Russian (Русский)<br />Serbian (Српски)<br />Serbo-Croatian<br />Ukrainian (Українська) | `int3Type4` | `int1`
Breton (Brezhoneg) | `int4Type3` | `int1`
Anii<br />Colognian | `int3Type2` | `int1`
@@ -254,7 +256,7 @@ Hebrew (עברית) | `int3Type3` | `fraction2Type2`
Icelandic (Íslenska)<br />Macedonian (Македонски) | `int2Type2` | `int1`
Irish (Gaeilge) | `int5Type1` | `int1`
Langi | `int3Type2` | `fraction2Type1`
Latvian (Latviešu)<br />Prussian | `int3Type1` | `int1`
Prussian | `int3Type1` | `int1`
Lithuanian (Lietuvių) | `int3Type6` | `int1`
Lower Sorbian (Dolnoserbski)<br />Slovenian (Slovenščina)<br />Upper Sorbian (Hornjoserbsce) | `int4Type4` | `int1`
Maltese (Malti) | `int5Type2` | `int1`
@@ -480,4 +482,4 @@ This means, that in theory, you could have an unlimited number of languages as f
---


Last Updated: 22 June 2024 (2024.06.22).
Last Updated: 12 February 2025 (2025.02.12).
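
As a rough illustration of what the `IntegerRule` column in the table above selects, the following is a minimal sketch, not the class's own code. The function names `sketchOneForm` and `sketchTwoForms` are hypothetical, and the two-form behaviour is assumed from CLDR's one/other categories for English integers. The index convention (0 for the singular form, 1 for the other form) follows the `@return` description in the class's docblocks.

```php
<?php
// Hypothetical stand-ins for illustration only; these are NOT the class's
// private rule methods.

/** One form only (e.g., Japanese): every count selects index 0. */
function sketchOneForm(int $count): int
{
    return 0;
}

/**
 * Two forms, assumed to follow CLDR's one/other split for English integers:
 * exactly 1 selects index 0 (singular), anything else selects index 1 (other).
 */
function sketchTwoForms(int $count): int
{
    return $count === 1 ? 0 : 1;
}

// Pick a message form by index, then fill in the count.
$forms = ['%d file', '%d files'];
foreach ([0, 1, 2] as $count) {
    echo sprintf($forms[sketchTwoForms($count)], $count), "\n";
}
```

For a language listed with `int1`, a translator would supply a single form and the count would never change which string is chosen, which is the sense in which plurality "wouldn't affect how the class should be used".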
24 changes: 17 additions & 7 deletions src/L10N.php
@@ -1,6 +1,6 @@
<?php
/**
* L10N handler (last modified: 2024.08.08).
* L10N handler (last modified: 2025.02.12).
*
* This file is a part of the "common classes package", utilised by a number of
* packages and projects, including CIDRAM and phpMussel.
@@ -303,7 +303,7 @@ private function int2Type4(int $Int): int
}

/**
* Three grammatical numbers, type one. For e.g., Latvian, Prussian.
* Three grammatical numbers, type one. For e.g., Prussian.
*
* @param int $Int The plurality/number of things.
* @return int 0: Singular form. 1: Other form. 2: Zero form.
@@ -749,10 +749,11 @@ private function fraction2Type2(float $Fraction): int

/**
* Determine an appropriate integer rule to use based upon the specified
* ISO 639-1/639-2 language code.
* ISO 639-1/639-2/639-3 language code (two-letter code preferred wherever
* available).
* @link https://www.loc.gov/standards/iso639-2/php/code_list.php
* @link https://cldr.unicode.org/index/cldr-spec/plural-rules
* @link https://www.unicode.org/cldr/charts/46/supplemental/language_plural_rules.html
* @link https://www.unicode.org/cldr/charts/47/supplemental/language_plural_rules.html
*
* @param string $Code An ISO 639-1/639-2 language code.
* @return string An appropriate integer rule to use.
@@ -856,19 +857,24 @@ public function getIntegerRule(string $Code): string
'ka',
'kaj',
'kcg',
'kg',
'kk',
'kkj',
'kl',
'ks',
'ksb',
'ktu',
'ku',
'ky',
'lb',
'lg',
'lij',
'ltg',
'lv',
'mas',
'mgo',
'mi',
'mkw',
'ml',
'mn',
'mr',
@@ -926,13 +932,13 @@ public function getIntegerRule(string $Code): string
'wae',
'xh',
'xog',
'yi'
'yi',
'yom'
], true)) {
return 'int2Type4';
}

if (in_array($Code, [
'lv',
'prg'
], true)) {
return 'int3Type1';
@@ -1073,7 +1079,11 @@ public function getIntegerRule(string $Code): string

/**
* Determine an appropriate fraction rule to use based upon the specified
* ISO 639-1/639-2 language code.
* ISO 639-1/639-2/639-3 language code (two-letter code preferred wherever
* available).
* @link https://www.loc.gov/standards/iso639-2/php/code_list.php
* @link https://cldr.unicode.org/index/cldr-spec/plural-rules
* @link https://www.unicode.org/cldr/charts/47/supplemental/language_plural_rules.html
*
* @param string $Code An ISO 639-1/639-2 language code.
* @return string An appropriate fraction rule to use.
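
The net effect of the `getIntegerRule` hunks can be sketched as follows. This is a heavily abbreviated, hypothetical stand-in (`sketchGetIntegerRule` is not part of the class): it lists only the codes this commit adds to the `int2Type4` list (`kg`, `ktu`, `ltg`, `lv`, `mkw`, `yom`) and the one code left in the `int3Type1` list (`prg`). The final fallback value is an assumption made for the sake of a runnable example, since the diff doesn't show what the real method returns for unmatched codes.

```php
<?php
/**
 * Hypothetical, abbreviated sketch of the lookup pattern shown in the diff.
 * The real method checks many more codes and rules.
 */
function sketchGetIntegerRule(string $code): string
{
    // Codes added to the 'int2Type4' list by this commit.
    if (in_array($code, ['kg', 'ktu', 'ltg', 'lv', 'mkw', 'yom'], true)) {
        return 'int2Type4';
    }

    // After this commit, 'lv' is gone from this list and only 'prg' remains.
    if (in_array($code, ['prg'], true)) {
        return 'int3Type1';
    }

    // Assumption: placeholder fallback; the real default isn't shown in the diff.
    return 'int1';
}

echo sketchGetIntegerRule('lv'), "\n";  // Latvian now resolves to the two-form rule.
echo sketchGetIntegerRule('prg'), "\n"; // Prussian keeps the three-form rule.
```

Because `in_array` is called with strict comparison (third argument `true`), matching is exact and case-sensitive, so a code such as `LV` would not match `lv` in this sketch.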
