This repository has been archived by the owner on Nov 10, 2024. It is now read-only.
Obvious care has been taken to differentiate the two pronunciations of ه (hā-ye do-češm): /h/ (word-initially and -medially) and /e/ (word-finally), as in:
ماهه m A h e
Examples like the one above are useful for determining when the grapheme should be pronounced as [h] or [e]. However, there does not seem to be a distinction between the [h] pronunciation of ه (hā-ye do-češm) and the [h] pronunciation of ح (ḥâ-ye ḥotti / ḥâ-ye jimi) anywhere in the dictionary. Consider the following examples:
[1] حادثه h A d e s e
[2] حوزه h o z e
[3] صحیح s a h i h
[4] آنها A n h A
If this dictionary were to be used in its reverse form, [1] could be reconstituted from "h A d e s e" into either "هادِثه" or "حادِثه". This is certainly a niche issue, but I am trying to diacritize non-diacritized text. To reconstitute my original text with vowels included, given your dictionary's scheme, I need to know which Farsi character to convert back to in the end. There are no instances in either dictionary where ḥâ-ye jimi and hā-ye do-češm (pronounced as [h]) appear in the same word, so a simple string replace should do it there. I suggest replacing [h] with [H] for ḥâ-ye jimi to make this dictionary reversible.
It may well be that there are no words in Farsi which contain both ح and ه, but this would solve the edge use-case I describe here.
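A minimal sketch of the suggested relabeling, assuming each dictionary entry is a pair of a Persian word and a space-separated phoneme string, as in the examples above. The helper name `mark_he_jimi` is hypothetical and not part of the repository. The sketch relies on the observation above that no entry contains both ح and an ه pronounced as [h]; for a word where that does not hold, it would mislabel the ه.

```python
HE_JIMI = "\u062d"  # the letter ح

def mark_he_jimi(word: str, phonemes: str) -> str:
    """Rewrite the phoneme token 'h' as 'H' when the word contains ح.

    Assumption: a word containing ح has no ه pronounced as [h],
    so every 'h' token in its transcription comes from ح.
    """
    if HE_JIMI not in word:
        return phonemes
    # Compare whole tokens so multi-character symbols are left untouched.
    return " ".join("H" if tok == "h" else tok for tok in phonemes.split())

# The four examples from this issue.
entries = [
    ("حادثه", "h A d e s e"),
    ("حوزه", "h o z e"),
    ("صحیح", "s a h i h"),
    ("آنها", "A n h A"),
]

relabeled = [(word, mark_he_jimi(word, phonemes)) for word, phonemes in entries]
for word, phonemes in relabeled:
    print(word, phonemes)
```

With this change, [1] to [3] would carry `H` and [4] would keep `h`, so the two letters stay distinguishable when reading the dictionary in reverse.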
Thanks for your work!
> If this dictionary were to be used in its reverse form
We use this dictionary to predict the pronunciation of a given word. If I am not mistaken, you are looking for a reverse function. Persian has many homophones: words with different meanings and spellings but the same pronunciation. For example:
h a y A t
can be written in these forms:
حیاط (courtyard)
حیات (life)
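The ambiguity can be illustrated with a small reverse index. This is only a sketch, assuming entries are pairs of a word and a space-separated phoneme string; `build_reverse_index` is a hypothetical helper, not a function from this repository.

```python
from collections import defaultdict

def build_reverse_index(entries):
    """Map each phoneme string to every spelling that produces it."""
    reverse = defaultdict(list)
    for word, phonemes in entries:
        reverse[phonemes].append(word)
    return dict(reverse)

entries = [
    ("حیاط", "h a y A t"),  # courtyard
    ("حیات", "h a y A t"),  # life
    ("حوزه", "h o z e"),
]

reverse = build_reverse_index(entries)

# Pronunciations that map back to more than one spelling.
ambiguous = {p: words for p, words in reverse.items() if len(words) > 1}
for phonemes, words in ambiguous.items():
    print(phonemes, words)
```

Both spellings here begin with ح, so distinguishing ح from ه in the transcription would not separate them; the difference lies in the final ط versus ت.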
In this repository we focused on pronunciation. However, there is another project that tries to latinize Persian, Alefbaye 2om, which you may also want to check out.
Be happy and healthy.