This is the design document resulting from the translation improvement efforts discussed in the telecons of 20241210 and 20250114.
Aim of the Document
This document aims to summarize the discussion held on 14 Jan 2025 and to put forward a design for implementing a preliminary translation/internationalization (i18n from now on) pipeline for InvenioRDM to accelerate deployment.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Current Shortcomings
InvenioRDM's front end is formed by two big pieces: Python (Flask) and JavaScript (React). On top of this two-language foundation, numerous modules provide parts of the InvenioRDM functionality. As a result, while InvenioRDM looks and works like a coherent software package, it is composed of many small, sometimes independent modules, making it a complex piece of software.
There are various ways to translate these packages and (user) interfaces into different languages to enable i18n, which is required by quite a few installations and institutions. However, the current methods are time-consuming and impractical. In its current state, translating InvenioRDM boils down to translating strings in Transifex and injecting these translations into every package, mostly manually, before releasing a new version of the software.
To counter this shortcoming, a new concept called Translation Bundle was proposed. In short, this functionality allows a drop-in package to translate the whole InvenioRDM deployment, which can be updated by replacing the package and restarting the services. However, this feature currently supports only the Python parts, and the JavaScript parts require extensive work to integrate into this model.
Proposed Solution
To accelerate the development of i18n solutions for InvenioRDM, an intermediate model for translating the foundation and modules of InvenioRDM was proposed in the remote meetings (telecons for short) on 20241210 and 20250114. This document details and discusses that model.
Idea Overview
The current proposal envisions some changes to the so-called "assembly" phase of an InvenioRDM installation. In short, it consists of two phases:
Collection of all translatable strings from JavaScript packages into a single file.
Distribution of the translations to their relevant packages before installation.
This workflow requires changes to the invenio-cli command to allow collection, distribution and merging of the translation strings. Initial work has been done in Pull #380, and further work is expected to continue on the same pull request.
Detailed Description
In the current state of the improvement, it's envisioned that the translatable strings are collected from the codebase with the invenio-cli command, which uses the core implementation that will live in invenio-i18n. This process will visit every module and collate the strings into a single JSON file, prefixing each string with the name of the package it comes from to prevent collisions and ease distribution.
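The prefixing step can be illustrated with a minimal sketch. It assumes each package catalog is a flat mapping of message keys to strings, and that the package name is joined to the key with a `::` separator. Both the separator and the function name `merge_catalogs` are hypothetical, chosen for illustration only; the actual format is not fixed by this document.

```python
SEPARATOR = "::"  # hypothetical separator between package name and message key


def merge_catalogs(catalogs):
    """Flatten {package: {key: text}} into a single {"package::key": text} mapping."""
    bundle = {}
    for package, messages in catalogs.items():
        if SEPARATOR in package:
            # A separator inside the package name would make splitting ambiguous.
            raise ValueError(f"Package name contains separator: {package!r}")
        for key, text in messages.items():
            bundle[f"{package}{SEPARATOR}{key}"] = text
    return bundle
```

Because every key carries its package name, identical message keys from two packages no longer collide in the merged file.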
These strings will then be translated independently. When that process is done, the same JSON file will be used with the invenio-i18n or invenio-cli tool to redistribute the strings back to the packages to which they belong.
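The redistribution step is the inverse operation. The sketch below assumes the single JSON file has been loaded into a flat mapping whose keys look like `package::key`. The `::` separator and the name `split_bundle` are assumptions made for illustration, not part of any existing API.

```python
from collections import defaultdict

SEPARATOR = "::"  # hypothetical separator between package name and message key


def split_bundle(bundle):
    """Group {"package::key": text} entries back into {package: {key: text}}."""
    grouped = defaultdict(dict)
    for prefixed_key, text in bundle.items():
        # Split on the first separator only, so message keys may contain "::".
        package, sep, key = prefixed_key.partition(SEPARATOR)
        if not sep:
            raise ValueError(f"Key without package prefix: {prefixed_key!r}")
        grouped[package][key] = text
    return dict(grouped)
```

Splitting on the first separator only keeps message keys intact even when they happen to contain the separator themselves.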
After this replacement/injection has been done, the assembly of the InvenioRDM can continue and the software can be installed.
Workflow Considerations
It's evident that the collection of strings and their re-introduction should be done asynchronously. Hence, the workflow should be conducive to extracting and re-introducing the strings without executing a whole installation workflow.
It's expected that the strings will be introduced at the virtual environment level, so rebuilding such an environment will remove the changes made to the strings and to the installation source in general. It's assumed that the tools around the InvenioRDM installation are conducive to such workflows in general, so the necessary changes will be minimal in nature.
It's envisioned that the changes will be introduced before the so-called final-assembly of the modules and packages to form the final InvenioRDM installation artifact, namely before the webpack step.
Every package implementing semantic-ui already has translation files located in semantic-ui/translations/*/messages/*/translations.json, so the tool will target these files for collection and re-distribution.
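As a rough sketch of how a tool could locate these files, the snippet below searches a directory tree for the pattern above using `pathlib`. It assumes that the first wildcard is the package name and the second is the language code, which the pattern itself does not state. The function name `find_catalogs` is hypothetical.

```python
from pathlib import Path

# Pattern taken from the text; the meaning of each wildcard is an assumption.
CATALOG_PATTERN = "semantic-ui/translations/*/messages/*/translations.json"


def find_catalogs(root: Path) -> dict[tuple[str, str], Path]:
    """Map (package, language) to every matching translations.json under root."""
    catalogs = {}
    for path in sorted(root.rglob(CATALOG_PATTERN)):
        # .../translations/<package>/messages/<language>/translations.json
        package, language = path.parts[-4], path.parts[-2]
        catalogs[(package, language)] = path
    return catalogs
```

The `root` argument could be a source checkout or a virtual environment's site-packages directory, depending on where the strings are to be collected from.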
Implementation Details
The current implementation proposal has two main parts: the invenio-i18n module and the invenio-cli tool. Most of the logic should be implemented in the former to prevent a hard requirement in the invenio-cli tool, yet the tool itself should provide the necessary interface to automate/ease the process as much as possible.
invenio-i18n
invenio-i18n is a Python library mostly concerned with translation of InvenioRDM as a whole. The library has a documentation page which details its capabilities. From the documentation:
This module provide features for loading and merging message catalogs. It is built on top of Flask-BabelEx and most external modules should just depend on Flask-BabelEx instead of Invenio-I18N. Only applications in need of loading and merging many message catalogs should integrate this module.
However, the library/tool can also traverse React libraries and extract react-i18next translation catalogs (JSON files).
The first step in invenio-i18n would be adding the ability to extract the translation catalogs present in semantic-ui implementations, which are stored as JSON files. To do so, the extract_messages_js flow inside invenio-i18n should be implemented.
The next step is to add functionality to the invenio-i18n package to read this big JSON file, divide it into the required JSON files, and replace the catalogs inside every module. This is where keying every string with its respective package name is required. This can be implemented as a new function named import_messages_js, or similar.
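The replacement part of such an import step might look like the sketch below. It assumes the big JSON file has already been divided into one mapping per package, and that the caller knows the catalog path for each package. The name `replace_catalogs` and both parameters are hypothetical and do not reflect an existing invenio-i18n interface.

```python
import json
from pathlib import Path


def replace_catalogs(grouped: dict[str, dict[str, str]],
                     catalog_paths: dict[str, Path]) -> list[str]:
    """Overwrite each known package catalog; return the names of updated packages."""
    updated = []
    for package in sorted(grouped):
        target = catalog_paths.get(package)
        if target is None:
            # No catalog path known for this package: leave it untouched.
            continue
        text = json.dumps(grouped[package], ensure_ascii=False, indent=2, sort_keys=True)
        target.write_text(text + "\n", encoding="utf-8")
        updated.append(package)
    return updated
```

Returning the list of updated packages lets a calling tool report which packages were skipped because no catalog path was known for them.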
However, adding functions solely to invenio-i18n is not enough, since the functionality should be available during the testing and deployment process. Hence, this requirement brings us to invenio-cli.
invenio-cli
invenio-cli is the main tool for the InvenioRDM software stack. It enables users to manage their InvenioRDM installation from start to finish, including translations. As a result, invenio-cli has a translations command which handles all the steps required for translations from extraction to updating dictionaries used by packages.
Consequently, the invenio-cli translations command needs to be updated to interface with the new and updated functionality inside the invenio-i18n module.
The tool also handles asset management, under the assets command. It currently appears to lack an option that only downloads the assets without building/compiling them. As a result, a new command under the assets command may need to be implemented, or an existing one may need to be updated.
Concise Task List
Improve invenio-i18n's extract_messages_js method to extract all JSON files and merge into a single, big JSON file.
Improve invenio-i18n to add a (hypothetical) import_messages_js method to re-distribute the contents of the modified, big JSON file to their respective modules.
Improve invenio-cli's assets command to download assets without building/optimizing them to allow translation extraction/injection (if required).
Improve invenio-cli's translations command to allow extraction of Semantic-UI's JSON files (merging will be handled by invenio-i18n module).
Improve invenio-cli's translations command to reintroduce translated JSON files to appropriate packages (division will be handled by invenio-i18n module).
High Level Flowcharts