[RFC] Improving translation support for JS parts of InvenioRDM #92

Open
5 tasks
hbayindir opened this issue Jan 21, 2025 · 0 comments

This is the design document resulting from the translation improvement efforts discussed in the telecons held on 20241210 and 20250114.

Aim of the Document

This document aims to summarize the discussion held on 14 Jan 2025 and to put forward a design for implementing a preliminary translation/internationalization (i18n from now on) pipeline for InvenioRDM, in order to accelerate deployment.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 RFC2119 RFC8174 when, and only when, they appear in all capitals, as shown here.

Current Shortcomings

InvenioRDM contains two big pieces forming its front-end: Python (Flask) and JavaScript (React). On top of this two-language foundation, numerous modules provide parts of the InvenioRDM functionality. As a result, while InvenioRDM looks and works like a coherent software package, it's composed of many small, sometimes independent modules, making it a complex piece of software.

There are various ways to translate these packages and (user) interfaces into different languages to enable i18n, which is required by quite a few installations and institutions. However, the existing methods are time-consuming and impractical. In its current state, translating InvenioRDM boils down to translating strings in Transifex and injecting these translations into every package, mostly manually, before releasing a new version of the software.

To counter this shortcoming, a new concept called the Translation Bundle was proposed. In short, this functionality allows a drop-in package to translate the whole InvenioRDM deployment, and the translations can be updated by replacing the package and restarting the services. However, this feature currently supports only the Python parts, and the JavaScript parts require extensive work to integrate into this model.

Proposed Solution

To accelerate the development of i18n solutions for InvenioRDM, an intermediate model for translating the foundation and modules of InvenioRDM was proposed in the remote meetings (telecons for short) on 20241210 and 20250114. This model is what's detailed and discussed in this document.

Idea Overview

The current proposal envisions some changes to the so-called "assembly" phase of an InvenioRDM installation. In short, it consists of two phases:

  1. Collection of all translatable strings from JavaScript packages into a single file.
  2. Distribution of the translations to their relevant packages before installation.

This workflow requires changes to the invenio-cli command to allow collection, distribution and merging of the translation strings. Initial work has been done in the so-called Pull #380, and further work is expected to continue on the same pull request.

Detailed Description

In the current state of the improvement, it's envisioned that the translatable strings are collected from the codebase with the invenio-cli command, which uses the core implementation that will be present in the invenio-i18n command. This process will visit every module and collate the strings into a single JSON file, prefixing each string with the name of the package it comes from to prevent collisions and to ease distribution.

These strings will then be translated independently. When that process is done, the same JSON file will be used with the invenio-i18n or invenio-cli tool to redistribute the strings back to the packages to which they belong.
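The collation step described above can be sketched as follows. This is only a minimal illustration, not the invenio-i18n implementation: the function name `merge_catalogs`, the flat `package::key` layout, and the `::` separator are all assumptions made for the example, since the proposal does not define the exact format of the merged file. The package names are used purely as sample data.

```python
from typing import Dict

# Hypothetical separator between package name and message key.
# The proposal only says strings are prefixed with the package name.
SEPARATOR = "::"


def merge_catalogs(catalogs: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Flatten per-package catalogs into one dict with namespaced keys."""
    merged: Dict[str, str] = {}
    for package, strings in catalogs.items():
        if SEPARATOR in package:
            # A separator inside the package name would make splitting ambiguous.
            raise ValueError(f"Package name contains separator: {package!r}")
        for key, text in strings.items():
            merged[f"{package}{SEPARATOR}{key}"] = text
    return merged


# Two packages that both define the key "Search" (sample data only).
catalogs = {
    "invenio_app_rdm": {"Search": "Search"},
    "invenio_communities": {"Search": "Search", "Members": "Members"},
}
merged = merge_catalogs(catalogs)
```

Because each key carries its package name, the two "Search" entries stay distinct in the merged file, and the same prefix can later be used to route each string back to its package.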

After this replacement/injection has been done, the assembly of the InvenioRDM can continue and the software can be installed.

Workflow Considerations

It's evident that the collection of strings and their re-introduction should be done asynchronously. Hence, the workflow should be conducive to extracting and re-introducing the strings without executing a whole installation workflow.

It's considered that the strings will be introduced at the virtual environment level, and that re-building such an environment will remove the changes made to the strings and to the installation source in general. It's assumed that the tools around the InvenioRDM installation are conducive to such workflows in general, so the necessary changes will be minimal in nature.

It's envisioned that the changes will be introduced before the so-called final-assembly of the modules and packages to form the final InvenioRDM installation artifact, namely before the webpack step.

Every package implementing semantic-ui already has translation files located in semantic-ui/translations/*/messages/*/translations.json, so the tool will target these files for collection and re-distribution.
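As a rough sketch of how a tool could locate these files, the snippet below walks a directory tree with the glob pattern quoted above. It assumes that the first wildcard in the pattern is the package name and the second is the locale; the text does not state this explicitly. The helper name `find_translation_files` and the `pkg_a` directory layout built for the demonstration are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple

# Pattern taken from the text; matched at any depth below the search root.
PATTERN = "semantic-ui/translations/*/messages/*/translations.json"


def find_translation_files(root: Path) -> List[Tuple[str, str, Path]]:
    """Return (package, locale, path) for every matching translations.json."""
    results = []
    for path in sorted(root.glob("**/" + PATTERN)):
        locale = path.parent.name        # .../messages/<locale>/translations.json
        package = path.parents[2].name   # .../translations/<package>/messages/...
        results.append((package, locale, path))
    return results


# Demonstration on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = (root / "pkg_a" / "assets" / "semantic-ui" / "translations"
              / "pkg_a" / "messages" / "de")
    target.mkdir(parents=True)
    (target / "translations.json").write_text(
        json.dumps({"Search": "Suchen"}), encoding="utf-8"
    )
    found = [(pkg, loc) for pkg, loc, _ in find_translation_files(root)]
```

Deriving the package and locale from the path rather than from the file contents keeps the discovery step independent of the catalog format.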

Implementation Details

The current implementation proposal has two main parts: the invenio-i18n module and the invenio-cli tool. Most of the logic should be implemented in the former to prevent a hard requirement in the invenio-cli tool, yet the tool itself should provide the necessary interface to automate/ease the process as much as possible.

invenio-i18n

invenio-i18n is a Python library mostly concerned with translation of InvenioRDM as a whole. The library has a documentation page which details its capabilities. From the documentation:

This module provide features for loading and merging message catalogs. It is built on top of Flask-BabelEx and most external modules should just depend on Flask-BabelEx instead of Invenio-I18N. Only applications in need of loading and merging many message catalogs should integrate this module.

However, the library/tool can also traverse React libraries and extract react-i18next translation catalogs (JSON files).

The first step in invenio-i18n would be adding the ability to extract the translation catalogs present in semantic-ui implementations, which are stored as JSON files. To do so, the extract_messages_js flow inside invenio-i18n should be implemented.

The next step is to add functionality to the invenio-i18n package to read this big JSON file, divide it into the required per-package JSON files, and replace the catalogs inside every module. This is where keying every string with its respective package name is required. This can be implemented as a new function named import_messages_js, or similar.
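A minimal sketch of this division step is given below. It is not the proposed import_messages_js function itself, which does not exist yet: the helper names, the flat `package::key` layout of the merged file, the `::` separator, and the simplified output directory layout are all assumptions made for illustration.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

SEPARATOR = "::"  # hypothetical separator between package name and key


def split_merged_catalog(merged: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group namespaced keys back into one catalog per package."""
    per_package: Dict[str, Dict[str, str]] = defaultdict(dict)
    for namespaced_key, text in merged.items():
        # Split on the first separator only, so keys may contain it too.
        package, sep, key = namespaced_key.partition(SEPARATOR)
        if not sep:
            raise ValueError(f"Key without package namespace: {namespaced_key!r}")
        per_package[package][key] = text
    return dict(per_package)


def write_package_catalogs(
    per_package: Dict[str, Dict[str, str]], root: Path, locale: str
) -> List[Path]:
    """Write one translations.json per package (simplified, assumed layout)."""
    written = []
    for package, strings in sorted(per_package.items()):
        target = root / package / "messages" / locale / "translations.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(
            json.dumps(strings, ensure_ascii=False, indent=2, sort_keys=True),
            encoding="utf-8",
        )
        written.append(target)
    return written


# Sample translated file (illustrative data only).
merged = {
    "invenio_app_rdm::Search": "Suchen",
    "invenio_communities::Search": "Suchen",
    "invenio_communities::Members": "Mitglieder",
}
per_package = split_merged_catalog(merged)

with tempfile.TemporaryDirectory() as tmp:
    paths = write_package_catalogs(per_package, Path(tmp), "de")
    # Read the files back, keyed by the package directory name.
    reread = {p.parts[-4]: json.loads(p.read_text(encoding="utf-8")) for p in paths}
```

Rejecting keys that lack a namespace makes a hand-edited merged file fail loudly instead of silently dropping strings.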

However, adding functions solely to invenio-i18n is not enough, since the functionality should be available during the testing and deployment process. Hence, this requirement brings us to invenio-cli.

invenio-cli

invenio-cli is the main tool for the InvenioRDM software stack. It enables users to manage their InvenioRDM installation from start to finish, including translations. As a result, invenio-cli has a translations command which handles all the steps required for translations from extraction to updating dictionaries used by packages.

Consequently, the invenio-cli translations command needs to be updated to interface with the new and updated functionality inside the invenio-i18n module.

The tool also handles asset management, under the assets command. Currently, it appears to be missing an option that only downloads the assets without building/compiling them. As a result, a new command under the assets command may need to be implemented, or an existing one updated.

Concise Task List

  • Improve invenio-i18n's extract_messages_js method to extract all JSON files and merge into a single, big JSON file.
  • Improve invenio-i18n to add a (hypothetical) import_messages_js method to re-distribute the contents of the modified, big JSON file to their respective modules.
  • Improve invenio-cli's assets command to download assets without building/optimizing them to allow translation extraction/injection (if required).
  • Improve invenio-cli's translations command to allow extraction of Semantic-UI's JSON files (merging will be handled by invenio-i18n module).
  • Improve invenio-cli's translations command to reintroduce translated JSON files to appropriate packages (division will be handled by invenio-i18n module).

High Level Flowcharts

---
title: Idealized JS Translations Extraction & Merging
---
flowchart TD

get_invenio_cli["Get invenio-cli"]
download_assets["Download Assets"]
extract_js_translations["Extract JavaScript Translations"]
get_json_files["Get multiple JSON files"]
merge_json_files["Merge JSON files"]
get_single_json["Obtain merged JSON file"]

get_single_json2["Obtain merged JSON file"]
create_single_json["Create and write per package JSON"]
build_assets["Compile / Build assets"]
continue_install["Continue installation as usual"]

get_json_files@{ shape: docs}
get_single_json@{ shape: doc}
get_single_json2@{ shape: doc}

get_invenio_cli --> download_assets --> extract_js_translations --> get_json_files --> merge_json_files --> get_single_json

get_single_json2 --> create_single_json --> build_assets --> continue_install
---
title: Extracting & Merging JSON files
---
flowchart TD

get_package_list["Get list of packages"]
get_package_name["Get package name"]
get_json_per_package["Get JSON file from package"]
extract_strings["Extract strings from package JSON"]
add_package_key["Append package name as namespace key"]
write_to_merged_file["Append new strings to single JSON"]
check_remaining_packages["More Packages?"]
write_complete_file["Write completed JSON to disk"]

write_complete_file@{ shape: doc}
check_remaining_packages@{ shape: decision}

get_package_list --> get_package_name --> get_json_per_package --> extract_strings --> add_package_key --> write_to_merged_file --> check_remaining_packages --  No --> write_complete_file

check_remaining_packages -- Yes --> get_package_name

---
title: Merging JSON files to packages
---
flowchart TD

get_merged_json_file["Get merged JSON file"]
read_json_file["Read JSON file to memory"]
get_list_of_packages["Get a list of packages"]
get_package_name["Get package name"]
create_namespace_key["Create namespace key from package name"]
find_strings["Find strings matching the namespace key"]
write_package_json["Create and write a JSON file for the package"]
any_more[Any packages left?]
finish["Exit"]

get_merged_json_file@{ shape: doc }
write_package_json@{ shape: doc }
any_more@{ shape: decision }

get_merged_json_file --> read_json_file --> get_list_of_packages --> get_package_name --> create_namespace_key --> find_strings --> write_package_json --> any_more -- No --> finish

any_more -- Yes --> get_package_name
@hbayindir hbayindir added the Proposal: Pending Proposal for new RFC, pending triage label Jan 21, 2025