I18n Proposal A
This proposal focuses on i18n for strings in javascript code and strings in HTML files. It may be better described as a module rather than a standard.
Rationale
Basic internationalization belongs in the standard library or platform; Python and the GNU platform are good examples. Internationalization is frequently an afterthought for software developers, so a simple, well-defined API can ensure that applications can be internationalized without major refactoring.
Philosophy
The i18n mechanism should work both in a client/server application and completely offline, without any server-side emulation. This is distinctly different from how i18n works in web frameworks such as ruby-on-rails and django. Most web frameworks use server-side html templates that replace translatable strings before sending the page to the client. This scheme does not work for offline applications. It also relies on html markup that is either invalid or unrenderable prior to pre-processing, and it makes the application very dependent on a specific server-side handler and a particular web server configuration.
This document proposes an i18n mechanism without server-side dependencies and using valid html5 markup. It also proposes a mechanism that is as simple as possible and can be applied incrementally to an existing application.
Applicability
This specification is applicable to:
- Marking strings in javascript code for translation
- Marking strings in html for translation
It is not applicable at this time to:
- Common Locale Data Repository (CLDR) as exemplified in POSIX locale
- internationalizing CSS attributes such as fonts and images
Basic Mechanisms
Method for Marking Strings in javascript Code
1. The Standard GNU Gettext method _("A translatable string");
Example: document.write(_("Translate me!"));
2. The same method within E4X
Example: var navigationBar = <nav><button>{_("Go Back")}</button><button>{_("Reset")}</button></nav> ;
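A minimal sketch of what the _() marker could do at run time, assuming translations are already loaded into an in-memory object. The catalog variable and its Spanish entries are made up for illustration and are not part of the proposal.

```javascript
// Hypothetical in-memory catalog: msgid -> translated string.
// In the proposal these entries would come from a .po file.
var catalog = {
  "Translate me!": "¡Tradúceme!",
  "Go Back": "Volver"
};

// Gettext-style marker: return the translation if one exists,
// otherwise fall back to the original string.
function _(msgid) {
  return Object.prototype.hasOwnProperty.call(catalog, msgid)
    ? catalog[msgid]
    : msgid;
}

console.log(_("Translate me!")); // translated entry
console.log(_("Reset"));         // no entry, falls back to the msgid
```

Falling back to the msgid means an untranslated string still renders, which is what lets the mechanism be applied incrementally.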
Method for Marking strings for Translation in HTML
Use the data-* collection of author-defined attributes in html5
Example 1.
<p data-_="true">To say hello in Spanish, say <span data-_="false">Hola</span></p>
Example 2.
<button data-_="true" data-_context="research">Compile Articles</button>
<!-- Somewhere else in same application -->
<button data-_="true" data-_context="computers">Compile Code</button>
Marking strings means that they will be fetched from the translation .po files at run time and that the collection script xgettext can be used to gather strings for translation.
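The run-time replacement of marked elements might look like the sketch below. It assumes each marked element contains only text (nested markup such as the span in Example 1 would need extra handling), and the names translateMarked, fakeElement, and byContext are illustrative. In a browser the elements argument could be a list of DOM elements; a small stand-in object is used here so the sketch also runs without a DOM.

```javascript
// Translate every element marked data-_="true".
// `elements` is any array-like of objects exposing getAttribute() and
// textContent; `lookup` maps (msgid, context) to a translation.
function translateMarked(elements, lookup) {
  for (var i = 0; i < elements.length; i++) {
    var el = elements[i];
    if (el.getAttribute("data-_") !== "true") continue;
    var context = el.getAttribute("data-_context"); // null when absent
    el.textContent = lookup(el.textContent, context);
  }
}

// Stand-in for a DOM element so the sketch runs outside a browser.
function fakeElement(text, attrs) {
  return {
    textContent: text,
    getAttribute: function (name) {
      return Object.prototype.hasOwnProperty.call(attrs, name)
        ? attrs[name]
        : null;
    }
  };
}

var buttons = [
  fakeElement("Compile", { "data-_": "true", "data-_context": "research" }),
  fakeElement("Compile", { "data-_": "true", "data-_context": "computers" }),
  fakeElement("Hola", { "data-_": "false" })
];

// Hypothetical Spanish translations of "Compile", keyed by context.
var byContext = { research: "Recopilar", computers: "Compilar" };

translateMarked(buttons, function (msgid, context) {
  return (context && byContext[context]) || msgid;
});
```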
Storing and Retrieving Translations
The translations will be stored in .po files. PO (Portable Object) files are well supported by online translation tools such as Pootle.
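To give a feel for the format, here is a rough sketch of reading simple PO text into a lookup object. It is not a complete PO parser: it handles comments, blank lines, and continuation lines, but ignores msgctxt and plural forms. The function names parsePo and unquote are assumptions made for this example.

```javascript
// Unescape the small set of C-style escapes used in PO string literals.
function unquote(literal) {
  return literal.slice(1, -1).replace(/\\(.)/g, function (match, ch) {
    if (ch === "n") return "\n";
    if (ch === "t") return "\t";
    return ch; // covers \" and \\
  });
}

// Parse PO text into a plain object mapping msgid -> msgstr.
function parsePo(text) {
  var entries = {};
  var msgid = null, msgstr = null, target = null;

  // Store the entry collected so far; the header (empty msgid) is skipped.
  function flush() {
    if (msgid !== null && msgstr !== null && msgid !== "") {
      entries[msgid] = msgstr;
    }
    msgid = msgstr = target = null;
  }

  text.split(/\r?\n/).forEach(function (rawLine) {
    var line = rawLine.trim();
    var m;
    if (line === "") {
      flush();
    } else if (line.charAt(0) === "#") {
      // comment line, nothing to do
    } else if ((m = /^msgid\s+(".*")$/.exec(line))) {
      flush();
      msgid = unquote(m[1]);
      target = "msgid";
    } else if ((m = /^msgstr\s+(".*")$/.exec(line))) {
      msgstr = unquote(m[1]);
      target = "msgstr";
    } else if (/^".*"$/.test(line)) {
      // continuation of the previous msgid or msgstr
      if (target === "msgid") msgid += unquote(line);
      else if (target === "msgstr") msgstr += unquote(line);
    } else {
      target = null; // msgctxt, msgid_plural, msgstr[n]: ignored here
    }
  });
  flush();
  return entries;
}

var samplePo = [
  '# Spanish translations',
  'msgid ""',
  'msgstr "Content-Type: text/plain; charset=UTF-8\\n"',
  '',
  'msgid "Go Back"',
  'msgstr "Volver"',
  '',
  'msgid "Say \\"hello\\""',
  'msgstr ""',
  '"Di \\"hola\\""'
].join("\n");

var parsed = parsePo(samplePo);
```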
xgettext is the standard tool for grabbing translatable strings from an application. CommonJS requires a js implementation of this tool.
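As a starting point for such a tool, the sketch below collects the string literals passed to _() and prints them as POT entries. It uses a regular expression rather than a real JavaScript parser, so it is only an approximation; extractMsgids and toPot are hypothetical names.

```javascript
// Collect the string literals passed to _() in a piece of source code.
// A real extractor would need a parser to cope with comments, string
// concatenation, template text, and similar cases.
function extractMsgids(source) {
  // _( preceded by a non-identifier character, then a quoted literal.
  var pattern = /(?:^|[^\w$.])_\(\s*(["'])((?:\\.|(?!\1)[^\\])*)\1\s*\)/g;
  var seen = {};
  var msgids = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    var msgid = match[2].replace(/\\(["'\\])/g, "$1");
    if (!Object.prototype.hasOwnProperty.call(seen, msgid)) {
      seen[msgid] = true; // report each msgid once
      msgids.push(msgid);
    }
  }
  return msgids;
}

// Render msgids as POT entries with empty msgstr fields.
function toPot(msgids) {
  return msgids.map(function (msgid) {
    var escaped = msgid.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
    return 'msgid "' + escaped + '"\nmsgstr ""\n';
  }).join("\n");
}

var sampleSource =
  'document.write(_("Translate me!"));\n' +
  'var nav = <nav><button>{_("Go Back")}</button>' +
  '<button>{_("Reset")}</button></nav>;\n' +
  'foo_("not a marker"); _("Reset");';

var found = extractMsgids(sampleSource);
console.log(toPot(found));
```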
Implementation
Here are the methods, attributes, global variables, and helper scripts I would like to see. The design is based primarily on Gettext.js.
Methods
Pretty much entirely copied from jsgettext
- new Gettext( args )
- textdomain( domain )
- gettext( MSGID )
- dgettext( TEXTDOMAIN, MSGID )
- dcgettext( TEXTDOMAIN, MSGID, CATEGORY )
- ngettext( MSGID, MSGID_PLURAL, COUNT )
- dngettext( TEXTDOMAIN, MSGID, MSGID_PLURAL, COUNT )
- dcngettext( TEXTDOMAIN, MSGID, MSGID_PLURAL, COUNT, CATEGORY )
- pgettext( MSGCTXT, MSGID )
- dpgettext( TEXTDOMAIN, MSGCTXT, MSGID )
- dcpgettext( TEXTDOMAIN, MSGCTXT, MSGID, CATEGORY )
- npgettext( MSGCTXT, MSGID, MSGID_PLURAL, COUNT )
- dnpgettext( TEXTDOMAIN, MSGCTXT, MSGID, MSGID_PLURAL, COUNT )
- dcnpgettext( TEXTDOMAIN, MSGCTXT, MSGID, MSGID_PLURAL, COUNT, CATEGORY )
- strargs (string, argument_array)
strargs in particular will have to be modified to handle native numerals like १ २ ३ ४ ५ (1, 2, 3, 4, 5 in Nepali). Notably, Arabic and Hindi use the standard decimal system but different characters to represent the digits.
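One way the digit handling could work is sketched below. This is not the jsgettext implementation: formatArgs, localizeDigits, the %1-style placeholders, and the locale table are assumptions made for this example. It relies on the fact that each of these numeral sets occupies ten consecutive Unicode code points starting at its zero digit.

```javascript
// Zero digit of each native numeral set; the other nine digits follow
// it consecutively in Unicode. The locale keys are illustrative.
var ZERO_DIGIT = {
  ne: 0x0966, // Devanagari (Nepali)
  hi: 0x0966, // Devanagari (Hindi)
  ar: 0x0660  // Arabic-Indic
};

// Replace ASCII digits 0-9 with the locale's native digits.
function localizeDigits(value, locale) {
  var text = String(value);
  if (!Object.prototype.hasOwnProperty.call(ZERO_DIGIT, locale)) {
    return text; // unknown locale: keep ASCII digits
  }
  var zero = ZERO_DIGIT[locale];
  return text.replace(/[0-9]/g, function (d) {
    return String.fromCharCode(zero + (d.charCodeAt(0) - 48));
  });
}

// Substitute %1, %2, ... with the matching argument. Numbers are
// rendered with native digits; other values are inserted unchanged.
function formatArgs(template, args, locale) {
  return template.replace(/%(\d+)/g, function (match, index) {
    var arg = args[Number(index) - 1];
    if (arg === undefined) return match; // leave unknown placeholders
    return typeof arg === "number"
      ? localizeDigits(arg, locale)
      : String(arg);
  });
}

console.log(localizeDigits(12345, "ne"));
console.log(formatArgs("Score: %1", [10], "ar"));
```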
HTML5 Attributes
For specifying that a string should be translated
- data-translate="true|false" text for element should be translated
- data-_="true|false" short form of the above
- data-_C="true|false" grab all text from this element and all its children
- data-_I="true|false" grab the text and all inline markup (the element's innerHTML), so that translators can decide whether <i> or <strong> is semantically meaningful in their language
- data-_comments="help explain meaning of text to be translated"
example: <button data-_="true" data-_comments="File is used as verb">File</button>
- data-_ctxt -- Context -- to differentiate between different usages of the same word within the same document, particularly when those meanings do not share a single word in another language
example: <button data-_="true" data-_context="research">Compile</button> <button data-_="true" data-_context="computers">Compile</button>
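A context-aware lookup for the two "Compile" buttons could be sketched as follows. It keys the catalog on context plus msgid, joined by the EOT character (U+0004) that GNU gettext uses for the same purpose in compiled catalogs. The catalog contents are invented, and this pgettext is a simplified stand-in for the method listed above.

```javascript
// Separator between context and msgid.
var CONTEXT_GLUE = "\u0004";

// Hypothetical Spanish catalog keyed by context + glue + msgid.
var contextCatalog = {};
contextCatalog["research" + CONTEXT_GLUE + "Compile"] = "Recopilar";
contextCatalog["computers" + CONTEXT_GLUE + "Compile"] = "Compilar";

// Look up msgid within msgctxt; fall back to the msgid itself.
function pgettext(msgctxt, msgid) {
  var key = msgctxt + CONTEXT_GLUE + msgid;
  return Object.prototype.hasOwnProperty.call(contextCatalog, key)
    ? contextCatalog[key]
    : msgid;
}

console.log(pgettext("research", "Compile"));
console.log(pgettext("computers", "Compile"));
```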
Helper Functions
xgettext - with essentially the same options and function as GNU gettext's xgettext, but with at least one new switch, --report, to indicate what percentage of the application has been translated into each locale
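The figure behind --report could be computed roughly as below, assuming the template's msgids and each locale's catalog are already loaded. The function names and sample data are hypothetical; the proposal does not define the report format.

```javascript
// Percentage of template msgids that have a non-empty translation.
function translatedPercent(templateMsgids, localeCatalog) {
  if (templateMsgids.length === 0) return 100;
  var done = templateMsgids.filter(function (msgid) {
    return Object.prototype.hasOwnProperty.call(localeCatalog, msgid) &&
      localeCatalog[msgid] !== "";
  }).length;
  return Math.round((done / templateMsgids.length) * 100);
}

// Build a { locale: percent } report for every available catalog.
function coverageReport(templateMsgids, catalogsByLocale) {
  var report = {};
  Object.keys(catalogsByLocale).forEach(function (locale) {
    report[locale] = translatedPercent(templateMsgids, catalogsByLocale[locale]);
  });
  return report;
}

var templateIds = ["Go Back", "Reset", "Translate me!", "Hello world"];
var sampleCatalogs = {
  de: { "Go Back": "Zurück", "Reset": "Zurücksetzen", "Translate me!": "" },
  fr: {
    "Go Back": "Retour",
    "Reset": "Réinitialiser",
    "Translate me!": "Traduisez-moi !",
    "Hello world": "Bonjour le monde"
  }
};

var coverage = coverageReport(templateIds, sampleCatalogs);
console.log(coverage);
```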
Environment Variables
ALL_LINGUAS = "de en fr"
This variable indicates which locales the application has translations for.
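A sketch of how an application might use ALL_LINGUAS to pick a locale from the user's preferred languages. The chooseLocale function, its fallback parameter, and the base-language fallback rule are assumptions, not part of the proposal.

```javascript
var ALL_LINGUAS = "de en fr";

// Pick the first preferred locale the application has translations for.
// A regional tag such as "fr-CA" or "fr_CA" falls back to "fr".
function chooseLocale(preferred, allLinguas, fallback) {
  var available = allLinguas.split(/\s+/).filter(Boolean);
  for (var i = 0; i < preferred.length; i++) {
    var tag = preferred[i];
    if (available.indexOf(tag) !== -1) return tag;
    var base = tag.split(/[-_]/)[0];
    if (available.indexOf(base) !== -1) return base;
  }
  return fallback;
}

console.log(chooseLocale(["es", "fr-CA", "en"], ALL_LINGUAS, "en"));
```

In a browser, the preferred list could come from navigator.languages where that property is available.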
Relevant Files and directories
- po/ for po files that contain translations
- POTFILES.in files that xgettext should grab strings from
- POTFILES.ignore files that xgettext should ignore
- application_name.pot
- locale_name.po translation of the application for a given locale
Test Cases
Sample text (Everything should be translated)
Hello world
Sample text with placeholder for dynamic data (The generated POT file should have the tags as well)
Score : 0
Sample text with context (Everything should be translated, additionally the generated POT file should have two instances of "Compile Articles", with different msgctxt)
Compile Articles Compile Articles
Date and time (this should appear in local format, eg: শুক্র সেপ্টেম্বর 18 15:10:58 IST 2009)
Numbered list (The numerals should be in native digits)
- Hello world1
- Hello world2
- Hello world3
Right to left text - Direction and justification (left|right) should be preserved
خطأ في مُغيّرات الألوان المحددة.