AI for OpenMRS String Translations: Proposing guidelines for using AI for translation 🌐

We reached a major translation milestone for O3 this month.

Recently we reached a major milestone in Transifex: 100% of code strings in OpenMRS 3 translated from English into French! :fr: :baguette_bread: :tada:

This was the result of quite a sprint from Aug → Oct, and I’m super thankful to Paul Henri Assoa and @michaelbontyes, who were critical members of the French peer-review crew. Merci beaucoup!

Along the way, I discovered 3 key things:

  1. People are already using auto-translate tools, AI or not. One stakeholder told us to “just use DeepL”; another sent us a spreadsheet of translations that had obviously been copy-pasted from Google Translate without proper app context. And in some cases, older French translations already in our Transifex looked like they had themselves been copy-pasted from Google Translate years ago, with occasionally odd or not-quite-right semantics. (I fixed or flagged for review all the ones I came across.)

  2. Transifex now offers AI-based translation. We used up our ~10 free clicks of this button, but you can see how the Transifex product team has added their own AI feature into the mix: you click the button and it auto-generates a proposed translation for you to review, edit if needed, and save. This saves a lot of time, especially if you’re dealing with a lot of non-keyboard characters that you don’t feel like typing the shortcuts for :sweat_smile:.

  3. DeepL and ChatGPT were indeed helpful to me in validating the best phrasing, but only with careful context and peer review. I triangulated my ideas or concerns by using both ChatGPT and DeepL for some translations. In many cases something didn’t feel quite right for a given context (e.g., one tricky context was error messages for the Patient Queues app lists), and these tools helped me experiment with a few ideas, validate grammar, and triple-check the idea across multiple platforms before asking Paul Henri or Michael for peer review. (To be clear, I don’t recommend doing this with a language you can’t read; I would never personally do this with Sinhala or Afrikaans, for example, because I’d have no clue whether the structure or tone was even remotely correct.)

    • Great example of context gone wrong: we were given a large file of suggested translations, including a request to translate “Invalid Ward” into what would essentially read in French as “Hospital Ward for Disabled People”. But the context was an error message, so the translation should have been along the lines of “The ward you selected is not valid!” I suspect this came from a bulk copy-paste into something like DeepL or Google Translate.

So here are some proposed guidelines for translation contributors using AI tools for translation:

  1. Do not blindly trust the tool. Only use automated translation tools if you have a basic understanding of the language, so you can provide some human review.

  2. Do not use these tools for bulk translation where more than one subject area is at play. LLMs are more likely to make mistakes if you jump between contexts (e.g., if you jump from translating strings about inpatient wards to text about waiting-queue error messages, you have a greater likelihood of model confusion).

  3. Only copy-paste a suggestion if you are confident in the translation.

  4. Confirm with more than one tool. I personally found that comparing ChatGPT and DeepL suggestions helped me narrow down a clearer translation, far more than using either one alone (plus my basic knowledge). And, of course, always validate with human translators if you are unsure. (See the sketch after this list for what that cross-check can look like in practice.)
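To make guideline 4 concrete, here is a minimal sketch of cross-checking DeepL and ChatGPT suggestions for a single string. It assumes the official `deepl` and `openai` Python packages and valid API keys; the prompt wording, model choice, and helper names are illustrative, not something we actually run in Transifex.

```python
# Minimal sketch: get two independent suggestions for one UI string, then hand
# both to a human reviewer. Assumes the `deepl` and `openai` packages are
# installed and API keys are available in environment variables.
import os

import deepl               # pip install deepl
from openai import OpenAI  # pip install openai


def deepl_suggestion(text: str, target_lang: str = "FR") -> str:
    """Ask DeepL for a translation of a single UI string."""
    translator = deepl.Translator(os.environ["DEEPL_AUTH_KEY"])
    return translator.translate_text(text, target_lang=target_lang).text


def chatgpt_suggestion(text: str, context: str, target_lang: str = "French") -> str:
    """Ask ChatGPT for a translation, passing the app context explicitly."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    f"You translate UI strings for a medical records app into {target_lang}. "
                    f"Context for this string: {context}. Return only the translation."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    source = "Invalid ward"
    context = "error message shown when the selected hospital ward does not exist"
    print("DeepL:  ", deepl_suggestion(source))
    print("ChatGPT:", chatgpt_suggestion(source, context))
    # A human reviewer with at least basic knowledge of the language picks (or
    # rewrites) the better suggestion before anything is saved in Transifex.
```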

What do folks think of these guidelines? Anything you’d tweak or suggest?


p.s. - one more advantage I forgot to mention: using these tools (DeepL and ChatGPT in my case) also caught mistakes in gendered language. Sometimes I wasn’t sure of the best way to rephrase something to be gender-neutral, and sometimes I wasn’t sure which gender applied to a given word/phrase (e.g., le vs. la in French).

More on our guidance around translating grammatical gender here.


p.p.s. - in the future, mayyyybe we could violate my “Guideline #1” (“Do not blindly trust the tool if you don’t have some familiarity with the language”) if LLMs prove to produce very accurate semantic translations when given multiple translation examples at once (e.g., “I don’t know Arabic, but here you go LLM, here is the same string in English and French and Spanish and…”). That might help ensure the right semantics are used. I was surprised by how often, even between just English and French, the wording had to be quite different (not literal) in order to capture the same intended meaning. A rough sketch of what this could look like is below.
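This is an untested idea, not an OpenMRS process; here is roughly how such a prompt could be assembled with the OpenAI Python client. The function name, model choice, and example translations are purely illustrative.

```python
# Sketch of the "give the model parallel examples" idea: feed the same string
# in several languages that humans have already reviewed, and ask for a new
# target language. Illustrative only; output still needs human review.
from openai import OpenAI


def translate_with_parallel_examples(key: str, known: dict[str, str], target_lang: str) -> str:
    """`known` maps a language name to an already-reviewed translation of the same string."""
    examples = "\n".join(f"- {lang}: {text}" for lang, text in known.items())
    prompt = (
        f"The UI string '{key}' appears in a medical records app.\n"
        f"Here is the same string in languages we have already reviewed:\n{examples}\n"
        f"Using these to infer the intended meaning (not the literal wording), "
        f"translate it into {target_lang}. Return only the translation."
    )
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()


# Example call (the result would still need review by someone who reads the
# target language, or at least a peer-review pass like the one described above):
# translate_with_parallel_examples(
#     "invalidWardError",
#     {
#         "English": "The ward you selected is not valid",
#         "French": "Le service que vous avez sélectionné n'est pas valide",
#         "Spanish": "La sala que seleccionó no es válida",
#     },
#     "Arabic",
# )
```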


Thanks for sharing @grace. I’d add a similar experience from exploring the ChatGPT API to fill in some Spanish translations for CIEL. It wasn’t hard to find clear examples – e.g., medical abbreviations and names of bones – where ChatGPT was making literal translations instead of using appropriate terms, even when I tried to give it some context (the purpose of the translation for medical use; favor common medical terminology over literal translation of words or abbreviations; etc.).
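To make that concrete, here is a rough sketch of this kind of API call, with the medical context carried in the system prompt. The prompt wording, model, and function name are illustrative, not the actual script I used, and even with this context the literal-translation problem did not go away.

```python
# Rough sketch: translating a single CIEL concept name into Spanish with
# explicit medical context in the system prompt. Illustrative only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def translate_concept_name(name: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "You are translating clinical concept names from the CIEL "
                    "dictionary into Spanish for use in a medical record system. "
                    "Prefer the terminology a Spanish-speaking clinician would use "
                    "over literal word-for-word translation. Keep abbreviations as "
                    "a clinician would write them; do not expand or translate them "
                    "literally."
                ),
            },
            {"role": "user", "content": name},
        ],
    )
    return response.choices[0].message.content.strip()


# Even with this kind of context, spot-checking the output against a human
# translator or a medical terminology source is essential.
```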

I’m very much looking forward to the day when AI can not only reliably translate medical terminology, but also put a real-time interpreter in your ear for cross-language conversations. We can see it coming… but we’re not yet there. :slight_smile: