Localization often takes a back seat in the content creation process – the last step on the road to delivering solid, consistent, high-quality global content. Yet content developers need to keep localization in mind right from the get-go.
For more than 20 years, I worked in a corporate translation/localization department. During this time, I strived to bring messages “upstream” to the providers of the content that we translated: UI developers and documentation writers. In fact, I communicated with the content providers regularly and, in many cases, successfully. The difficult part was that with almost every new development team or documentation team, the same messages had to be repeated.
In 2019, I moved “upstream” and joined the technical communications team at Dolby Laboratories as a publishing engineer and localization specialist. I gladly noted that some of the good practices of preparing content for localization were already in place at Dolby, and today, I am contributing to developing them further. The to-do list looks pretty familiar – not only from my previous experience, but also from what I have gathered through communicating with my translation colleagues who work as freelance translators, language service providers, or localization specialists at other companies (e.g., in software). This observation implies that the gap between content providers and localization providers is something that can be defined and filled, provided that localization is taken into consideration a little earlier in the content creation process and not just at the very end. Let me share with you some of the building blocks to build this bridge between technical communication and localization.
1. Documentation formats are translatable
This might sound trivial to some, but not to others: In 2021 you hardly ever need to send an output PDF document or fully rendered HTML to translation. Yes, it might be helpful for translators to get these as reference materials (to have the maximum context for translation), but for their actual work, they prefer the format you write in!
DITA, MadCap, AuthorIT, and Markdown are all suitable formats for sending directly to translation and can be handled by computer-aided translation (CAT) tools. After all, CAT tools are the environments where the majority of technical translation is performed. They allow translators to reuse existing translations (either previous versions of content or text repetitions), apply predefined terminology, and maintain the format and tagging of content.
Why is it important to provide translators with the format you write in? Because when translators have to deal with output like PDFs or fully rendered HTML, they actually perform a kind of reverse engineering and generate an intermediate format which CAT tools can work with. This means not only extra effort and additional cost, but also a potential area for errors. Another error-prone process is turning translated content back into PDF or fully rendered HTML.
When translators receive your writing format instead, the CAT tool preserves it and you get translated text back in DITA, MadCap, AuthorIT, or Markdown, which you can then further process to obtain the actual output of the desired languages.
If your writing format is InDesign, Microsoft Word, raw HTML, or some XML variation, CAT tools can also consume it “as is”. If your format is one of the “custom” or “seasoned” ones, I recommend asking your translators (or your LSP) whether they can handle this format and, if not, what their preferred interchange format would be. I bet that XLIFF will be the answer much more often than PDF.
As you can see in Figure 1, all content from the DITA file – including index terms, etc. – is available for translation in a CAT environment. What translators don’t see in CAT is the placement of each topic in the context of the entire document, illustrations, etc. Therefore, a sample output (PDF, HTML, or other) greatly improves translation quality and reduces the need to answer questions from translators.
2. Proper content fragmentation: keyref, conref, snippet, etc.
Content reuse and automatic insertion of repeatable phrases are good for writers. However, they are not so good for translators. I would not go as far as to state that the interests of writers and translators conflict, but fragmentation of any translatable text should be handled with care. It is perfectly fine from a translation perspective – and cost-effective if we consider how CAT tools work – to reuse entire chapters or paragraphs of text. But things get difficult once the repetition drops below the full sentence level. A single translatable word inserted as a conref or snippet may make the entire sentence difficult or impossible to translate because the following things can happen in various languages:
- Depending on one word, the entire sentence may need to be changed (e.g., to another gender).
- The word may need to be modified to another form depending on the sentence you place it in.
Let’s have a look at the example in Figure 2a. Both keyrefs containing terms and uicontrol tags that are responsible for highlighting user interface items have been properly identified by the CAT tool as markup and will thus be protected from changes in translation. This is good for uicontrols; however, at least some of the keyrefs represent text that needs to be translated. Depending on the language and position in the sentences, the same text may need a different translation. In this case, “g_hard_dd” stands for “hard disk drive”, and if this topic is translated into Czech, Polish, Russian, or any other inflected language, the nominative form (“dysk twardy”) is not applicable here; you need a genitive (“dysku twardego”). In many other sentences, you may need the same term in the nominative or another case.
One possible solution is normalizing the content before sending it to translation – that is, resolving all keyrefs (or their equivalents in the content format you’re using) to plain text.
In the example in Figure 2b, the keyrefs were resolved to plain text because at least some of them will need translation. The uicontrols were left because they should not be changed in translation.
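The normalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terms are assumed to be inserted as empty `<keyword keyref="..."/>` elements, and the `KEY_TEXT` dictionary and the sample sentence are hypothetical stand-ins for the key definitions and topics of a real project. Other elements, such as uicontrol, are left untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical key definitions; in a real project they would come from the map.
KEY_TEXT = {"g_hard_dd": "hard disk drive"}


def resolve_keyrefs(xml_text, key_text):
    """Replace empty <keyword keyref="..."/> elements with plain text."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        last_kept = None  # last sibling that stays in the tree
        for child in list(parent):
            key = child.get("keyref")
            if child.tag == "keyword" and key in key_text:
                # The term text plus whatever followed the removed element.
                replacement = key_text[key] + (child.tail or "")
                if last_kept is None:
                    parent.text = (parent.text or "") + replacement
                else:
                    last_kept.tail = (last_kept.tail or "") + replacement
                parent.remove(child)
            else:
                last_kept = child
    return ET.tostring(root, encoding="unicode")


sample = (
    '<p>Remove the <keyword keyref="g_hard_dd"/> from '
    "<uicontrol>Bay 1</uicontrol>.</p>"
)
print(resolve_keyrefs(sample, KEY_TEXT))
```

After this step, translators see the term as ordinary text and can inflect it as the sentence requires, while the uicontrol markup stays protected.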
3. Translatable image text
Plenty of documentation contains graphics with text: diagrams, schemes, images, or screen captures. Most of this text needs to be translated when the content is translated. There are good practices for how to make graphics more time- and cost-effective in translation.
The basic advice is similar to the previous recommendation: reduce text fragmentation. It may be tempting for a visual designer to cut phrases into single words for a better alignment within the image (and some techniques go down to single characters). Don’t do that if you want to avoid having to put the text back manually (or semi-manually) into meaningful phrases to get them properly translated.
My second important piece of advice is to place text on a separate layer from graphics, especially when annotating a picture. It’s generally helpful for any future text maintenance or reuse, and the ease of replacing original text with translation is just one of the benefits.
And last but not least: if possible, select a graphics format that is friendly to CAT tools. Just as with writing formats, CAT tools can support many visual formats without manual copy-paste. If your graphics are not one of these formats, then discuss the interchange format with your translators.
As you can see in the example in Figures 3a and 3b, translators will only see the text in their CAT environment. It is only after translation that you obtain an image in the target language. Please keep in mind that providing the actual images (and/or output with images) as additional context is necessary for proper translation, even though translators do not do the graphics/DTP work.
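To illustrate why text-based graphics formats are friendly to translation tooling, here is a minimal sketch that lists the translatable phrases in an SVG image. It assumes the image keeps its labels as real `<text>` elements (not text converted to outlines); the function name and the sample image are hypothetical. Each `<text>` element is returned as one phrase, which is why cutting a label into several separate elements hurts translation.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def extract_svg_text(svg_source):
    """Return one phrase per <text> element, with whitespace normalized."""
    root = ET.fromstring(svg_source)
    phrases = []
    for text_element in root.iter(SVG_NS + "text"):
        # itertext() also collects text from nested <tspan> elements.
        phrase = " ".join("".join(text_element.itertext()).split())
        if phrase:
            phrases.append(phrase)
    return phrases


sample_svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <rect width="120" height="40"/>
  <text x="10" y="20"><tspan>Power</tspan> <tspan>supply</tspan></text>
  <text x="10" y="60">Audio output</text>
</svg>"""
print(extract_svg_text(sample_svg))
```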
A special type of graphics is screen captures. Before sending a document with screen captures (or any references to the user interface) to translation, you need to check a few facts:
- Is the user interface also localized?
- If yes, can you provide the localized screen captures for all languages to the translators, as reference material for the UI they are describing?
- If not, should the localized documentation mention UI strings in their original form (e.g., English), or should translators provide “orientation” translations along with original UI strings?
Delivering this information to translators up front saves a lot of effort (and frustration) once the translation has been completed.
4. Define which text you want translated
There may be large pieces of text that you don’t want translated, for instance, a legal notice for U.S. Government users, or small, repetitive phrases such as trademarks. A good practice is to notify translators of all your “DNT” (Do Not Translate or Do Not Touch) items when sending them content for translation. The best practice is to mark the DNTs directly in the content, in case the instruction is lost or translators do not read it thoroughly. An alternative is a good glossary, which I will describe in the next chapter. Let me give you a few more examples of what can be on the DNT list (if in doubt, consult your legal or marketing team):
- Product and brand names, if company policy is to use them globally with no alteration
- Programming code samples (Note: comments are still candidates for translation)
- True names, addresses, and geographical locations (Note: fake or sample names/addresses often should be localized)
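Marking DNTs directly in the content can be partly automated. The sketch below wraps each listed term in a `<ph translate="no">` element; DITA defines a `translate` attribute for this purpose, but whether your CAT tool honors it is something to confirm with your translators. The function name and the term list are hypothetical, and the input is assumed to be plain text without existing markup.

```python
import re


def mark_dnt(text, dnt_terms):
    """Wrap every DNT term found in plain text in <ph translate="no">."""
    if not dnt_terms:
        return text
    # Longest terms first, so "Dolby Atmos for Headphones" wins over "Dolby Atmos".
    ordered = sorted(dnt_terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered))
    return pattern.sub(
        lambda match: '<ph translate="no">' + match.group(0) + "</ph>", text
    )


terms = ["Dolby Atmos", "Dolby Atmos for Headphones"]
print(mark_dnt("Enable Dolby Atmos for Headphones in Settings.", terms))
```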
One last piece of advice: Unless your entire content is legal, you should treat a legal text – copyright, warranty, licensing, etc. – as a special kind of translation. It is often handled by different translators (or even different translation agencies) than your regular content and uses different terminology and style. Inserting a legal fragment into your documentation may be perfectly useful from your users’ perspective; however, mark it as “Legal – handle with care” when sending your content to translation.
5. Share glossaries with translators
Most technical documentation uses some kind of glossary: either published along with the content (for users to look up technical terms), or internal (for technical writers to use terminology consistently). No matter what type of glossary – if you have it, share it with translators! This is the easiest way to get the terminology translated correctly and consistently, because by providing a glossary, you let translators know: “These are my terms. They are important. This is their meaning.” When you provide a glossary up front, translators can create their localized glossaries and get an instant insight into your domain. After that, translators can import terms and their translations into their CAT tool and use them consistently when working on your content, without always checking “Have I seen this term before?” However, this will only work if you also adhere to your own glossary. Last but not least, a glossary is the easiest way to pass on the information about “Do Not Translate” phrases!
The example in Figure 4 from another CAT tool – Smartling – shows that the DNT term “Dolby Atmos for Headphones” is tagged in the glossary with the information that it must not be translated.
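Once DNT terms are listed in a glossary, the same list can drive a simple check on the translations you get back. The sketch below is an assumption about how such a check might look, not a feature of any particular CAT tool: it flags segment pairs in which a DNT term occurs more often in the source than in the target. The function name and sample segments are hypothetical.

```python
def find_dnt_violations(segment_pairs, dnt_terms):
    """Return (term, source, target) for each pair where a DNT term was lost."""
    violations = []
    for source, target in segment_pairs:
        for term in dnt_terms:
            # A term that appears in the source must survive unchanged in the target.
            if source.count(term) > target.count(term):
                violations.append((term, source, target))
    return violations


pairs = [
    ("Turn on Dolby Atmos for Headphones.", "Włącz Dolby Atmos for Headphones."),
    ("Turn on Dolby Atmos for Headphones.", "Włącz Dolby Atmos dla słuchawek."),
]
for term, source, target in find_dnt_violations(pairs, ["Dolby Atmos for Headphones"]):
    print(f"DNT term changed: {term!r} in {target!r}")
```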
6. Machine Translation
Many technical writers don’t even want to think about the fact that their content might be machine-translated. Many translators don’t even want to touch MT post-editing, as the translation industry calls the human correction of machine output. Let’s face the facts though: Machine translation can be available as an additional aid to professional translators working on your content, and your content can be passed through raw MT with no human touch.
The most important decision here is evaluating the risk for and impact on end users. Today, mission-critical documentation like aircraft maintenance manuals or medical equipment instructions should not be delivered as raw MT. However, some low-risk, high-volume documentation is already published as raw MT and many customers perceive it as more useful than not having any translation at all. Cases in point are bulk software documentation (Microsoft help pages) or online stores (AliExpress).
The second important decision is choosing the right machine translation provider. Google Translate has the broadest coverage of supported language pairs but is not necessarily the best MT for all of them. When dealing with English to German or English to Polish, DeepL Translate is usually a better choice. For English to Russian, Yandex Translate will do the best job. For any content referring to the public sector in Europe, an option to explore is eTranslation. A good source for this information may be the most recent Intento report plus a little MT market research.
So, if we accept that our content may be machine-translated, how do we write for machines? Apply the same rules as you do for human translation (file format, limited fragmentation, image handling, DNT, and glossary), but also aim at text simplicity and unambiguity – because, unlike professional translators, a machine will never ask questions!
There is nothing wrong with perceiving localization as the very last step of content creation, provided that localization is kept in mind during previous steps. The building blocks I proposed will help you reduce translation cost, effort, and delay caused by last-minute issues. The suggested approach lets you play in the same team as your localization provider, and ensures that no translator will stick pins into a doll that impersonates you.
After publishing this general advice in tcworld magazine, I had a localization project on my plate that taught me a few more lessons about localizing content for the first time, i.e., preparing documentation for translation when no part of it has been translated before.
- Whatever quality checks you or your organization apply to documentation before publishing it, please apply them before localization. For example, ensure that no links are broken, either internal (to another chapter) or external (to a webpage). Fixing them once in the soon-to-be-translated document saves you from fixing them N times in the N language versions.
- Remember what I wrote about images with editable text? If you are into vector graphics anyway and can generate your images as SVG, reference those SVGs in the content and forget about JPG or PNG. This will save you, and your translators, a lot of time defining how localized images (PNG or JPG) should be produced from the graphics tool after translation. If SVGs are not an option, then you need to provide those instructions (or obtain them from your graphics master), so that localized images have the same size, resolution, and other parameters as your original images. Again, do it up front to save time fixing N language versions after they have been translated and produced in the wrong format.
- What tricks have you used to make your documentation look perfect? Empty paragraphs in DITA, hardcoded page breaks in Word, or whatever else – to have chapter X, table Y, and figure Z on the same page? Well… before localization, the best you can do is to remove all these tricks from your content, produce the output, and check if it looks acceptable. Why? Because text length differs from language to language. The difference can be as much as 30% and will not be equally distributed among chapters and paragraphs. So, after translation, the empty paragraphs or page breaks will occur in random and/or awkward places in every language except the one you originally wrote in. For a clear and tidy look across all language versions, the general publishing settings should do, e.g., starting each major chapter on a new page.
- And last but not least: Are the fonts used to publish your content (to PDF or HTML) Unicode fonts? Do they support all languages and scripts, or perhaps only Latin-based ones? It’s 2021, but the answer may surprise you 🙂 A font that looks gorgeous for English and German may not support Cyrillic scripts (e.g., Russian), bidirectional languages (e.g., Arabic or Hebrew), or double-byte character sets (e.g., Chinese or Korean). If publishing in other languages is sporadic, and “overseas markets” income is close to the thickness of a line on your organization’s sales chart, then a practical solution is selecting a Unicode-enabled “replacement font” for your localized content. However, if globalization is a thing where you document, then a long-term solution is to select a font with Unicode coverage, use it for both source and translated documents, and live happily ever after.
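The first check in the list above, finding broken internal links before the content goes to translation, might be sketched as follows. This is a simplified illustration: it assumes links are `href` attributes of the form `#id` pointing to `id` attributes in the same XML file, which is simpler than real DITA addressing, and it does not check external links. The function name and the sample topic are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_broken_internal_links(xml_text):
    """Return every '#id' href that has no matching id attribute in the file."""
    root = ET.fromstring(xml_text)
    known_ids = {el.get("id") for el in root.iter() if el.get("id")}
    broken = []
    for el in root.iter():
        href = el.get("href", "")
        if href.startswith("#") and href[1:] not in known_ids:
            broken.append(href)
    return broken


sample_topic = (
    '<topic id="t1"><section id="setup"/>'
    '<p><xref href="#setup"/> <xref href="#missing"/> '
    '<xref href="https://example.com"/></p></topic>'
)
print(find_broken_internal_links(sample_topic))
```

Running a check like this on the source once is cheaper than discovering the same broken link in every language version.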