3 keys to scalable video translation

One of the most interesting use cases for automating video production is language translation. Very few organizations can produce software videos in multiple languages at scale.

But there's a huge demand for it.

While many software/SaaS companies have effective translation processes for their technical documentation, video in multiple languages remains expensive and elusive.

The good news is that new technology is here to help.

Using phonemes in translated text to speech

In any translation process, you need to manage terminology such as brand and product names and technical jargon so that specific words or phrases are handled properly. For text documents, you can build custom vocabularies that tell a translation engine “do not translate” certain strings.
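When a translation engine has no built-in glossary, one common workaround is to swap protected terms for placeholder tokens before translating and restore them afterwards. The following is a minimal sketch of that idea, not a description of any particular product: the function names and the `__TERM0__` token format are made up for illustration, and `translate` stands in for whatever translation call you use.

```python
import re

def protect_terms(text, terms):
    """Replace each protected term with a placeholder token (hypothetical format)."""
    mapping = {}
    # Longest terms first, so a short term never splits a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        token = f"__TERM{i}__"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        if pattern.search(text):
            text = pattern.sub(token, text)
            mapping[token] = term
    return text, mapping

def restore_terms(text, mapping):
    """Put the original terms back after translation."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text

def translate_with_glossary(text, terms, translate):
    """`translate` is any callable str -> str; it is an assumption, not a real API."""
    protected, mapping = protect_terms(text, terms)
    return restore_terms(translate(protected), mapping)
```

Some engines alter or drop unfamiliar tokens, so check the restored output; where an engine offers a native glossary or “do not translate” feature, that is usually the safer choice.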

With video, there is an extra layer: how to pronounce your terminology. A video pipeline needs to be aware of phonemes, the distinct units of sound that make up spoken words. You probably learned about them when you studied phonics in school.

Our product name is Videate, which we need pronounced in a specific way. Videate is a portmanteau of “Video” and “Ideate.” To pronounce it correctly, you say the “Vid” in Video and the “eate” in Ideate. Written in the International Phonetic Alphabet (IPA), that is ˈvɪdieɪt, which sounds like “vid-ee-eyt.”

But when you send the word Videate to an Italian or Spanish text-to-speech (TTS) engine, the engine creates its own pronunciation, usually “vid-ee-ah-tay.” There is actually an Italian word Videate which means screenshots (we thought that was pretty cool too; after all, a software video is a continuous composite of screenshots, isn’t it?).

We want to keep the same pronunciation regardless of language, so when we produce videos we have to make sure the narration always uses the English pronunciation. We do this through a phoneme database, which we manage just like special terminology.

Speech Synthesis Markup Language (SSML), which TTS engines process, supports phonemes. For product and brand names, this is a must.
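To make this concrete, here is a minimal sketch of how a phoneme database could be applied to a script before it goes to a TTS engine. The `phoneme` element with its `alphabet` and `ph` attributes comes from the SSML standard; the `PHONEME_DB` dictionary and the `to_ssml` function are hypothetical names used for illustration, not the Videate implementation.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical phoneme database: term -> IPA transcription.
PHONEME_DB = {"Videate": "ˈvɪdieɪt"}

def to_ssml(text, phoneme_db=PHONEME_DB):
    """Wrap every known term in an SSML <phoneme> element."""
    if not phoneme_db:
        return "<speak>" + escape(text) + "</speak>"
    # Longest terms first, so overlapping entries resolve predictably.
    terms = sorted(phoneme_db, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    out = []
    # With one capturing group, split() alternates plain text and matched terms.
    for i, piece in enumerate(pattern.split(text)):
        if i % 2 == 1:
            ipa = escape(phoneme_db[piece], {'"': "&quot;"})
            out.append(f'<phoneme alphabet="ipa" ph="{ipa}">{escape(piece)}</phoneme>')
        else:
            out.append(escape(piece))
    return "<speak>" + "".join(out) + "</speak>"

print(to_ssml("Bienvenido a Videate."))
```

Because the markup travels with the translated text, the same entry applies whatever the target language. Support for IPA varies by engine and voice, so test each voice you plan to use.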

Getting acronyms pronounced correctly

Another area to consider with video translation is acronyms. We have a client who uses the acronym “ESXi” as part of their technology. When you pass ESXi to a TTS engine, you usually get back something like “e-sek-see,” which sounds like “ey sexy.” The Videate engine makes sure it is always pronounced “E” “S” “X” “I” regardless of the target language (if that’s what you want).
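One way to get letter-by-letter pronunciation is the SSML `say-as` element. The sketch below assumes your TTS engine accepts `interpret-as="characters"`; many do, but the SSML standard leaves the allowed values to each engine, so check its documentation. The `ACRONYMS` list and the `spell_out` function are hypothetical names for illustration.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical list of acronyms that should be read letter by letter.
ACRONYMS = ["ESXi"]

def spell_out(text, acronyms=ACRONYMS):
    """Wrap listed acronyms in <say-as> so the engine reads each character."""
    if not acronyms:
        return "<speak>" + escape(text) + "</speak>"
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, acronyms)) + r")\b")
    out = []
    # With one capturing group, split() alternates plain text and matched acronyms.
    for i, piece in enumerate(pattern.split(text)):
        if i % 2 == 1:
            out.append(f'<say-as interpret-as="characters">{escape(piece)}</say-as>')
        else:
            out.append(escape(piece))
    return "<speak>" + "".join(out) + "</speak>"

print(spell_out("Instale ESXi & reinicie."))
```

Note that the letters are read using the target voice's own letter names, so a Spanish voice will say them the Spanish way. If you need the English letter names in every language, a phoneme entry per acronym is the alternative.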

We translated one of our recent videos using Google Translate. A native Spanish speaker on our team reviewed the translation and was quite surprised to find that it was great except for two words (and, quite frankly, one of those could have gone either way).

By the way, we deliberately left out the phoneme in this version so you can hear our company name pronounced the Spanish way, which illustrates our point.

Impact of continuous delivery on translated software videos

The challenge of recording software videos is the frequency of releases. A one-word change can require re-recording dozens or hundreds of videos. Add the burden of translation, and this becomes very difficult for all but the very largest enterprises.

To make video translation affordable and scalable, you need three things:

  1. Technology to automate the production of videos.
  2. A technique to ensure correct pronunciation of your brand and product names, technical jargon, and acronyms in all languages.
  3. Written scripts that can be translated by translation software.

Videate makes producing videos in multiple languages quick, easy, and sustainable.