The age of the text-to-speech audio generator is here

Written by Videate | Oct 18, 2020

Our big bet in launching Videate was that text-to-speech engines would improve rapidly. In the short time since we started the company, Amazon, Microsoft and Google have dramatically improved the quality of text-to-speech. Many companies are investing billions of dollars in this technology, and we’re now seeing a fast-moving “Space Race” for text-to-speech.

If you read our initial white paper, you saw our reference to the Turing Test, a scenario designed by Alan Turing in 1950 to evaluate text-only, natural language conversations between humans and computers. If a human evaluator could not reliably tell the computer from the human after five minutes, the computer passed the test.

Seventy years later, the text-only version of the Turing Test is giving way to a spoken one, driven by text-to-speech audio generators.

Videate's text-to-speech audio generator

Videate uses text-to-speech technology as part of its overall platform. It is one of the fundamental pieces of our patent-pending solution. The text comes from your documents: the scripts that drive great videos.

We can start with your existing product documentation written in DITA, AsciiDoc, Google Docs, or Word. You don’t need to write down every detail of how your software works; you just need to follow a consistent format, as if you were speaking the words aloud. As Ridley Scott said, “Once you crack the script, everything else follows.”

Using AI and automation to learn about your SaaS product

When you say “click on this icon” or “go to this menu,” we know where that icon or menu is in every part of your application. We use your scripts to navigate your software, and we synchronize the movement as if you were moving the mouse and speaking the words.

Again, it doesn’t have to be mechanical. You can add context and animation instructions to enrich the experience, and Videate will use natural language processing to make further improvements.

At the same time, we use text-to-speech technology to generate the voice, synchronized with the on-screen movement as we record. The result is a video produced through automation rather than by people.
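To make the pairing of script, navigation, and voice more concrete, here is a minimal sketch of one way a script could be turned into a timeline of narration and UI actions. It is hypothetical and not Videate's implementation: the two sentence patterns, the `build_timeline` function, and the assumed narration rate of 150 words per minute are all illustrative assumptions. A real system would take timings from the speech engine itself rather than estimating them from word counts.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: an average narration rate; a real TTS engine reports actual timings.
WORDS_PER_MINUTE = 150

# Hypothetical patterns for the two kinds of instructions mentioned in the post.
PATTERNS = [
    ("click", re.compile(r"\bclick on the (?P<target>.+?) icon\b", re.IGNORECASE)),
    ("open_menu", re.compile(r"\bgo to the (?P<target>.+?) menu\b", re.IGNORECASE)),
]


@dataclass
class Step:
    start: float           # estimated seconds from the start of the narration
    narration: str         # sentence that would be sent to the TTS engine
    action: Optional[str]  # UI action to perform while it is spoken, if any
    target: Optional[str]  # UI element the action refers to, if any


def estimate_seconds(sentence: str) -> float:
    """Rough speaking time for a sentence, based on word count."""
    return len(sentence.split()) * 60.0 / WORDS_PER_MINUTE


def build_timeline(script: str) -> list[Step]:
    """Split a script into sentences and schedule each one's UI action."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    steps: list[Step] = []
    clock = 0.0
    for sentence in sentences:
        action = target = None
        for name, pattern in PATTERNS:
            match = pattern.search(sentence)
            if match:
                action, target = name, match.group("target")
                break
        # The action starts when its sentence starts, keeping voice and movement in step.
        steps.append(Step(round(clock, 2), sentence, action, target))
        clock += estimate_seconds(sentence)
    return steps


if __name__ == "__main__":
    for step in build_timeline("Go to the Settings menu. Click on the Users icon. Add a new user."):
        print(step)
```

With this script, the menu opens at 0.0 seconds, the icon is clicked at 2.0 seconds, and the last sentence is narration only. Anchoring each action to the start of its sentence is one simple design choice; a richer version might instead anchor it to the moment the target word is spoken.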

The benefits of text-to-speech audio generators are clear

There’s no post-production work to edit out pauses, stammers, breathing, noise or errors. And since everything is based on your script, you can quickly make changes, fix typos, and generate new videos in minutes. You can easily deliver up-to-date videos whenever you release software, even with last-minute UI changes.

We’re used to hearing Alexa, Siri, and Google Assistant in our daily lives. And yet, when it comes to using computer-generated voices in software videos, there is still skepticism that enterprise software users will find them acceptable.

We surveyed a wide range of B2B software end-users and asked this question:

Given the choice of having up to date software videos with computer generated voices or out of date software videos with human voices, which would you prefer?

The preference has clearly shifted toward always up-to-date videos narrated by text-to-speech audio generators.

The ability to create your own personalized brand voice is just around the corner. Amazon announced this capability in February 2020, and Google has recently launched a similar one, now in beta. For now, the cost puts it within reach of larger companies only, but that should change soon. We're in a new age of text-to-speech.

“To make a great film you need three things – the script, the script and the script.” - Alfred Hitchcock
