Text to speech, or TTS for short, is a game-changing technology used by many businesses and individuals around the world. Speaking of individuals, it is worth mentioning Stephen Hawking, one of the most prominent scientists of our time, who communicated through a speech synthesizer for decades. Without this technology, it would have been far harder for him to speak with others and to share his scientific achievements.
But what is TTS, and how does it work?
Generally, text to speech conversion, or speech synthesis, includes text analysis, linguistic analysis and waveform generation stages. We can also think of a text to speech engine as having two major parts: a front-end and a back-end. In the front-end, pre-processing happens first: numbers, abbreviations and symbols in the text are converted to written-out words. Still in the front-end, each word is then assigned a phonetic transcription, a process called text-to-phoneme conversion. The back-end takes this phonetic representation and generates the actual sound waveform.
A text-to-speech engine's performance is measured by two main qualities: naturalness and intelligibility. In other words, a high-performing TTS engine is one that produces speech that is both comprehensible and human-like. Engine developers compete to maximize these two qualities. As a result, the newer engines sound far more natural than the machine-like systems that most of us have used in the past.
The technology that Artivle uses is called WaveNet, a deep generative model of raw audio waveforms. According to its developers, it sounds more natural than the best existing text-to-speech systems, reducing the gap with human performance by over 50%. The model is trained on recordings of speech and can then create new speech-like waveforms. These generated waveforms even include realistic breaths and lip smacking.
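To give a feel for what "generating a raw waveform" means, here is a toy Python sketch of sample-by-sample generation, in which each new audio sample is drawn from a probability distribution that depends on the samples produced so far. This is not WaveNet: the function `toy_next_sample_distribution` is a hypothetical hand-written stand-in for a trained neural network, and the number of amplitude levels and all names are assumptions chosen for illustration.

```python
import math
import random


def toy_next_sample_distribution(history, levels=16):
    """Hypothetical stand-in for a trained model.

    Returns a probability for each quantized amplitude level,
    favoring levels close to the previous sample so the output is smooth.
    """
    prev = history[-1] if history else levels // 2
    weights = [math.exp(-abs(level - prev)) for level in range(levels)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, levels=16, seed=0):
    """Generate a waveform one sample at a time, each conditioned on the past."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        probs = toy_next_sample_distribution(samples, levels)
        # Draw the next amplitude level from the predicted distribution.
        next_level = rng.choices(range(levels), weights=probs, k=1)[0]
        samples.append(next_level)
    return samples


if __name__ == "__main__":
    print(generate(20))
```

In a real system, the hand-written distribution would be replaced by a network trained on recorded speech, which is what lets the generated waveform sound like a voice instead of smooth noise.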
That said, there is still room for improvement in making text to speech conversion a more human-like experience. Even so, it is not hard to predict that the technology will evolve rapidly and that TTS will keep making our lives easier.
What do you think of text to speech technology? Can it replace human reading and voice-over in the future? Please let us know in the comments section.