Date: 2022-02-03

Microsoft researchers have unveiled VALL-E, an artificial intelligence that can synthesize the voice of any person after hearing it for just three seconds. The result is as impressive as it is alarming.

By synthesizing a person’s voice after listening to it, Microsoft’s AI can “speak for them” while preserving the speaker’s tone and emotion, and even the acoustic environment of the original recording. However, the creators of VALL-E are being cautious.

No more talking, the AI will do it for you

Microsoft describes VALL-E as a “neural codec language model”. It is a text-to-speech model: given a text and a short voice sample, it generates speech in that voice. Speech synthesis itself is nothing new, but VALL-E stands out because it needs only a three-second sample of a voice and can replicate the speaker’s emotions. Another distinctive feature is that it can generate recordings of words and phrases that the speaker never actually said.

To achieve this, the AI was trained on more than 60,000 hours of English speech from more than 7,000 speakers, taken from free audiobooks that are publicly available on LibriVox.

The samples that Microsoft has shared on GitHub are divided into four columns. The first, “Speaker Prompt”, is the three-second audio sample that VALL-E is meant to imitate. The second, “Ground Truth”, is an existing recording of the same speaker, provided for comparison. The third, “Baseline”, is an example of conventional speech synthesis. Finally, “VALL-E” is a passage generated by Microsoft’s artificial intelligence.

The results are quite varied. Some samples sound very close to a human voice, while others are clearly robotic. This is presumably just the beginning, as models of this kind tend to improve over time. Bear in mind, too, that the voice prompts are only three seconds long. With more data to work from, VALL-E can be expected to produce even more convincing results.

Aware of the potential for misuse if VALL-E fell into the wrong hands, Microsoft has not released the code for its AI. For now, therefore, it is impossible to test the model independently.

Microsoft concludes its presentation with the following words: