AI trained on YouTube and podcasts speaks with ums and ahs

An artificial intelligence that has been trained on YouTube and podcast recordings generates speech from text prompts that sounds remarkably natural

By Alex Wilkins

9 March 2023

An AI can generate more natural-sounding synthetic speech by including pauses

Shutterstock/PrinceOfLove

Generating speech with different rhythms and pauses makes it sound more human-like, according to an assessment of an artificial intelligence trained on speech taken from YouTube and podcasts.

Most artificial intelligence text-to-speech systems are trained on data sets of acted speech, which can lead to the output sounding stilted and one-dimensional. More natural speech often displays a wide range of rhythms and patterns to convey different meanings and emotions.

Now, a researcher at Carnegie Mellon University in Pittsburgh, Pennsylvania, and his colleagues have used almost 900 hours…
