
Largest text-to-speech AI model yet shows ‘emergent abilities’

Amazon researchers have trained what appears to be the largest text-to-speech model to date, and they say it displays “emergent” abilities that let it speak even complex sentences more naturally. The advance might be just what the technology needs to break out of the uncanny valley.

These models were always going to grow and improve, but the researchers specifically hoped to see the kind of jump in ability that appeared once language models passed a certain size. For reasons that are not yet understood, LLMs beyond that size become far more robust and versatile and can perform tasks they were not designed for.

That’s not to claim they’re becoming sentient or anything; it’s just that past a certain threshold, their performance on some conversational AI tasks climbs sharply, like a hockey stick. The Amazon AGI team, whose name makes no secret of what it is aiming for, believed the same might happen as text-to-speech models grew, and their research indicates that it does.

The new model is called Big Adaptive Streamable TTS with Emergent abilities, which the team has twisted into the acronym BASE TTS. The largest version of the model uses 100,000 hours of public domain speech, 90 percent of it in English and the remainder in German, Dutch, and Spanish.

At 980 million parameters, BASE-large appears to be the largest model in this category. For comparison, the team also trained 400M- and 150M-parameter models on 10,000 and 1,000 hours of audio, respectively. The idea is that if one of these models exhibits emergent behaviors and a smaller one does not, you have a range for where those behaviors start to appear.
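The bracketing logic behind that comparison can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the researchers' method: the `Variant` type, the `emergence_bracket` helper, the "150M" and "400M" labels, and above all the `emergent` flags are hypothetical placeholders, not results reported by the team. Only the parameter counts and audio hours come from the article.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Variant(NamedTuple):
    name: str
    params_millions: int
    audio_hours: int
    emergent: bool  # PLACEHOLDER outcome, not a reported result


def emergence_bracket(
    variants: Sequence[Variant],
) -> Tuple[Optional[Variant], Optional[Variant]]:
    """Return (largest variant below the onset, smallest variant showing it).

    Walks the variants from smallest to largest and stops at the first
    one flagged as emergent. Either side of the pair may be None.
    """
    lower: Optional[Variant] = None
    for v in sorted(variants, key=lambda v: v.params_millions):
        if v.emergent:
            return lower, v
        lower = v
    return lower, None


# Sizes and hours are from the article; the True/False flags are made up
# purely to exercise the function.
variants = [
    Variant("BASE-large", 980, 100_000, True),
    Variant("400M", 400, 10_000, True),
    Variant("150M", 150, 1_000, False),
]

lower, upper = emergence_bracket(variants)
print(f"Onset falls between {lower.params_millions}M "
      f"and {upper.params_millions}M parameters")
```

With only three sizes the bracket is coarse, and because training data grows along with parameter count, a comparison like this cannot by itself separate the effect of model size from the effect of data volume.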

Source: TechCrunch
