Posted On: Aug 29, 2023 Written by: Abe Udy
AI voiceovers. Synthetic voices. Text to speech. Whatever you call it, there’s no denying the rapid advancements in AI voice technology over the last 6 months. What was thought completely impossible just a few years ago is now possible.
Voiceover talent can clone their voices with just an hour of audio (and much less in some cases). Human-sounding voices can be generated solely by an algorithm. Creators can add a voice to their project without a professional voiceover talent needing to set foot in a recording booth.
But just because you CAN do it, does it mean you SHOULD?
Let’s discuss the pros and cons of human voices vs AI voices.
Human voices: the pros
Emotion & engagement
A professional human voiceover talent can breathe life, colour, emotion, cadence, light & shade into their delivery. With just a few words to direct them - either in person or via a written brief - a human voice will bring a script to life, ensuring it engages and connects with the audience.
Speed & turnaround
You might assume that having a human voiceover recorded and edited takes days. But that’s not true. With the right systems and people in place to manage the voiceover talent, a script can be sent, recorded, edited and delivered by real humans in a matter of hours (and sometimes minutes!).
I’ve spoken to several eLearning, video production and animation professionals, many of whom have investigated using AI voices. Once the personnel costs and time of working with AI voices were factored in, they calculated that using human voice talent often cost only 20-30% more than using AI voices - not the 200-300% difference they had expected!
When you engage a human voiceover for your project, money flows to that talent, supporting them, their family and their local community.
Human voices: the cons
Instant turnaround is not possible
Revoicing a script change on the fly will likely take more than a few minutes, because the talent needs to record the new lines before the audio is edited and delivered.
Humans get sick and take holidays, so there’s a chance that a specific human voice talent might not be readily available to record your script.
AI voices: the pros
Realtime content creation
The ability to create voiceover content in real-time is a definite pro. If you’re looking to create a large volume of hyper-targeted, personalised ads, you might want to consider an AI voice.
From individualised fitness programmes to a campaign of thousands of digital ads where content is created in real-time uniquely for each user, AI voice technology is a good option.
Make changes instantly
Some internal training or eLearning content can also be successfully created with AI voices, particularly if there are multiple stakeholders involved in the approval process and the content is constantly changing. The ability to make changes on the fly might be attractive for these types of projects.
For some projects, using an AI voice instead of human talent could cost less. If you’re using a platform that has modelled human voices, those voice actors might receive a royalty. However, with some platforms, royalty payments equate to just $20 or so per generated hour of content. (The talent won’t be getting rich while sitting on a beach!)
AI voices: the cons
Lack of authenticity
Despite what is claimed by many AI voice platforms, human authenticity and emotion are very difficult (if not impossible) to achieve. Sure, AI-generated voices may have a perfect cadence, but a perfect delivery can sound anything but perfect, because humans don’t naturally speak with perfection.
When you listen to any AI voice, after a few seconds you can tell that something is not quite right about it - there’s a lack of authenticity, emotion, and engagement.
Despite one platform claiming “Truly human emotions in every voiceover generated, breathing life into your voiceovers. 95% of people can't tell this is text to speech”, the blind testing we’ve done shows this is not the case.
People COULD tell the difference.
The technology is impressive, but it’s the last 15 per cent - the part requiring real authentic emotion and connection - that will be the hardest to get right. And when an AI voice is not quite right, it misses the mark wildly.
The English language is difficult - even for humans! With words that share a spelling but differ in pronunciation, and words that look nothing like they sound, it can be difficult to tell an AI app how certain words should be pronounced.
Phrasing and flow
While an AI voice can be generated almost instantly, ensuring the phrasing and flow are correct can take much longer. We’ve heard stories of eLearning creators who’ve used synthetic voices, only to have their team spend hours regenerating specific parts and having to get into detailed audio editing to get the phrases to sound natural and flow well.
What can look like a quick solution at the outset often means much more time spent on the back end.
Your money doesn't stay local
International corporations own most AI voice platforms, so your money goes offshore.
Abe’s Audio has been recording and producing voiceover content for 25 years. From a one-person business in a tiny bedroom studio to a team of 22 staff, we’ve produced over 1.5 million commercial, narration and eLearning scripts. (That’s 320+ million words!)
As voiceover recording specialists, we can see the merits of using AI voices for specific projects.
However, in most cases, AI-generated voices don’t live up to the hype.
Sure, the technology is impressive and there are use cases, but if you’re looking for authenticity in a voiceover delivery & engagement with your audience, human voices are still the best option.
With great systems in place to manage top talent, price and turnaround time (the assumed strengths of AI voices) are less of an advantage than some might think.
Got a challenging voiceover project? Talk to our team today, and we’ll work with you to find a solution.