
Just Artificial, Not Intelligence



Image – Getty iStockphoto: Phonlamai Photo

Scaring Up Some Audiobooks

Recently, a distributor of digital books – both ebooks and audiobooks – announced that it was adding a new offering for publishers: “AI voicing” for audiobooks. The company could barely clear the word audiobooks before rushing to assure everyone that “AI narration” would never match the primacy of audio work using human readers and production. The caveat, going on and on, came across almost as an apology before any offense had been committed.

True, a certain resistance to the idea of machine-generated audiobooks is hardly eased by such headlines as Synthetic Voices Want To Take Over Audiobooks (Wired, January 27). No, they don’t. Synthetic voices don’t want to take over audiobooks. They don’t want anything. They’re synthetic. But book publishing is an industry that’s never accepted digital developments easily. Even after e-commerce and digital products played a key role in the US market’s comparative success during the still ongoing pandemic, those “synthetic voices” seem to murmur something sinister.

The many vendors now offering machine-generated audio narration know that this is the pushback to expect. It’s a minefield of emotional reaction. They’re nervous about it.

Some defensiveness isn’t without reason. The business of gifted human narrators – who are actually readers, voice actors, and interpreters rather than narrators, a term that has never been quite right – is supported by many additional workers in important roles. Those workers include sound technicians, audio editors, studio and tracking-booth providers, producers, in some cases directors, and more. Jobs are involved, and they comprise a lot of talent and many skill sets. Organizations like the Audio Publishers Association support these workers, and the APA’s Audie Awards rightly honor their work in 25 categories.

Nevertheless, there are compelling reasons for publishers to listen to machine-generated audiobook readings. The kind of work they can handle is unlikely ever to be produced in human readings because of the cost factor.

As many publishing professionals readily agree, machine-produced voicings may be best for nonfiction, which is generally thought not to need the emotional and aesthetic nuance of fiction. But of course, in a great many cases, nonfiction is read by the human author, who may be untrained and inexperienced at the microphone. While there’s always someone asserting that those “synthetic voices” feature many mistakes in pronunciation, so does the work of many human authors.

Just days ago, I heard a very fine, prominent nonfiction author in his reading of his own book pronounce scathing with a short a, making that first syllable rhyme with cat. Most of us have had the experience of discovering, red-faced, that we’ve been pronouncing something wrongly for years. The audio edition of one of last year’s most important American political books was at times almost comical in its mispronunciations by its much-praised author. In both machine-generated and human-produced readings, proof-listening is critical to catch these things.

Still, the imperative for publishers regarding audio actually goes beyond nonfiction.

Listening Out for the Backlist

There are many cases in which no audio edition of a book has been made or will be made because of the expense of standard human labor-intensive production. Consider a publisher with a large backlist of important titles never given audio treatments. Is the author helped by the fact that no audiobook edition of her or his book is available? Of course not. There are customers who want audio. Some of them consume books only in audio renditions. Is that money to be left on the table simply because a human-produced rendition can’t be afforded?


Provocations graphic by Liam Walsh

What’s more, machine-generated voices have improved dramatically in the last two to four years. Have you heard Amazon’s male Alexa voice? While we weren’t listening, the quality of those “synthetic voices” has been making progress.

The vendor called Speechki now offers 364 synthetic voices (lots of accents and dialects) in 77 languages. On its home page, listen briefly to the short demo under the header “Your Audiobook Could Sound Like This!” What do you think? Try your ear in the 10-file quiz in which the company challenges you, “Bet you can’t tell a robot from a human!” No self-respecting robot would use as many gratuitous exclamation points as Speechki and many other excited vendors do, but you may be surprised how you score on that quiz.

The cost factor of standard audiobook production is daunting, especially if you have a big catalogue of good backlist that needs audio renditions. While a well-made audiobook with a standard human reading can cost thousands of dollars, the digitally produced edition can come in at several hundred bucks. It also takes less than a day to produce an audiobook when the talent is a distant cousin of the elevator that tells you, “Doors opening.”

But where so much of the discussion goes subtly awry is in overheated connotations of the term artificial intelligence. The commercial sector’s fondness for that term, all robot-y and Ex Machina sexy, is wrongly applied here, just as it’s being wrongly applied in so many parts of industry and entertainment.

The “synthetic voices” – usually sampled, of course, from human voices – have zero intelligence. They’re digitally manipulated to sound as realistic as possible. They’re not thinking when they scan your book. Code is simply rendering text into pre-designed sounds.

By tossing the phrase AI around all over the place, many of the biggest advocates of machine-generated audio are doing themselves a disservice and not helping the publishing industry dispassionately consider its unvoiced backlist problem. The people who love those exclamation points are their own worst enemies, triggering knee-jerk objections with the implication that another kind of intelligence is coming to getcha! Those marketing folks need to sit down, get over it, and quietly run a search and replace to put periods where all their exclamation points are.

So many things today are unnecessarily called artificial intelligence, processes that make no selections, have no prerogatives of their own, and certainly no consciousness. What we forget – what some never knew – is that artificial intelligence is defined as “the capability of a machine to imitate intelligent human behavior” (emphasis mine, and we’re getting that definition from Merriam-Webster’s Unabridged Dictionary). Popular usage – I’m looking at you, Hollywood – has morphed the term into something much more menacing than it really is.

Is it possible that the scare factor in popular speculation about AI could make it harder for publishing people to weigh the authentic advantages and disadvantages of synthetically generated audiobooks needed by the industry?

What do you think? Could the same logic behind changing the term UFO to UAP (unidentified aerial phenomenon) – getting us past the hype – help publishers, authors, and readers more rationally debate the question of how best to produce the audiobooks they need to be selling? 


About Porter Anderson

@Porter_Anderson is a recipient of London Book Fair's International Excellence Award for Trade Press Journalist of the Year. He is Editor-in-Chief of Publishing Perspectives, the international news medium of Frankfurt Book Fair New York. He co-founded The Hot Sheet, a newsletter for trade and indie authors, which now is owned and operated by Jane Friedman. Priors: The Bookseller's The FutureBook in London, CNN, CNN.com and CNN International – as well as the Village Voice, Dallas Times Herald, and the United Nations' WFP in Rome. PorterAndersonMedia.com
