Like Henry Higgins, the phonetician from George Bernard Shaw’s play “Pygmalion,” Marius Cotescu and Georgi Tinchev recently demonstrated how their student was trying to overcome pronunciation difficulties.
The two data scientists, who work for Amazon in Europe, have been teaching Alexa, the company’s digital assistant. Their job: to help Alexa master Irish-accented English with the help of artificial intelligence and recordings from native speakers.
During the demonstration, Alexa spoke about a memorable night out. “The party last night was great craic,” Alexa said with a lilt, using the Irish word for fun. “We got ice cream on the way home, and we were happy out.”
Mr. Tinchev shook his head. Alexa had dropped the “r” in “party,” making the word sound flat, like pah-tee. Too British, he concluded.
The technologists are part of a team at Amazon working on a challenging area of data science known as voice disentanglement. It’s a difficult problem that has gained new relevance amid a wave of A.I. developments, with researchers believing that cracking the speech-and-technology puzzle will help make A.I.-powered devices, bots and speech synthesizers more conversational, capable of pulling off a multitude of regional accents.
Tackling voice disentanglement involves far more than grasping vocabulary and syntax. A speaker’s pitch, timbre and accent often give words nuanced meaning and emotional weight. Linguists call this language feature “prosody,” something machines have had a hard time mastering.
Only recently, thanks to advances in A.I., computer chips and other hardware, have researchers made strides in cracking the voice disentanglement problem, transforming computer-generated speech into something more pleasing to the ear.
Such work may eventually converge with an explosion of “generative A.I.,” a technology that lets chatbots generate their own responses, researchers said. Chatbots like ChatGPT and Bard may someday fully act on users’ voice commands and respond verbally. At the same time, voice assistants like Alexa and Apple’s Siri will become more conversational, potentially rekindling consumer interest in a tech segment that had seemingly stalled, analysts said.
Getting voice assistants such as Alexa, Siri and Google Assistant to speak multiple languages has been an expensive and protracted process. Tech companies have hired voice actors to record hundreds of hours of speech, which helped create synthetic voices for digital assistants. Advanced A.I. systems known as “text-to-speech models,” so called because they convert text to natural-sounding synthetic speech, are just beginning to streamline this process.
The technology “is now able to create a human’s voice and synthetic audio based on a text input, in different languages, accents and dialects,” said Marion Laboure, a senior strategist at Deutsche Bank Research.
Amazon has been under pressure to catch up to rivals like Microsoft and Google in the A.I. race. In April, Andy Jassy, Amazon’s chief executive, told Wall Street analysts that the company planned to make Alexa “even more proactive and conversational” with the help of sophisticated generative A.I. And Rohit Prasad, Amazon’s head scientist for Alexa, told CNBC in May that he saw the voice assistant as a voice-enabled “instantly available, personal A.I.”
Irish Alexa made its commercial debut in November, after nine months of training in comprehending an Irish accent and then speaking it.
“Accent is different from language,” Mr. Prasad said in an interview. A.I. technologies must learn to extricate the accent from other parts of speech, such as tone and frequency, before they can replicate the peculiarities of local dialects: for instance, maybe the “a” is flatter and “t’s” are pronounced more forcefully.
These systems must figure out such patterns “so you can synthesize a whole new accent,” he said. “That’s hard.”
Harder still was trying to get the technology to learn a new accent largely on its own, from a different-sounding speech model. That’s what Mr. Cotescu’s team attempted in building Irish Alexa. They relied heavily on an existing speech model of primarily British-English accents (with a much smaller range of American, Canadian and Australian accents) to train it to speak Irish English.
The team contended with various linguistic challenges of Irish English. The Irish tend to drop the “h” in “th,” for example, pronouncing the letters as a hard “t” or a “d,” making “bath” sound like “bat,” or even “bad.” Irish English is also rhotic, meaning the “r” is overpronounced. That means the “r” in “party” will be more distinct than what you might hear from a Londoner’s mouth. Alexa had to learn these speech features and master them.
Irish English, said Mr. Cotescu, who is Romanian and was the lead researcher on the Irish Alexa team, “is a hard one.”
The speech models that power Alexa’s verbal skills have been growing more advanced in recent years. In 2020, Amazon researchers taught Alexa to speak fluent Spanish from an English-speaking model.
Mr. Cotescu and the team saw accents as the next frontier of Alexa’s speech capabilities. They designed Irish Alexa to rely more on A.I. than on actors to build up its speech model. As a result, Irish Alexa was trained on a relatively small corpus: about 24 hours of recordings by voice actors who recited 2,000 utterances in Irish-accented English.
At the outset, when Amazon’s researchers fed the Irish recordings to the still-learning Irish Alexa, some weird things happened.
Letters and syllables occasionally dropped out of the response. “S’s” sometimes stuck together. A word or two, sometimes crucial ones, were inexplicably mumbled and incomprehensible. In at least one case, Alexa’s feminine voice dropped a few octaves, sounding more masculine. Worse, the masculine voice sounded distinctly British, the kind of goof that might raise eyebrows in some Irish homes.
“They are big black boxes,” Mr. Tinchev, a Bulgarian national who is Amazon’s lead scientist on the project, said of the speech models. “You have to have a lot of experimentation to tune them.”
That’s what the technologists did to correct Alexa’s “party” gaffe. They disentangled the speech, word by word, phoneme (the smallest audible sliver of a word) by phoneme, to pinpoint where Alexa was slipping and fine-tune it. Then they fed Irish Alexa’s speech model more recorded voice data to correct the mispronunciation.
The result: the “r” in “party” returned. But then the “p” disappeared.
So the data scientists went through the same process again. They eventually zeroed in on the phoneme that contained the missing “p.” Then they fine-tuned the model further so the “p” sound returned and the “r” didn’t disappear. Alexa was finally learning to speak like a Dubliner.
Two Irish linguists have since given Irish Alexa’s accent high marks: Elaine Vaughan, who teaches at the University of Limerick, and Kate Tallon, a Ph.D. student who works in the Phonetics and Speech Laboratory at Trinity College Dublin. The way Irish Alexa emphasized “r’s” and softened “t’s” stood out, they said, and Amazon got the accent as a whole right.
“It sounds authentic to me,” Ms. Tallon said.
Amazon’s researchers said they were gratified by the largely positive feedback. That their speech models disentangled the Irish accent so quickly gave them hope that they can replicate accents elsewhere.
“We also plan to extend our methodology to accents of language other than English,” they wrote in a January research paper about the Irish Alexa project.
Source: www.nytimes.com