Improving Voice Recognition for People with Speech Disabilities
Summary: A new study shows that automatic speech recognition (ASR) systems trained on speech from people with Parkinson’s disease are 30% more accurate in transcribing similar speech patterns. Researchers collected over 151 hours of recordings from participants with varying degrees of dysarthria, a speech disorder common in Parkinson’s patients, and used the data to train ASR systems.
The study reveals that incorporating atypical speech samples significantly improves voice recognition technology for those with speech disabilities. These findings could help make voice-controlled devices more accessible to people with neuromotor disorders.
Key Facts:
- ASR systems trained on Parkinson’s speech improved transcription accuracy by 30%.
- The model was trained on 151 hours of recordings from people with dysarthria.
- These findings could enhance accessibility for users with speech disabilities.
Source: Beckman Institute
As Mark Hasegawa-Johnson combed through data from his latest project, he was pleasantly surprised to uncover a recipe for Eggs Florentine. Sifting through hundreds of hours of recorded speech will unearth a treasure or two, he said.
Hasegawa-Johnson leads the Speech Accessibility Project, an initiative at the University of Illinois Urbana-Champaign to make voice recognition devices more useful for people with speech disabilities.
In the project’s first published study, researchers asked an automatic speech recognizer to listen to 151 hours, more than six full days, of recordings from people with speech disabilities related to Parkinson’s disease. Their model transcribed a new dataset of similar recordings with 30% more accuracy than a control model that had not listened to people with Parkinson’s disease.
This study appears in the Journal of Speech, Language, and Hearing Research. The speech recordings used in the study are freely available to researchers, nonprofits and companies looking to improve their voice recognition devices.
“Our results suggest that a large database of atypical speech can significantly improve speech technology for people with disabilities,” said Hasegawa-Johnson, a professor of electrical and computer engineering at Illinois and a researcher at the university’s Beckman Institute for Advanced Science and Technology, where the project is housed.
“I look forward to seeing how other organizations will use this data to make voice recognition devices more inclusive.”
Machines like smartphones and virtual assistants use automatic speech recognition to make meaning from vocalizations, allowing people to queue up a playlist, dictate hands-free messages, seamlessly participate in virtual meetings and communicate clearly with friends and family members.
Voice recognition technology does not work well for everyone, however. It struggles in particular with the speech of people who have neuromotor disorders such as Parkinson’s disease, which can cause a range of strained, slurred or discoordinated speech patterns collectively called dysarthria.
“Unfortunately, this means that many people who need voice-controlled devices the most may encounter the most difficulty in using them well,” Hasegawa-Johnson said.
“We know from existing research that if you train an ASR on someone’s voice, it will begin to understand them more accurately. We asked: can you train an automatic speech recognizer to understand people with dysarthria from Parkinson’s by exposing it to a small group of people with similar speech patterns?”
Hasegawa-Johnson and his colleagues recruited about 250 adults with varying degrees of dysarthria related to Parkinson’s disease. Prior to joining the study, prospective participants met with a speech-language pathologist who evaluated their eligibility.
“Many people who have struggled with a communication disorder for a long time, especially a progressive one, may withdraw from daily communication,” said Clarion Mendes, a speech-language pathologist on the team. “They might share their unique thoughts, needs and ideas less and less often, thinking their communication is just too impacted to engage in meaningful conversations.
“Those are the exact people we’re looking for,” she said.
Selected participants used their personal computers and smartphones to submit voice recordings. Working at their own pace and with optional assistance from a caregiver, they repeated well-worn vocal commands like “Set an alarm,” recited passages from novels and opined on open-ended prompts like “Please explain the steps to making breakfast for four people.”
Responding to the latter, one participant enumerated the steps to make Eggs Florentine, hollandaise sauce and all, while another pragmatically advised ordering takeout.
“We’ve heard from many participants who have said that the participation process was not only enjoyable, but that it gave them the confidence to communicate with their families again,” Mendes said. “This project has brought hope, excitement and energy — uniquely human qualities — to many of our participants and their loved ones.”
She said the team consulted with Parkinson’s disease experts and community members to develop content relevant to participants’ lives. Prompts mixed the specific with the spontaneous: training a speech algorithm to recognize medication names, for example, may help an end user communicate with their pharmacy, while casual conversation-starters mimic the cadence of daily chit-chat.
“We tell participants: We know that you can make your speech clearer by putting all your effort into it, but you’re probably tired of having to try to make yourself understood for the benefit of others. Try to relax and communicate as if you’re chatting with your family on the couch,” Mendes said.
To gauge how well the speech algorithm listened and learned, the researchers divided the participants’ recordings into three sets. The first set, recordings from 190 participants totaling 151 hours, trained the model.
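Because the sets are described in terms of participants rather than individual clips, the split appears to keep each speaker’s recordings together, so the model is later scored on voices it has never heard. Below is a minimal Python sketch of such a speaker-disjoint split. The function name `split_by_speaker`, the `(speaker_id, clip_id)` tuple layout and the seeded random shuffle are illustrative assumptions, not the project’s actual data pipeline.

```python
import random
from collections import defaultdict


def split_by_speaker(recordings, n_train, n_val, seed=0):
    """Partition recordings so that every speaker lands in exactly one set.

    `recordings` is a list of (speaker_id, clip_id) pairs; this layout is a
    hypothetical stand-in for real corpus metadata. Speakers left over after
    the training and validation sets are filled go to the test set.
    """
    by_speaker = defaultdict(list)
    for speaker, clip in recordings:
        by_speaker[speaker].append(clip)

    speakers = sorted(by_speaker)           # fixed order before shuffling
    random.Random(seed).shuffle(speakers)   # reproducible random assignment

    train_spk = speakers[:n_train]
    val_spk = speakers[n_train:n_train + n_val]
    test_spk = speakers[n_train + n_val:]

    def gather(spk_list):
        # Expand each chosen speaker back into all of their clips.
        return [(s, c) for s in spk_list for c in by_speaker[s]]

    return gather(train_spk), gather(val_spk), gather(test_spk)


# Toy usage: 10 hypothetical speakers with 3 clips each.
recordings = [(f"spk{i:03d}", f"clip{j}") for i in range(10) for j in range(3)]
train, val, test = split_by_speaker(recordings, n_train=6, n_val=2)
```

Splitting by speaker rather than by clip is what lets a test score say something about new users: if clips from the same person appeared in both training and test sets, the model could score well partly by recognizing that individual voice.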
As its performance improved, the researchers confirmed that the model was learning in earnest, and not just memorizing participants’ responses, by checking it against a second, smaller set of recordings it had not been trained on. When the model reached peak performance on this second set, the researchers challenged it with the third set, the test set.
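The procedure described here resembles what machine-learning practitioners call early stopping. The validation set is used only to decide when to stop training and which checkpoint to keep, and the test set is consulted once at the end. Below is a minimal Python sketch of that loop. The callables `train_step` and `val_error`, the `patience` rule and the toy error curve are hypothetical placeholders; the article does not say which stopping rule the team actually used.

```python
def train_with_early_stopping(train_step, val_error, max_epochs=50, patience=3):
    """Train until validation error stops improving; return the best checkpoint.

    train_step(epoch) -> model snapshot after one pass over the training set
    val_error(model)  -> error on held-out validation recordings
    Both are hypothetical callables supplied by the caller.
    """
    best_err = float("inf")
    best_model, best_epoch = None, -1
    epochs_since_best = 0
    for epoch in range(max_epochs):
        model = train_step(epoch)
        err = val_error(model)
        if err < best_err:
            # New best on validation data: remember this checkpoint.
            best_err, best_model, best_epoch = err, model, epoch
            epochs_since_best = 0
        else:
            # No improvement; stop once `patience` epochs pass without one.
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break
    return best_model, best_epoch, best_err


# Toy usage: a made-up validation curve that bottoms out at epoch 3,
# then rises as the model starts to overfit the training speakers.
curve = [0.50, 0.40, 0.33, 0.30, 0.31, 0.32, 0.35, 0.36]
model, epoch, err = train_with_early_stopping(
    train_step=lambda e: f"checkpoint-{e}",
    val_error=lambda m: curve[int(m.split("-")[1])],
    max_epochs=len(curve),
    patience=3,
)
# model == "checkpoint-3", epoch == 3, err == 0.30
```

Only the returned checkpoint would then be scored on the test set. Keeping the test set out of every training decision is what makes its error rate an honest estimate for unseen speakers.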
Members of the research team manually transcribed an average of 400 recordings per participant to check the model’s work.
They found that after listening to the training set, the ASR system transcribed recordings from the test set with a word error rate of 23.69%. For comparison, a system trained on speech samples from people without Parkinson’s disease transcribed the test set with a word error rate of 36.3%. Training on Parkinsonian speech therefore cut the error rate by roughly a third.
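Word error rate, the metric behind both figures, counts the word substitutions, deletions and insertions needed to turn the system’s transcript into the human reference transcript, then divides by the number of words in the reference. Below is a minimal Python sketch of that standard word-level edit-distance calculation, together with the relative comparison of the two reported rates. The function names are illustrative, and this is not the study’s own scoring code. Plugging in 36.3% and 23.69% gives a relative error reduction of about 34.7%, slightly above the 30% figure quoted in the summary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline system's errors that the new system avoids."""
    return (baseline_wer - new_wer) / baseline_wer


# "an" is deleted and "seven" becomes "eleven": 2 errors / 5 reference words.
print(word_error_rate("set an alarm for seven", "set alarm for eleven"))  # 0.4
print(round(relative_reduction(36.3, 23.69), 3))  # 0.347
```

Corpus-level figures such as 23.69% are usually computed by summing edits and reference words across all test utterances rather than by averaging per-utterance rates. The article does not say which convention the study followed.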
Error rates also decreased for almost all individuals in the test set. Even speakers with less typical Parkinsonian speech, like unusually fast speech or stuttering, experienced modest improvements.
“I was excited to see such a dramatic benefit,” Hasegawa-Johnson said.
He added that his enthusiasm is bolstered by participant feedback:
“I spoke with a participant who was interested in the future of this technology,” he said. “That’s the wonderful thing about this project: seeing how excited people can be about the possibility that their smart speakers and their cell phones will understand them. That’s really what we’re trying to do.”
Funding: Research described in this press release is supported by Amazon, Apple, Google, Meta and Microsoft; the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under award no. R13DC003383; and the National Science Foundation under award no. 1725729.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.