ATU293 – Vocal ID with Dr Rupal Patel



Your weekly dose of information that keeps you up to date on the latest developments in the field of technology designed to assist people with disabilities and special needs.

Vocal ID with Dr Rupal Patel

The AppleVis Community Names the Apps and Developers that were its Golden Apples of 2016
App: Proloquo2Go
If you have an AT question, leave us a voice mail at: 317-721-7124 or email
Check out our web site:
Follow us on Twitter: @INDATAproject
Like us on Facebook:
——-transcript follows ——

RUPAL PATEL: Hi, this is Rupal Patel, and I’m the founder and CEO of Vocal ID, and this is your Assistive Technology Update.

WADE WINGLER:  Hi, this is Wade Wingler with the INDATA Project at Easter Seals Crossroads in Indiana with your Assistive Technology Update, a weekly dose of information that keeps you up-to-date on the latest developments in the field of technology designed to assist people with disabilities and special needs.

Welcome to episode number 293 of Assistive Technology Update. It’s scheduled to be released on January 6, 2017.

Happy new year, everybody. We are back from our special formats to our regular format of news stories, interviews, and app reviews. Today we are excited to have Dr. Rupal Patel, who is the founder and CEO of Vocal ID. They are doing some pretty cool stuff when it comes to voice files and more natural-sounding voices.

We’ve got an app review from BridgingApps about Proloquo2Go. We hope you’ll check out our website at, give us a call on our listener line at 317-721-7124, or shoot us a note on Twitter @INDATAproject.


From our partners over at AppleVis, we have the 2016 Golden Apple awards. Since 2012 they have been awarding developers of apps that are particularly accessible or useful for folks who use screen readers.

In the category of Best iOS app, they have the Voice Dream Reader which is an app that reads articles, documents, and books aloud. I use it; it’s pretty great.

In the Best iOS game category, they have Time Crest: The Door. It’s an epic fantasy about Ash Eldon, and you have to scramble to solve a puzzle about who hurls meteors to destroy Ash’s world of Atlancia.

In the Best Assistive iOS App category, the winner is KNFB Reader, which turns printed text into high-quality speech.

The Best Mac App is A Blind Legend, all about the famous Edward Blake, the blind knight who, guided by his daughter Louise, must find a way to avoid traps in the kingdom of Highcastle.

In the category of Developer of the Year, there is a tie between American Printing House for the Blind and Sneaky Crab. Both got the same number of votes: American Printing House for its Nearby Explorer app and Sneaky Crab for Time Crest: The Door.

We love the folks over at AppleVis and support what they are doing. I would encourage you to check our show notes. We’ve got a link to the full listing of the Golden Apple awards, and you can also go there and find all kinds of great stuff, including their blog, podcast, and all kinds of information about iOS and Mac OS accessibility.


Each week one of our partners tells us what’s happening in the ever-changing world of apps, so here’s an app worth mentioning.

AMY BARRY:  This is Amy Barry with BridgingApps, and this is an App Worth Mentioning. I’m really excited about this week’s app, called Proloquo2Go. It is one of our favorite AAC apps here at BridgingApps. In fact, I was really surprised when I found out that we had not shared it yet. Proloquo2Go is a full-featured augmentative communication app that offers picture-only, picture-and-text, and keyboard options for message formulation. We currently use Proloquo2Go with users from age four through adulthood. The available voices are more natural-sounding male and female adult or child voices, and they can be swapped for British- or Indian-accented English at no additional cost. We really like Proloquo2Go’s American children’s voices, Josh and Ella. They are dramatically improved over previous voices, with more feeling and more varied intonation. The keyboard and the picture-and-text grids can be used for novel sentence building, but they can require many page navigations to build a basic or complex sentence. None of our users have had any problems with that, and all have been very successful using this app. The images on the buttons are SymbolStix symbols, although you are able to use real pictures, which we highly encourage. Some of the pictures are more lifelike cartoon drawings. The keyboard also provides word prediction, but it requires the user to have fair to good fine motor coordination to accurately access and use the word prediction keys.

This app has the potential to provide very functional communication, and that is why we highly recommend it. The cost is definitely higher than that of some other apps; however, it leaves a lot of room for growth for users whose receptive and expressive language and communication skills are improving, so it can be used with an early learner or young child all the way up through adulthood. And as the user’s abilities increase, it may prevent the need to purchase and transition to another app. Proloquo2Go is available for $249.99 on the iTunes Store and is compatible with iPad and iPhone, with an Apple Watch app for iPhone.

For more information on this app and others like it, visit


WADE WINGLER:  Augmentative and alternative communication and synthesized or digitized speech are topics we have talked about many times and in many situations on this show. But today we are going to talk about something that’s new and different, and frankly I think kind of exciting and maybe even a little bit charming. I’m excited to have Dr. Rupal Patel on the show today. She is the founder and CEO of Vocal ID, which I recently learned about from the folks over at Tobii Dynavox. Before we go on, Dr. Patel, are you still on the line?

RUPAL PATEL: I am.

WADE WINGLER:  I’m so excited and happy to have you on the show. Thank you for taking time out of your busy day to talk with us a little bit.

RUPAL PATEL: Thank you. I’m happy to be here.

WADE WINGLER:  Dr. Patel, can you tell me a little bit about your background and how your career brought you to the point where you are working on a project like Vocal ID? Then we’ll get into the “what is it” questions and have you explain it to us. But tell us about yourself first.

RUPAL PATEL: Sure. I am a speech pathologist and speech technologist, and I’m a professor at Northeastern University. Vocal ID was a research project in my lab for many years, starting in 2007, and it got to a point where the technology was ready to take out of the laboratory so that we could help hundreds and thousands of people rather than the handful we can help in the laboratory. That’s the long story of how we got here.

WADE WINGLER:  Excellent. I’ve been familiar with augmentative communication systems for many years, and I know that a lot of listeners in my audience are as well. When I think about AugCom, I think primarily about synthesized voices. In fact, there are some stereotypes about those synthesized voices; I’ve even seen them represented in cartoons like The Simpsons. For me, synthesized speech has been part of the experience and the context of augmentative communication. Talk to me a little bit about that, and then tell me about Vocal ID, the customization, and why it is important.

RUPAL PATEL: I think that in augmentative communication, because of the medical model, the primary focus until now has been on providing a functional solution to the inability to speak. But as we start to think about adoption of AAC, about what it means for someone to live with AAC all their life, and about their identity growing up with their device, I think we are at a point in technology where we can do more. In fact, we need to do more. We are giving people a method of communication, but we are not fully embracing what it means to communicate, which isn’t just the information we convey; it’s how we convey it and how others get to know us through the way we interact with them. There is satire, and there are jokes, about how robotic people’s voices can sound, and Stephen Hawking is the prototypical person we think of when we think of an augmentative communication user. I think that robs people of the fact that each of these individuals has an identity and a voice that can be conveyed through the device. Imagine you’re in a classroom full of kids who are using devices. When you’re not looking at them, you don’t actually know who is speaking, because many of them may use the exact same voice. Sometimes a little girl is even using an adult male voice. This initial observation was really what got me thinking: can we build voices for these individuals that are unique to them, that can grow and change with them, and that really reflect who they are? Our voices are a huge part of our identity, and it should be the same for people who are AAC users.

WADE WINGLER:  That makes sense. Until this very moment I hadn’t really thought about the nuance and importance of that. Tell me about Vocal ID, what it is and how the process works.

RUPAL PATEL: Vocal ID is a voice company. We create custom digital voices. The way we do it is based on our discovery that even individuals who are unable to speak clearly can still make some sound. Most AAC users use a total communication approach: they use their device, gestures, and whatever vocalizations they can make. We take the vocalizations they can still make, and then we crowdsource the collection of voices. We ask healthy talkers from around the world to contribute their voices to the voice bank we are growing. From that voice bank we can find a donor who is similar to the person in age, gender, and so on, and combine the voice sample from the person we are building a voice for with that matching speech donor. The result is a digital voice that sounds as clear as the donor’s but is unique in vocal identity to the person who’s going to use it.

The technology has a lot to do with signal processing and computing this unique voice, but in the end the application is really all about how they feel about using this voice and how they can be heard as themselves.

WADE WINGLER:  As we are having this conversation and I’m listening to you, an image comes to mind, and you may tell me I’m wrong, which I’m totally okay with. I’m flashing back to images of the bionic man from my childhood in the 70s, where part of the man’s body was his own but with some augmentation added. How far off am I when I make that leap?

RUPAL PATEL: I like that analogy. It makes them superpowered AAC users. We are using a simple theory of speech production. Typically, when any of us produce speech, we make a sound in our voice box, and that sound gets pushed through our vocal tract, where the way we shape our tongue and mouth creates consonants and vowels. Because we are re-creating a voice digitally, we can take the sound source from the person who cannot speak, borrow the filter, the rest of the vocal tract, from someone who is able-bodied and can speak, and combine them digitally. It is a bionic voice, if you want to call it that. The great thing about this human-powered voice is that you can have a variety of voices, so you don’t have to sound like the robotic voice you’re talking about.
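For readers who want to see the source-filter idea Dr. Patel describes in concrete form, here is a minimal sketch in Python. It is a generic textbook illustration, not Vocal ID’s actual method, which is only described at a high level in this interview. An impulse train stands in for the talker’s voice source, and a cascade of Klatt-style formant resonators stands in for the donor’s vocal tract. All function names, the 16 kHz sample rate, the pitch, and the formant values are hypothetical choices made for illustration.

```python
import math

FS = 16000  # sample rate in Hz (hypothetical choice)


def glottal_source(f0_hz, duration_s, fs=FS):
    """Crude stand-in for the talker's 'source': an impulse train at pitch f0_hz.

    Hypothetical: a real system would work from the person's own recorded
    vocalizations rather than synthesize a source like this.
    """
    n = int(duration_s * fs)
    period = fs / f0_hz  # samples between glottal pulses
    out = [0.0] * n
    t = 0.0
    while t < n:
        out[int(t)] = 1.0
        t += period
    return out


def resonator(x, freq_hz, bw_hz, fs=FS):
    """Klatt-style second-order digital resonator modelling one formant."""
    c = -math.exp(-2 * math.pi * bw_hz / fs)
    b = 2 * math.exp(-math.pi * bw_hz / fs) * math.cos(2 * math.pi * freq_hz / fs)
    a = 1 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    y = []
    for sample in x:
        yn = a * sample + b * y1 + c * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y


def source_filter(source, formants, fs=FS):
    """Pass the source through a cascade of formant resonators (the 'filter')."""
    signal = source
    for freq_hz, bw_hz in formants:
        signal = resonator(signal, freq_hz, bw_hz, fs)
    return signal


# Illustrative values only: a higher-pitched source (say, a child's vocalization)
# shaped by formants roughly like an adult's /a/ vowel, standing in for a donor.
talker_source = glottal_source(f0_hz=250, duration_s=0.5)
donor_formants = [(730, 90), (1090, 110), (2440, 170)]
blended = source_filter(talker_source, donor_formants)
```

Keeping the source and the filter as separate pieces is what makes the mix-and-match possible in this sketch: changing the formant list changes whose “vocal tract” shapes the sound, while the pitch of the source stays the talker’s own.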

WADE WINGLER:  Talk to me a little bit about the resulting voice. How realistic is it and what does it sound like?

RUPAL PATEL: It sounds like a combination of the two. You are definitely going to hear the donor voice, but you are also going to hear it infused with the vocalizations that the individual who can’t speak makes. We don’t want the distortions of the nonspeaking individual’s voice; we want the essence of who they are. We call it their vocal DNA. You hear the two combined. Maybe later in the session I can find a sound file and play it for you so you get a chance to hear it. There’s also a way to hear it on our website.


WADE WINGLER:  I’m going to interrupt here for just a second, because Dr. Patel did send me a sample. I’m going to drop it in so that you can hear it, and then we will jump back into the interview.

SPEAKER:  My name is Maeve and I am 10 years old. I have cerebral palsy so I need my device to speak, but I understand everything. I am so happy to have my new voice. It sounds like a kid and my voice is neat. All my friends and my sister thinks my voice is totally awesome.


RUPAL PATEL: The voice still sounds like a synthesized voice. We can’t make it sound exactly like you and I sound today, because we are also trying to create this voice in a very scalable and affordable way. Until we came along, building a voice required a voice actor to record thousands of sentences in a sound studio, and then a team of engineers and linguists to pore over those recordings and create the voice. It costs hundreds of thousands of dollars. We are trying to create a voice for about $1,000. One of the ways we do that is by crowdsourcing: we ask everyday people to donate their voices. That’s what keeps the cost down. We have also automated a lot of how we collect the voices, including letting people donate their voices from home, which brings the barrier to entry down. That creates other issues, like people sometimes recording in less-than-ideal settings, such as with the TV on in the background, so we have to clean the audio.

The more we process the audio, the less realistic it’s going to be. It costs millions of dollars to build a Siri voice, but in the end you’ve got one generic white female voice. We are trying to make thousands of voices for people quickly and cheaply, so there is a trade-off. We believe, and our early adopters agree, that there isn’t a big difference between the voices we are creating, which are authentic to them, and the generic ones they can get out of the box. The coolest thing is that we created several options for each of the first people we built voices for, our early adopters, and in only two out of the seven cases did people choose the voice that was objectively the most understandable. Many times people chose voices that had a little bit of, I don’t know what to call it, a nuance in them. They cracked a little bit. They didn’t always sound perfect. We have to remember that they are not using these voices to do a radio announcement. They are using these voices as their way of communicating with people. They want to sound like us. They want to sound like everyday people communicating.

WADE WINGLER:  Authenticity.

RUPAL PATEL: Yeah.

WADE WINGLER:  Dr. Patel, if somebody wanted to contribute their voices, is that something our listeners could actively do if they wanted to volunteer?

RUPAL PATEL: Absolutely. The website address is We have 18,000 people who have started the recording process. Some of them have finished it, but not everyone has, because it is a time-consuming process. But without good-quality voices, we can’t build voices for AAC users. Even if I had thousands of voices from one area of the world, we would still need a variety of ages and accents. This really allows us to have diversity of voices in AAC for the very first time. I encourage people to take the time to think about sharing their voice. It’s such an incredible experience to be able to share this gift that we all have, and take for granted, with someone who can’t speak. Kids in high school and middle school often do this as community service as well as for disability awareness. The more people we can get involved in this and the more we push the boundaries here, the better the voices are going to get for our end customers.

WADE WINGLER:  And how long does it take if somebody is going to contribute their voice?  What is the time commitment?

RUPAL PATEL: It depends a little bit on the individual’s technological know-how. Some people, usually older individuals, have more difficulty just using a web-based interface. Younger people tend to listen back to their samples a lot more. My daughter would listen to every recording she made, and that took her longer. On average, people take somewhere around 5 to 7 hours to complete the full recording set, which is done in 20-minute chunks. At 20 minutes a day, you would be done in a couple of weeks.

WADE WINGLER:  That sounds very doable. You are building voices. Once a voice is created for an individual, what platforms do those voices go on?  I know there are different devices.

RUPAL PATEL: We have created our voices so that they are compatible with all three platforms: Windows, iOS, and Android. We are working with leading assistive technology companies to make all of our voices compatible with all of these platforms, so regardless of what kind of device you are using, you can use our voices. In the case of iOS, the company has to integrate our engine into its app, so for now we are working with only a handful of iOS apps, because at our current stage we can’t integrate with every single company. The likelihood that we will get there soon is high.

WADE WINGLER:  Tell me about the response that you’ve gotten from some of your users and their families. What are people saying about this?

RUPAL PATEL: I think we are seeing things like increased communication participation, three hundred times more communication. Users feel that these voices are much more representative of them. It’s not just for that person; it’s also for family members, who say, “He now has a voice, let him talk,” right? It’s both figurative and actual. Teachers also tell us that these children’s self-esteem is greater. For adults who lost their voice without banking it beforehand and now have their own voice to speak with, it changes relationships. It changes the way they continue to live with dignity and as themselves. The implications range from objective measures you can count, like the number of times they communicate, to the relationships they have and how fulfilling those relationships are. One thing we would love to look at is anxiety around communicating with unfamiliar people. One of our early adopters now has a job. She uses her AAC device now; before, she primarily used sign language, even though she had had an AAC device for many years. Now, because the voice that comes out of her device no longer feels foreign to her, she feels she can enter communication scenarios that were not really accessible to her before, when she needed someone who could understand sign. Now anyone can listen to the device and understand what she’s saying. It really can open up doors. This is the kind of evidence we need to start building a case for the voices being funded as well.

WADE WINGLER:  That makes sense. Tell me how far along in the process you are and what kinds of things are in the future for Vocal ID.

RUPAL PATEL: We built and delivered our first seven voices last year, in 2015. When we were in the laboratory, it took us on the order of 40 to 50 hours of manual labor to build each voice. That’s a very expensive process that would not be accessible to the average individual. What we did over the last year and a half is start to automate parts of the process so it takes less manual labor. Building a voice is still a time-consuming, expensive process. We brought the cost down by several orders of magnitude, but it still costs $1,250 for someone to purchase a voice today, and that is heavily subsidized because we have federal government grants right now to get this product to market. We are trying to keep automating so that we can make these voices available to many more people, but I don’t think the cost is going to go down in the near future, because it is a very time-consuming process no matter what we do. What we need to do is have more consumers start using this, to show the impact and to get it out on various platforms. We launched the service officially in August 2016, and we have about 100 customers right now. We have a bit of a backlog of voices we are trying to build. Over time, I hope we will have many more customers who purchase their voices and allow us to get this to market fully.

WADE WINGLER:  What’s the relationship with Tobii Dynavox?

RUPAL PATEL: We have a strategic partnership with Tobii Dynavox, in which they work with us to both market and sell the product. We need to get many users to use Vocal ID voices to continue to bring this great innovation to market.

WADE WINGLER:  I know you have a meeting to go to, and we are getting close to the end of the interview. Before we wrap up, tell me a story about someone whose life has really been impacted.

RUPAL PATEL: I don’t know which story to tell you. I’ll tell you the story of a teenager. He was part of a basketball team at the university. The day he received his voice, he wanted his entire team to be there to hear it. It was interesting to see how the kids on the team, and this young man himself, reacted to his voice. The kinds of things we heard were, “Wow, it sounds so passionate, like he is,” or “It’s the first time we hear who he is,” things like that. Almost a year later, we recently interviewed several team members and this young man. We are finding he is taking more communication risks, in positive ways. He’s interacting in different ways. People are seeing him for who he is. When someone is an augmentative communication user with a physical disability and a communication impairment, people don’t always know how much is in that individual cognitively, how much they are capable of, and what their physical impairment is holding them back from. I think having your own voice that reflects who you are is one way to start telling the world, “I am here, I have a voice, and you’re going to listen.” I think that’s a really exciting implication of what we are doing.

WADE WINGLER:  Amazing stuff. Dr. Patel, give us some contact information as we wrap up here.

RUPAL PATEL: You can reach me at, and if you have any questions about Vocal ID or our products, go to our website at You will hopefully find examples of what we are doing. There are plenty of stories as well as videos online that will help you understand what the product is and its impact.

WADE WINGLER:  Dr. Rupal Patel is the founder and CEO of Vocal ID and has been our most delightful guest today. Thank you for being with us.

RUPAL PATEL: Thank you so much.

WADE WINGLER:  Do you have a question about assistive technology? Do you have a suggestion for someone we should interview on Assistive Technology Update? Call our listener line at 317-721-7124, shoot us a note on Twitter @INDATAProject, or check us out on Facebook. Looking for a transcript or show notes from today’s show? Head on over to Assistive Technology Update is a proud member of the Accessibility Channel. Find more shows like this plus much more over at That was your Assistive Technology Update. I’m Wade Wingler with the INDATA Project at Easter Seals Crossroads in Indiana.

***Transcript provided by TJ Cortopassi.  For transcription requests and inquiries, contact***