Your weekly dose of information that keeps you up to date on the latest developments in the field of technology designed to assist people with disabilities and special needs.
Microsoft Translator with Will Lewis – Principal Technical Program Manager
If you have an AT question, leave us a voice mail at: 317-721-7124 or email email@example.com
Check out our web site: https://www.eastersealstech.com
Follow us on Twitter: @INDATAproject
Like us on Facebook: www.Facebook.com/INDATA
——-transcript follows ——
WILL LEWIS: Hi, this is Will Lewis, and I’m the Principal Technical Program Manager with Microsoft Translator, and this is your Assistive Technology Update.
WADE WINGLER: Hi, this is Wade Wingler with the INDATA Project at Easter Seals Crossroads in Indiana with your Assistive Technology Update, a weekly dose of information that keeps you up-to-date on the latest developments in the field of technology designed to assist people with disabilities and special needs. Welcome to episode number 371 of Assistive Technology Update. It’s scheduled to be released on July 6, 2018.
Today I am super excited to have an interview that takes the whole show with Will Lewis who is the Principal Technical Program Manager for Microsoft Translator, amazing technology. We hope you’ll check out our website at EasterSealsTech.com. Send us a note on Twitter @INDATAProject. Or call our listener line; the number is 317-721-7124.
Language matters. We say that and mean that, and in fact right now we are interacting exclusively through language. It’s what we do as human beings. In fact, we can’t get through our days without language. Every once in a while, more often than not, the language I’m using doesn’t always align with the language that others might be using. Sometimes we need translation or interpreting. Of course, the world of assistive technology isn’t exempt from this language misalignment phenomenon that happens sometimes. Today we are going to dig into some interesting technology called Microsoft Translator.
I’m so excited that Will Lewis, who is the Principal Technical Program Manager with Microsoft Translator, is with us today to try to help us sort out what this is and how it works and how it relates to the world of assistive technology. That’s enough of me and my rambling language. Will, welcome to the show.
WILL LEWIS: Thank you very much for having me.
WADE WINGLER: I’m excited about our conversation today. I know there is a lot going on in the world of Microsoft Translator, and we are going to get to some of those details. Before we start talking about the technical side of our conversation, I want to know more about you personally, and our audience does too. Give us a little bit about yourself and the journey you’ve taken to get to your current role with Microsoft.
WILL LEWIS: I certainly will. I ran a small business years ago doing software development. Then I went back to university and got a degree in linguistics, and I went on to a Masters in linguistics and then a PhD in linguistics. I have a doctorate in linguistics. I was a professor for five years at a couple of different universities. I had heard about this position at Microsoft while I was a professor at the University of Washington in the computational linguistics program. I joined the team way back in 2007 just as the product was launching.
WADE WINGLER: So there’s a whole lot of linguistics stuff, so this has turned into the perfect project for you, right?
WILL LEWIS: It’s been an amazing journey. Never in my wildest dreams would I have thought I would be working on machine translation. It’s been 11 years and counting, and I love it.
WADE WINGLER: That’s great. Let’s start with just the basics. What is Microsoft Translator?
WILL LEWIS: That’s a really good question. Microsoft Translator is a cloud service, a multilingual statistical machine translation cloud service. We provide this service for people that want to do machine translation, translate text or speech from one language into another. And over time, it’s evolved. Initially we had this service that you could use, go to a web browser and translate content. There was an API where you could write an application that would use our service. All that is still available. What we’ve also provided are apps. We have apps on iPhone and Android phones as well as on Windows devices. It also runs in the browser. It allows you to do speech or text translation on the device. We also support multilanguage engagements as well where you can connect the devices. I can start a conversation on my Android phone and connect to someone else’s iPhone. That person can connect on their PC and we can engage with each other, not just in one or two languages but in multiple languages. I’ve had conversations where there’ve been five or six people all talking from their devices, speaking five or six or even more different languages.
WADE WINGLER: Wait a minute, this is real time translation happening between multiple individuals, multiple languages?
WILL LEWIS: That’s correct. It’s real time. I can speak in English. It will be generating a transcript in English on my phone. If someone else is connected on their phone in German to my device, to basically this conversation, they will see the transcripts in German on their phone. When they speak in German, they will see the transcript in German, I will see them in English. Someone else could join in Japanese. They will see Japanese centric captions. The rest of us will see it in our respective languages, all live, real time.
WADE WINGLER: Both my nerdy Star Trek and Douglas Adams fandom are popping up as we have the Babel fish and the universal translator. Am I far off with that analogy?
WILL LEWIS: I wouldn’t go as far as a universal translator. It’s a nice analogy. I really like it. I go back to the Star Trek episode with Captain Kirk and the Gorn. It’s remarkably familiar. They have these handheld devices that they are talking into that do translation. That’s it. I’ve had people say to me, oh, I saw that on Star Trek 50 years ago. I’m like, wait a minute, Star Trek is supposed to be set 200 years in the future. We are actually 150 years ahead of schedule.
WADE WINGLER: That’s awesome. That’s funny. We kind of hinted around this, but I don’t want to assume we’ve covered it. Why do we need such a thing like Microsoft Translator? Why is this important?
WILL LEWIS: I think language is what makes us human. Language is the one characteristic of humanity that sets us apart from just about every animal on the planet. It’s also part of our individual identities, what language we speak, how we communicate. Language barriers are these perennial problems. It’s something that has existed since biblical times, more or less, and well before as well. How do we communicate with someone who doesn’t speak our language? It is always trying. You have someone who is visiting from China, they don’t speak English. How do you communicate with that person? Being able to break down those barriers to provide some form of access for individuals to be able to communicate with each other is really where this technology really comes to the fore. It makes this — it breaks down those barriers in ways that are really engaging and touching.
WADE WINGLER: As you describe that, there’s kind of an emotional component that goes with that. I know just enough Spanish to order a meal and just enough sign language to carry out some basic communication. But when I engage others in that language where my fluency is pretty low, I get a feeling where I’m not quite helping you the way I want to, or I’m not quite living up to my side of this engagement right now. That’s part of it, right?
WILL LEWIS: It’s also you want to be included. If you are in that situation, you are being left out. It’s not because you’ve done anything wrong. It’s just that you don’t happen to speak that language. Being able to engage with someone who speaks a language you don’t speak is really gratifying. It opens doors to worlds that you wouldn’t have access to otherwise. I’ve had the most remarkable conversations myself with people who speak Arabic or Chinese or Portuguese, where I would not have been able to engage with them at all. Maybe we would’ve been flapping our arms a little bit or whatever, but we wouldn’t have been able to engage with each other. To sit down and have a conversation about their families or about what they do for work or what it’s like where they live, that is just really engaging and amazing technology.
WADE WINGLER: Such a human experience.
WILL LEWIS: Absolutely.
WADE WINGLER: Microsoft Translator, I’m thinking, was not developed as a disability or assistive-related tool, right? It didn’t start out that way? Or it’s not primarily that?
WILL LEWIS: It did not, no. We kind of fell into that to be frank. We work on machine translation, which is translating between languages that people speak. We had not thought about the AT users at all. There is an interesting back story. It involves me personally. There is an individual I work with fairly closely — we’ve worked on a couple of projects together — by the name of Ted Hart. He is profoundly deaf. He went deaf when he was 14 years old. When we were first developing this technology, it was being integrated into Skype so that you could use Skype to make translated phone calls. You could call someone else in a different part of the world and have it translate for you, which is really cool. We were asking for people to help us “dog food” this. This is before we had even announced it. It was completely internal. He came to my office and said, I could use this. I’m like, that’s great. He insisted. He said, if you provide this and it has a transcript in there, I can use this. I can call my sister in Australia. I can call my wife at home and have phone calls with them. What they say will be transcribed. We hadn’t even thought about that. It’s like, oh, of course. We are transcribing the audio, so when someone speaks, we are providing a transcript. Then we translate that transcript. We hadn’t even thought about the fact that we could provide just that monolingual aspect of it, that just the transcripts are useful in and of themselves.
Part of this is the development of technology called TrueText that breaks the output into sentence-like chunks. So when someone is speaking, they just talk and talk. There are no sentence boundaries, no breaks. There are dysfluencies, restarts, all sorts of things that make it difficult to read. One of the things we did to assist machine translation is to break it into output that is more sentence-like and get rid of all the dysfluencies. It makes it much more readable. It ends up looking like captions. We hadn’t even thought of it. It’s not something we had considered and we just happened upon it. Ted saw it.
WADE WINGLER: Tell me the story about the pub we talked about in the pre-interview.
WILL LEWIS: There is a group of us that go to a pub here on campus probably every other Friday. Ted would never go with us because what’s he going to do? Sit around and watch us talk? It’s boring for him. When we developed the apps, now it is required. If you go to the pub, and Ted is coming along, you’ll have to bring your phone because you’ll have to talk on the phones. We are all sitting around talking on our phones, and Ted is able to engage with all of us. For him, it has completely changed his relationship with us. The second time we had done this, the second time we had gone to the pub doing this, he pulled me aside at the end. We were the last two people to leave. He looked at me — he’s this big football player guy. He kind of looked at me and got choked up and said, I have forgotten what conversations like that were like. I was just like, wow, I had no idea it was changing his life in that way. I get choked up just talking about it.
WADE WINGLER: Wow. That’s amazing. So we’re talking then about somebody who is deaf benefiting from this. But are there other groups of people with disabilities? I’m thinking learning disabilities, right?
WILL LEWIS: We are seeing use in education and various places where folks that are dyslexic will use this to help them, or people with ADHD where they sometimes will drift off and not be paying attention to a lecture or discussion, and they can go back and look at the transcript and see what they missed. It really does help a lot. Some people are more visually focused than aurally focused, so they will basically not be listening as well but can read a lot better. It reinforces that as well.
WADE WINGLER: That totally resonates with me. I’m a fan of the online learning experience. I do the massive open online courses and things like that. A lot of those are videos that are transcribed. When you say ADHD, there are times when I drift off and jump back and look at the transcript that is running while I am in a class. That totally resonates with me. It makes a lot of sense.
WILL LEWIS: What we’ve seen is, we are using this in classrooms in a number of schools across the country as well as in universities. One of the big universities we are working with that uses this technology quite a bit is the Rochester Institute of Technology. They have a deaf and hard of hearing population that numbers around 1300, or about 18 percent of their student population is deaf or hard of hearing. They have very good services for folks that are deaf or hard of hearing, with interpreters and captionists and all this. They provide very good services. But they have 600 classes per term. They want to be able to provide access, even when someone hasn’t set up in advance an interpreter or captionist, so you have this kind of running. What we found where this is running, there is one instance that happened this last semester where the student who had requested it had dropped out of the class. There were 25 remaining students in the class. They were all hearing. The professor said, oh, I’m going to turn this off because so-and-so is not here anymore. The hearing students about revolted. Don’t turn it off! We are using it! I think that gets to this universal aspect. We developed a technology — even though we happened upon it accidentally — developing this technology with people who are deaf or hard of hearing in mind, but at the same time it is benefiting people that are hearing and sometimes significantly.
WADE WINGLER: Absolutely. That whole concept of universal design. Curb cuts and ramps were made for wheelchair use primarily, but everybody uses those with strollers and skateboards and automatic doors. Everybody uses those.
WILL LEWIS: Exactly. It’s like closed captioning.
WADE WINGLER: Yep.
WILL LEWIS: Most people who use closed captioning are actually hearing people at sports bars and gyms and things like that. It was designed for people who are deaf or hard of hearing, but guess what, it benefits everybody.
WADE WINGLER: Let’s talk a little bit about the user experience. We’ve talked a little bit about this, but let’s say I’m a student at RIT. I’m deaf and I walk into the classroom, I realize I need some captions or transcript. I pull out my smart phone, laptop? How does that go?
WILL LEWIS: In the classroom scenario, the professor needs to be using the service. It has to be capturing the audio in some way. You can’t hold up your phone and expect it to get a good quality signal. It has to be something where the professor is talking into a microphone. They would have to preset that up. But the advantage of that, once it is running — and it can be run anywhere. You just start the session in PowerPoint or any browser, and it is capturing the audio. Then anyone can join the session from any device. But impromptu conversation can happen too. We are seeing this at Rochester. There is a study being done at Gallaudet this summer where they have five hearing students and five deaf students. Most of the time they have interpreters, but when they don’t, they fall back to the app and are able to start a conversation on the app. Someone who is hearing starts a session on the app, someone else joins or any number of individuals join the conversation. They see their transcript on their local device. In the classroom as a professor is presenting, the transcripts are being generated and someone can walk in and say oh, I want to see that transcript on my device. They join that conversation.
Any number of individuals can join. I think we have some arbitrary limit of 100 individuals in a conversation, but that’s arbitrary. We’ve had instances where we’ve had 500 people join. I think in one instance we had 500 people — not quite 500. It was 200 and something people joined a conversation. Some of them were deaf and hard of hearing. Many spoke other languages. I think we had 34 languages that were being transcribed for that session.
WADE WINGLER: That’s remarkable. My question was about the user experience, and that sounds amazing. My next question is what is going on behind the scenes? I know this is where you spend a lot of your time. What is going on in the cloud or behind the scenes? There is lots of magic happening.
WILL LEWIS: That’s a good question. What happens is our service will take the audio that is being generated. So if someone is speaking into their phone or microphone or some other device that is capturing the audio, that audio gets streamed to our cloud service, and it then takes that, runs it against what we call models. These are acoustic models that generate transcripts. These acoustic models are trained on thousands of hours of content, content of people speaking where we had transcripts, and people speaking all sorts of different dialects of English, different accents. It’s this very large model that sits in the cloud that transcribes this content, generates a transcript, sends it back to the local device.
Now if individuals are joined on other devices to that same conversation, and they are joined in a different language, the service then knows, okay, I have a device that is joined in German. I need to translate these transcripts and send back the transcript in German. It’s running them through another set of models called translation models. We use neural machine translation, kind of the vogue machine translation right now. That will then map the text output, the transcript output generated by the acoustic models to text that is in a different language. Now, the machine translation service is a service that is trained on bilingual texts. Here we have millions of sentences of bilingual content per language. That then generates fairly robust output in that target language. Those then get sent back to the local devices.
On the local devices, if someone is hearing, they can also listen in their language. You can turn on a switch that says produce audio, so it will take the transcripts and generate audio from those transcripts as well.
WADE WINGLER: The first level of translation happens from the spoken word into English. Is that the baseline? And then from English into other languages? I guess not. If they are speaking in German, it has to go both ways, right?
WILL LEWIS: Whatever language they happen to be speaking. We currently support 10 languages for full speech. We have over 60 for translation. The way that works is if I am speaking in any of these 10 languages, it will transcribe those 10 languages, which includes all of the stuff I talked about earlier where it does the dysfluency processing, puts sentence boundaries, and all that happens in each of the languages. That acoustic signal generates the text, and that text then gets translated — if that’s desired. It could be that we are just having monolingual conversations. In that case, it is sending the transcript back to the individual devices.
WADE WINGLER: What does accuracy look like?
WILL LEWIS: That’s a good question. It varies by language and speaker, by domain, all of this. For me, standard conversation, if I am using our baseline models and having a conversation with someone, the word error rate for me is probably around 12 percent. What does that mean? This is a typical way of measuring the quality of speech recognition. Word error rate, the lower the number, the better the quality. For me — and I have a pretty good speaking voice — it’s around 10 to 12 percent, somewhere in that range. We can drop that if we adapt to the domain. One of the things we do in the university setting is we can adapt the models to the technical vocabulary of a class. So like if it is a biology class or chemistry class or history class, we can adapt to whatever technical terms are being used in that class. That reduces the error rate even further, maybe by a point or two. That point or two is actually crucial because that is the technical vocabulary that matters in that particular setting.
To give you a comparison, the best comparison would be CART. This is a service that is provided in this country for doing transcription of audio. It’s a human doing the transcription. The very best CART captionists have a word error rate of around 4 to 6 percent. That’s the very best. That’s someone who knows the domain, who has been doing this for years, is very strong at doing the captioning. Most people are actually higher than that number. Definitely we are not as good as humans. Humans are better at this than machines are. But we are remarkably close. Plus you have the fact that you can do this automated translation on the fly as well.
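The word error rate figures quoted above follow the usual definition: the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation in Python. It is not Microsoft's evaluation code; the function name `word_error_rate` and the sample sentences are made up for illustration, and real scoring tools also normalize punctuation and spelling, which this sketch does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i reference words vs. empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from output (deletion)
                curr[j - 1] + 1,     # extra word in output (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word and one dropped word
# against a ten-word reference.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown box jumps over the lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

By this measure, a rate of about 12 percent means roughly one word in eight is wrong, missing, or extra relative to the reference transcript.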
WADE WINGLER: Does it get better over time based on the individual or the big collective world of users?
WILL LEWIS: It is getting better. From usage, it learns. It is using the neural networks. We provide data to these neural networks. They learn over time. More usage improves the quality of the transcriptions and translations. One thing I do want to point out — and this is happening. It kind of startled us a little bit. It was something we weren’t expecting. We were working on it but hadn’t expected it to happen automatically. That’s deaf voice recognition. Ted, for instance, is profoundly deaf. He has a pretty strong deaf voice. A year ago, if he tried to use our apps and talk to the app, it didn’t work very well. It came out with really bad transcripts. Now over time, just because we are learning from people who are deaf who are using it, it does remarkably well on his voice. I haven’t measured it, but it looks like it’s probably around 20 percent word error rate, which is remarkable. It’s usable at that point. He really loves it because now it recognizes his voice, he can talk to someone that speaks French. He can speak in English, they can speak in French, and they can have a conversation, which he couldn’t have done a year ago.
WADE WINGLER: It’s amazing. We are getting close on time for the interview here, so a couple of practical questions. Cost, platform, availability? If folks want to try this, what do they do?
WILL LEWIS: It’s currently free if you download the apps. It’s the Microsoft Translator app for Android or iPhone. You go to the respective stores for your device and download it. It runs on iPads, so you can download the app for the iPad. There is also a version for Windows devices, so you can download from the Windows Store for that. It also runs in a browser. You can run this within a browser. Just open your browser and go to translate.it, and this will open it up.
WADE WINGLER: Really quickly before we close out, what’s in your crystal ball when it comes to Microsoft Translator? What do you want it to be a few years down the road?
WILL LEWIS: I’d like it to break down as many barriers as possible. I look at the story of Ted in particular. Ted uses this everywhere. He takes the app with him wherever he goes. Sure, if people can sign, they can sign with him. But if he goes to a bookstore and is at a counter and wants to talk to someone, they can understand him, but he can’t understand them. He has them download the app oftentimes on the fly and has conversations with people that he would never have had conversations with before. I’d like to see that break down barriers everywhere, language and hearing barriers wherever we can.
WADE WINGLER: Give us the website one more time for people who want to try it or find out more.
WILL LEWIS: The best website to go to would be translate.it. “Translate it.”
WADE WINGLER: Will Lewis is the Principal Technical Program Manager for Microsoft Translator and has been a great interview today. Thank you so much for coming on our show.
WILL LEWIS: Thank you so much. It was great being here.
WADE WINGLER: Do you have a question about assistive technology? Do you have a suggestion for someone we should interview on Assistive Technology Update? Call our listener line at 317-721-7124, shoot us a note on Twitter @INDATAProject, or check us out on Facebook. Looking for a transcript or show notes from today’s show? Head on over to www.EasterSealstech.com. Assistive Technology Update is a proud member of the Accessibility Channel. Find other shows like this, plus much more, at AccessibilityChannel.com. The opinions expressed by our guests are their own and may or may not reflect those of the INDATA Project, Easter Seals Crossroads, or any of our supporting partners. That was your Assistive Technology Update. I’m Wade Wingler with the INDATA Project at Easter Seals Crossroads in Indiana.
***Transcript provided by TJ Cortopassi. For requests and inquiries, contact firstname.lastname@example.org***