ATU365 – Microsoft’s Seeing AI in depth


ATU logo

Your weekly dose of information that keeps you up to date on the latest developments in the field of technology designed to assist people with disabilities and special needs.

365-05-25-18 – Microsoft’s Seeing AI – In depth | Anirudh Koul & Saqib Shaikh |

If you have an AT question, leave us a voice mail at: 317-721-7124 or email
Check out our web site:
Follow us on Twitter: @INDATAproject
Like us on Facebook:

——-transcript follows ——

SAQIB SHAIKH:  Hi, I’m Saqib Shaikh, Senior software engineer at Microsoft.

ANIRUDH KOUL:  This is Anirudh Koul, and I am the senior data scientist for Microsoft, and this is your Assistive Technology Update.

WADE WINGLER:  Hi, this is Wade Wingler with the INDATA Project at Easter Seals Crossroads in Indiana with your Assistive Technology Update, a weekly dose of information that keeps you up-to-date on the latest developments in the field of technology designed to assist people with disabilities and special needs. Welcome to episode number 365 of Assistive Technology Update.  It’s scheduled to be released on May 25, 2018.

Today we have a fun show and an exciting interview.  I’m spending some time talking with Anirudh Koul and Saqib Shaikh, who are two of the developers, really the inventors, of Seeing AI, which is an app that has sort of taken the assistive technology world by storm.  It’s a free app for your Apple device that can read text, handwriting, objects, faces, all kinds of amazing things.  It’s designed for folks who are blind or visually impaired but really applies to a lot of other situations.  We have an extended interview today with those developers.  They are delightful gentlemen, and we kind of get into the behind-the-scenes: how was it developed, what’s the origin story of the app, some of the myths about what it is and isn’t, and some of the tips and tricks to make sure that it works well for you. We are super excited to have these guys on the show today.

We hope you check out our website, send us a note on Twitter @INDATAProject, or call our listener line.  We love to hear your feedback, questions, and comments.  You might even end up on this or one of our other shows.  That number is 317-721-7124.



[1:55] Interview



WADE WINGLER: So back in episode number 352 of Assistive Technology Update, which was released in February 2018, our vision team lead here, Belva Smith, spent some time with us talking about an app that everybody here was super excited about called Seeing AI from Microsoft.  In the interview, she talked with us about its features, the way it works, and those kinds of things.  Not long after that, I was doing a tweet chat with an assistive technology university class I teach, and I had a bunch of students who were also tweeting and talking about Seeing AI and how useful it is for individuals who are blind or visually impaired.  You might imagine my surprise when one of the founders and developers of the Seeing AI app, Anirudh Koul, was tweeting right there with us, saying, hey guys, glad you like our app, and those kinds of things.  I was so excited when we were able to reach out to Microsoft, and Anirudh Koul and Saqib Shaikh decided that they would come on the show and talk with us a little bit about Seeing AI, to expand on what we talked about with Belva and also give us some additional insight into what’s going on with this exciting project from Microsoft.  Enough of my rambling.  Anirudh, Saqib, thank you so much and welcome to the show.

SAQIB SHAIKH:  Thanks for having us.  It’s a pleasure to be here.

WADE WINGLER:  We appreciate you guys taking some time out of your busy days.  I kind of want to start a little bit with what Seeing AI is and what it does.  Not everybody has heard the previous episode, and not everybody has had a chance to try it yet.  Can one of you tell me just a little bit about what it is?

SAQIB SHAIKH:  It’s a talking camera app for people who are blind or have low vision.  You just hold up your camera, and it tells you what’s in front of you.  It has different channels so that you can select what you want to hear about.  So you might want to read the writing around you, recognize people, or find out what products are around you.

WADE WINGLER:  So it does a whole lot of things, and that’s pretty excellent.  Talk to me about who it is for.  What kinds of individuals? Who is it for?

ANIRUDH KOUL:  We designed the product primarily for the blind and low vision community.  That is the primary audience we designed it for.  Interestingly, after release, we found it being used by a much larger group of individuals with disabilities.  People with learning disabilities are using it.  People in Asia are using it to learn English by taking a picture of something and hearing what it is in English.  Many people are using it for reading.  In fact, the app is a hit in the system administrator community, because in those server rooms there are hard-to-reach places behind servers.  They just put the phone behind the server instead of their head and hear the serial number.  It’s amazing.

WADE WINGLER:  I do some of that work in our IT program, and I hadn’t thought about using it for that.  I definitely will be doing that.  That’s excellent.  A quick practical question before we talk about the history and genesis of this: what does it cost, where can you get it, and what platform does Seeing AI work on?

SAQIB SHAIKH:  It’s absolutely free and available on the iPhone App Store.

ANIRUDH KOUL:  In fact, we just launched it today in 21 more countries, bringing total availability to almost 56 countries around the world.  Globally, there is a good chance it is available in the country you are in.

WADE WINGLER:  We almost always record the show before it goes out, and we are actually recording on Global Accessibility Awareness Day.  This launch is part of your initiative in that area, right?


WADE WINGLER:  I want to hear the story about how this came to be.  I happen to know that it is an interesting story, but what is the history of seeing AI?

ANIRUDH KOUL:  It’s a really exciting one.  Every year, we have something called a hackathon.  It’s about a week of time that the CEO gives to all Microsoft employees around the world to basically go and chase our dreams for an entire week.  Many of us had an inclination to make something and see the limits of how AI could be used for accessibility.  So we formed a team of 16 people around the world in engineering, design, accessibility, and research and started building some prototypes.  One of our first prototypes was a cell phone that we had duct-taped to the head that we could talk to.  It would give you answers based on what the camera was looking at.  Since then, we kept working in our passion time, evenings, weekends, to see how we could bring this into real life.  Along the way, we were joined by many more supportive people in many ways around the company.  One fine day, we got our funding to basically go from a hackathon team to a real team to bring this product out.  We experimented with many different ways of getting this technology out.  We even had a prototype pair of smart glasses that you could use to take a photograph and hear it.  But then the real question came, which is how do we get this technology into the hands of as many people as possible? The answer was a free app.  That should do it, so that’s what we did.  Today you know that as Seeing AI, the talking camera app.

WADE WINGLER:  That’s remarkable.  I have a million questions.  I’m really fascinated with the idea of what’s happening behind the scenes with the technology.  It’s called Seeing AI because AI represents artificial intelligence, right?


WADE WINGLER:  So what’s going on? How does the AI work? How does it learn? How do you train it? Talk to me about that stuff.  I don’t even have a good question for that.  I just know that I want to know how it works.

ANIRUDH KOUL:  AI is this amazing thing that has picked up a lot of steam in the last few years.  The key idea is that if you show it enough data, you can design it to learn patterns and do things at a level which was not thought to be possible just five or six years ago.  As you start looking at many things in the app, they use computer vision, but they also start to predict things.  A quick example is that when you take a photograph, it tries to tell you your age, gender, and emotion.  Another example is that when you take a photograph, it tries to describe the general scene to you.

SAQIB SHAIKH:  This is possible because we are showing the AI hundreds of thousands of images and teaching it that this is this object or this person or this emotion.  You are really teaching the AI based on the photos, and that’s how it learns to predict those things in the future.  Right, Anirudh?

ANIRUDH KOUL:  Exactly.  To give you a fun example, you know those emotions that the app tells you? The app has seen over 10,000 Hollywood scenes where an actor like Leonardo DiCaprio uses that emotion; it has learned by watching these Hollywood scenes.  Similarly, for the scene understanding mode, it has seen over 300,000 images that we collected and used to teach the system.  Now it is able to construct sentences, going beyond just naming the object to how you would say it as a free-flowing sentence.  With the great folks at Microsoft Research, which we are a part of, we have been working together on how we can keep improving the AI year over year.  In many ways, this AI is like a three-year-old.  It knows much about the world but is still learning, but maybe over a couple of years it goes from a three-year-old to a five-year-old and starts to become even better than what you see today.

WADE WINGLER:  I’m glad you mentioned the metaphor of a child, because as I hear you describing how AI works, that’s kind of how people learn stuff.  It’s constant exposure and reinforcement sparked with curiosity and intellect and mental horsepower.  You guys are providing the intellect and mental horsepower.  Where does the curiosity come from?

ANIRUDH KOUL:  As long as the users tell us what the real need to solve is, we try to design the AI and, over time, teach it enough to start making some sense.

SAQIB SHAIKH:  The users provide the curiosity, and we make sure that we teach the things the users are curious about.


WADE WINGLER:  That’s great.  We’ve talked about users a lot.  Frankly, on our show here, people listen and come on to the show to talk about what they’re doing, usually not because of the technology but more because of the impact of the technology.  Generally, people who are interested in assistive technology and accessibility kind of have a personal reason behind that.  They are interested in how we are impacting people’s lives and how we are increasing independence.  I bet you guys have a story or two about users and how they might be using this technology?

SAQIB SHAIKH:  We have so many, from the deeply touching to the downright hard to believe.  Maybe starting with the touching: last Christmas, we launched the handwriting feature and got a flood of emails from people who were excited to read greeting cards, Christmas cards, for the first time in a long time.  Or were able to read their children’s homework, or to read letters that they received years ago before they lost their sight.

On the other side, among the exciting or creative uses, we had a teacher who mounts the phone in his classroom so he can tell which students are walking into the classroom and, of course, which students haven’t yet turned up.

ANIRUDH KOUL:  So you can’t sneak in anymore.

SAQIB SHAIKH:  Right.  Or one funny story: we have this short text channel that, when you hold the phone up, reads whatever text it can see.  One very creative user put his phone in front of the TV on a tripod and watched a foreign language movie in French.  As the subtitles in English came up, Seeing AI read them straight away.  That’s how he enjoyed the movie.

WADE WINGLER:  He had it narrating for him.  That’s great.

ANIRUDH KOUL:  Interestingly, on hearing his story, other users started following along and replicating this.  Just to add a few more that motivate all of us: for a long time we didn’t have a currency channel to recognize currency, and that was a user need.  In the time before we built it, here is what users did.  We have a face recognition mode in the app where you take three photographs and you can teach it the name of someone.  What they did is they took a US dollar note.  There is a president’s face in the middle, and they taught the app what that president looks like and said Abraham Lincoln is five dollars. That was a creative way of solving a problem.

Some other really interesting stories that we found: when we launched, we heard there was a salesman who changes his sales pitch based on the customer in front of him, because he is hearing the audio of the app through a discreet headset and basically taking periodic photographs, both to make small talk as well as to see if the customer is still interested from their facial expressions.

On the hard-to-believe side that Saqib was talking about, a couple of stories include a professor in Puerto Rico who was there during the hurricane.  Because of the high winds, tons of things had moved.  He used the app to avoid fallen trees and power lines.  That was an incredible story to hear.

WADE WINGLER:  That’s really touching.  I had no idea that it was being used for so many different things.  That all helps the learning process.  You guys mentioned that you are a user-centered team doing user-centered development.  I assume that you take feedback from your users, suggestions, or maybe frustrations?

SAQIB SHAIKH:  Absolutely.  We have a feedback option inside the app, through which every day we get a flood of emails.  People email us directly as well.  It’s always great to hear from a user about ways that the app could be improved, and also these fun stories.

WADE WINGLER:  I’ll pop that email address in the show notes that people can reach out to you.

ANIRUDH KOUL:  When we launched the app, we had a silent launch.  We didn’t know how many people would be interested in it or find it useful.  Now we basically have a daily duty to go through between 80 and 100 emails every day.  Every piece of feedback that we get, we try to answer.  If someone speaks of a feature, we try to put it in a priority queue, and that helps guide our development: the most talked-about feature that people are requesting would be the first thing that we build.

WADE WINGLER:  I love it when user feedback feeds the design process.  Tell me a little bit more about the design process.  How does that work?

ANIRUDH KOUL:  When we were designing the app, being engineers and researchers, we tended to put the technology first.  We would find the technology and say, wait, this would be helpful.  But what we learned is that you need to put the user before the technology.  You don’t have to create a cool demo.  You have to think of the user, try to reduce the friction in using the system for them, and try to get them to achieve a task in as few seconds as possible.  That was our main aim.

SAQIB SHAIKH:  Just to add to what you were saying, it’s been this interesting blending of finding new AI and new experiences that can help the users with the challenges that they tell us they have.  Maybe there is some new type of thing that someone wants to recognize, or they are having difficulty with a certain aspect of the app.  Then we can look to our research lab to see how AI can help solve this problem.  But then the other side of this is how we can take that technology and create this interesting user experience that we have within the app.  So we have people who do user studies and user testing with us to fine-tune that.

ANIRUDH KOUL:  One of the simplest examples of AI that we can give is the barcode reading feature of the app.  Users asked us to recognize products, but recognizing products on a cell phone is often a frustrating experience, because barcode readers need the barcode to be right in front of the camera, and you may not know where the barcode is to begin with.  So on day one, we thought this was a simple problem.  You put in a barcode reading library and you are done with it.  Then, while users were testing, we got a slap in the face.  To fix that, we started to train the AI by showing it thousands of barcodes at different angles, different lighting, different orientations.  Then, to give guidance to the user, we built a guidance system that gives you beeps.  Now, by combining these two parts, AI plus sound, to make the user interaction easier, we are finding that the frustration of a cell phone-based barcode reading experience is much, much lower.  Many people who used to have hardware barcode scanners find that this is a much easier and obviously free alternative to buying dedicated devices, which previously cost a lot of money, thousands of dollars.  That is one example of where we create a prototype, ship it to users, hear the things that they had to say about how that experience is, then work through the week and come back to new users the next week.  Basically, by going to two users every week, prototyping and fixing, two users a week, prototyping and fixing, over a few months we started to get to where maybe we have something that is useful.

WADE WINGLER:  That’s a fascinating process.  We’ve been talking a lot about taking pictures of barcodes and faces and people and those kinds of things.  I am our agency’s security officer here when it comes to HIPAA. All of a sudden, I’m thinking that’s a lot of pictures.  Talk to me a little bit about privacy.  You are using this app in your world to take pictures of things in your home, and I know you need to take privacy seriously.  How do you handle that?

ANIRUDH KOUL:  You just said it.  Microsoft takes privacy seriously.  For privacy, we try to get as much of the processing done on the device as possible, so those images are never sent to our servers.  Our servers are primarily for providing the functionality.  We do not store any images.  We do not store the results.  All we do is keep these server-based APIs primarily to get the users their answer.  Imagine that you scan a bank document.  It’s secure and safe, because it is converted from an image to the results and then deleted.  We don’t have it on our servers anymore after you have gotten the results.  Your data is safe with you, not with us.

I should add one point: for training this AI, we collect our own data.  We go to great lengths to collect a diverse set of data to train it, so that we don’t have to use user data for this.

WADE WINGLER:  That makes a whole lot of sense.  I know that there are people in our audience who are excited about this interview, and it’s probably because they are using the app all the time for important things in their life.  I know that you have some tips and tricks and ways that people can be most successful using the app.  Give me some of those.

SAQIB SHAIKH:  As we talk to many different users, we’ve been building up these ideas as to what tricks people are using.  For example, sometimes it’s just a case of remembering that your phone’s camera is, in fact, in the top right corner of the phone.  So whenever you are pointing it at something, you want to keep that in the center.  Then you should be careful about how far away you keep your phone from an object.  When recognizing currency, we tend to find six inches is pretty good.  Or when you are recognizing a barcode, holding the phone a few inches away and then moving the object around is a nice technique.  Sometimes you can also move it a bit further away, because that’s going to enable the camera to see even more.

One last thing, which was a tip that a user came up with that I had never thought of.  It was the first week of launch.  When you are recognizing a white document, putting it on a white table means that it might not find the edges so well.  There was a lady who carried around a black scarf in her purse.  Whenever she wanted to read a document, she would lay out the black scarf, put the document on it, and then scan the document.  I’m not saying everyone should do that, but again, a creative idea.

ANIRUDH KOUL:  To add to that, one of the key things that makes people more successful using the app is enough lighting.  Using the app in an environment where there is enough light is generally a recipe for success.  The app obviously has a built-in light if it is very dark.  It has guidance inside the app to guide you in many of these modes.  That guidance can also help in positioning the phone to help you be more successful.  Especially when people try things for the first time, like the barcode reader, we tell them to try it on four or five different objects to get the hang of the guidance mechanism and to get used to it over time.  People start to get faster and faster.

The last trick that is most successful involves our short text mode.  Show it a piece of text, and it will start reading it really fast.  Often when people point at things, in the first instance they don’t know if the entire piece of text is visible, but by moving the phone around a little bit, you will probably get the entire piece of text in view.  Often when people show it a piece of text, they keep pointing the phone towards it, and the phone wavers a little bit.  Because of that, it starts to interrupt itself.  The tip is, when you start hearing some text that you like, maybe point the phone down at the ground quickly so that it doesn’t see any more text, and the phone will not interrupt.  It will keep speaking the text.  This is especially useful if you have a long document.  Point at the piece of text, and then point the phone towards the ground quickly, or put your finger on the camera, and it will not interrupt.

WADE WINGLER:  Those are some great pieces of advice.  We are about out of time for the interview today.  Before we go, one more time, where can people get Seeing AI? And if they have feedback or want to learn more or reach out to you, where should they go to do those things?

SAQIB SHAIKH:  You can search for Seeing AI in the Apple App Store or go to our website.  To get in touch with us, you can email us.

WADE WINGLER:  Anirudh Koul and Saqib Shaikh are on the Seeing AI team at Microsoft and have been delightful and insightful guests today.  Gentlemen, thank you so much for joining us.

SAQIB SHAIKH:  Thank you.

ANIRUDH KOUL:  It was a pleasure, thank you.


WADE WINGLER:  Do you have a question about assistive technology? Do you have a suggestion for someone we should interview on Assistive Technology Update? Call our listener line at 317-721-7124, shoot us a note on Twitter @INDATAProject, or check us out on Facebook. Looking for a transcript or show notes from today’s show? Head on over to our website. Assistive Technology Update is a proud member of the Accessibility Channel. Find other shows like this, plus much more, on the Accessibility Channel. The opinions expressed by our guests are their own and may or may not reflect those of the INDATA Project, Easter Seals Crossroads, or any of our supporting partners.  That was your Assistive Technology Update. I’m Wade Wingler with the INDATA Project at Easter Seals Crossroads in Indiana.

***Transcript provided by TJ Cortopassi.  For requests and inquiries, contact***