ATU242 – Blind Tool Object Recognition, Artificial Intelligence to Determine What’s Funny, Exercise Buddy App



Your weekly dose of information that keeps you up to date on the latest developments in the field of technology designed to assist people with disabilities and special needs.
Blind Tool – Joseph Paul Cohen | @josephpaulcohen
AI Algorithm Identifies Humorous Pictures | MIT Technology Review
App: Exercise Buddy

Listen 24/7 at
If you have an AT question, leave us a voice mail at: 317-721-7124 or email
Check out our web site:
Follow us on Twitter: @INDATAproject
Like us on Facebook:

——-transcript follows ——

JOSEPH COHEN: Hi, this is Joseph Paul Cohen. I’m the creator of Blind Tool, and this is your Assistive Technology Update.

WADE WINGLER: Hi, this is Wade Wingler with the INDATA Project at Easter Seals Crossroads in Indiana with your Assistive Technology Update, a weekly dose of information that keeps you up-to-date on the latest developments in the field of technology designed to assist people with disabilities and special needs.

Welcome to episode number 242 of Assistive Technology Update. It’s scheduled to be released on January 15 of 2016. Today I have a fascinating and sort of extended conversation with Joseph Paul Cohen, who is a researcher and has developed a tool called Blind Tool, an Android app that does interesting object recognition for folks who are blind or visually impaired.

Also, we have a funny story from MIT Technology Review about how researchers are developing artificial intelligence algorithms to identify humor and tell us what’s funny; I’ve got an app from BridgingApps; and we hope that you will check out our website at, shoot us a note on Twitter, or give us a call on our listener line. We love to have your feedback and questions. That number is 317-721-7124.

Like this show? Have assistive technology questions? Check out our new show, Assistive Technology Frequently Asked Questions. Find more over at

My wife might not always agree, but I think I’m a pretty funny guy. In fact, it may be easier for her to argue with me in the very near future because there is a group of researchers over at Virginia Tech, led by Arjun Chandrasekaran, who are trying to develop algorithms and artificial intelligence that can tell us what’s funny. Basically, the way they did this is they took a bunch of cartoon figures and asked workers from Amazon Mechanical Turk to position them in funny and unfunny positions, developing storylines that were either funny or unfunny. Then they had other Mechanical Turk workers review them and say this was funny or that wasn’t funny. They are building algorithmic models based on the data they’ve collected. I guess the figures they used were clipart things like animals and indoor and outdoor objects like doors and windows and those kinds of things. They basically had people figure out what was funny, popped it into their database, and started working the data. Apparently this thing works pretty well. It’s a limited data set, obviously, but they are starting to figure out what constitutes funny and what doesn’t. Obviously that has all kinds of interesting implications for the future. I would assume that entertainment companies could run some of these algorithms across their material to predict what’s going to play well with audiences, and maybe we will even have computers telling jokes in the future. I’m going to link over to this article called AI Algorithm Identifies Humorous Pictures. It’s in the MIT Technology Review. You will find that link in our show notes.

Each week, one of our partners tells us what’s happening in the ever-changing world of apps, so here’s an app worth mentioning.

AMY BARRY: This is Amy Barry with BridgingApps, and this is an app worth mentioning. Today’s app is called Exercise Buddy, Visual Exercise System. Exercise Buddy is an app designed to help individuals with autism participate in an exercise program and learn about their body. Although the app is designed with autistic children in mind, a child with any disability can greatly benefit from this program. It really is an outstanding resource for inspiring young people to exercise. Exercise Buddy has videos with children and young adults with autism performing many of the exercises. Exercise Buddy addresses five components of physical fitness: body image, posture, motor coordination, muscular fitness, and cardiovascular fitness. The app presents picture cards and videos of over 130 exercises. Coach Dave, an autism fitness specialist and the designer of the program, provides suggested workouts, or an individual workout can be set for the user. The workouts are set up with a first-then structure, or start-to-finish model. The workouts can be saved so that the user can improve with repetition. In addition to exercises and workouts, Exercise Buddy provides the instructor with teaching tools which include flashcards, tests, additional activities, coloring activities, lesson plans, and much more. There is also a body systems tool that allows the child to learn to identify the parts of the body, including the muscular system, skeletal system, digestive system, and the brain. Over 50 lesson plans and worksheets are available within this app. We used Exercise Buddy with elementary-aged children with autism and developmental delays. The children enjoyed watching the videos of the other children doing the exercises and then matching the movements themselves. They were able to see how long to continue the exercises or stretches with the clear visual supports that were provided within each activity. A voice encouraged them to continue and let them know when to start and stop. Not only did this motivate the users, it also allowed the instructor to have a less hands-on approach with the particular child. The children that used this app were really excited to be able to do the exercises and work through an exercise program like they had seen done by other children. This app is highly recommended for either parents or instructors of children with disabilities. Exercise Buddy costs $29.29 at the iTunes Store and works on iPads. For more information on this app and others like it, visit

WADE WINGLER: Before we jump into the interview, I have a quick editorial note. When I recorded this interview, we were having some bandwidth troubles or something, so every once in a while you will hear that Internet sound where the audio quality isn’t perfect. I apologize in advance. Please bear with me. The content is great. Bear with those technical difficulties as we listen to Joseph Paul Cohen talk about what he’s doing with some neural net logic in creating an app called Blind Tool.

So there are apps for almost everything these days. You guys have heard the cliché: there’s an app for that. For people who are blind or visually impaired, there have been a number of impressive apps that have emerged in the last two years and really have been groundbreaking and life-changing. Recently I have seen a thing coming across all my feeds and news feeds about a thing called Blind Tool and a gentleman named Joseph Paul Cohen, who is a PhD candidate creating something that I think is pretty darn cool. I wanted to ask him to come on our show today and talk with us a little bit. First of all, Joseph, are you there?


WADE WINGLER: Excellent. You have been very busy these days. I’ve seen you in Fast Company. I’ve seen you on some pretty mainstream news platforms recently talking about this product you are working on. Before we talk about Blind Tool, talk to me a little bit about you and your background, and then we are going to jump into this cool thing you’re doing called Blind Tool.

JOSEPH COHEN: I’m a PhD candidate in computer science. I spend a lot of time doing object recognition tasks in imagery, as well as studying how citation networks work with computer science venues, and also reinforcement learning and all sorts of stuff in the machine learning area of computer science.

WADE WINGLER: How did you become interested in object recognition for people who are blind or visually impaired? Was there a personal connection? What got you interested in this particular topic?

JOSEPH COHEN: Years ago I was working with a blind programmer. We became friends and hung out and talked about a bunch of stuff. I became aware of all the different systems. Around that time I even tried – I turned the screen off on my laptop and turned on the assistive technology and tried to use it without seeing the screen. It was really hard for me. I learned about other things, like how difficult math is to read and write for someone who is blind. I talked to my friend. I use a typesetting language called LaTeX that allows you to read and write math, but you don’t actually have to draw it in its script form. You write it out as code and then you can read and write the code. It makes math completely accessible to someone who is blind because the screen reader can read the math that appears in some research papers. It opened my eyes to technology that can enable blind or visually impaired people to interact with all the stuff that we are doing. Back then, I was like, I wish there was a stick that somehow would tell you all the things that were in front of you, maybe some sticks that would act like sonar. But I envisioned it back then as something that would tell you everything. It would be just like a person that would tell you all the things in front of you and give you the vision, like a supercharged seeing stick. This is a step towards that.
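
[Editor’s illustration: a typesetting language that works the way Joseph describes is LaTeX, where math is written as plain-text code. The formula below is an arbitrary example chosen for illustration, not one mentioned in the interview. A screen reader can read the source character by character, with no need to see the rendered symbols.]

```latex
% The quadratic formula, written as plain-text LaTeX source.
\[
  x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
\]
```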

WADE WINGLER: In the research I’ve done getting ready for the interview, I found what it does fascinating. Obviously object recognition for people who are blind or visually impaired has been a challenge from the beginning. There have been a lot of tools that try to do that and do that well. I would say there have been various levels of success over my career. Tell me a little bit about Blind Tool and specifically what it’s doing and maybe how it’s different from some of those other approaches that have been tried in the past.

JOSEPH COHEN: At the core of it is a technology called a convolutional neural network, which was invented like 20 years ago to read the ZIP Code off of letters for the United States Postal Service. They needed this technology to recognize handwritten characters, the digits of the ZIP Code that we write on envelopes. The concepts behind that technology are pretty old, but being able to make it faster, to learn from high-resolution imagery, and to run on its own, all these things are new. At the core of it, it’s operating on images just like Photoshop filters operate on images. When you apply a Photoshop filter, it is going to transform the image into something else. What this does – it has nothing to do with Photoshop – it will continue to process the image by transforming it into different types of images. It keeps doing that until it gets to a prediction. It takes an image and keeps running these transformations on it all the way until it gets to the end, where it’s actually saying what that object is. It’s pretty cool. This is all designed to mimic the eye. People got the design from looking at how the eye operates and how the information coming from our eyes is processed all the way into our brain, how we perform the object recognition task, and then researchers for years have been trying to figure out how you encode a program to do that same thing.
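
[Editor’s illustration: the idea of running filter-like transformations on an image until a prediction comes out can be sketched in a few lines of Python. This is a minimal sketch, not Blind Tool’s code. The tiny image, the hand-written edge filter, and the threshold are all assumptions made for illustration; a real convolutional neural network learns thousands of filter values from data and stacks many more layers.]

```python
# Illustrative sketch only; this is not Blind Tool's code.

def convolve2d(image, kernel):
    """Slide a small kernel over a 2D image and return the filtered image."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    result = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the pixels currently under the kernel.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(float(total))
        result.append(row)
    return result


def relu(image):
    """Keep positive filter responses and zero out negative ones."""
    return [[max(0.0, value) for value in row] for row in image]


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge filter. A real network learns its filter values.
VERTICAL_EDGE = [[-1, 0, 1] for _ in range(3)]

feature_map = relu(convolve2d(IMAGE, VERTICAL_EDGE))    # transformation 1
edge_score = max(max(row) for row in feature_map)       # transformation 2
prediction = "edge" if edge_score >= 2 else "no edge"   # final prediction

print(feature_map)
print(prediction)
```

[Each step takes the previous step’s output as its input, which is the "keep transforming until you reach a prediction" pattern described above.]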

WADE WINGLER: As I think about this, help me clarify. We’re talking about an app that doesn’t necessarily rely on an object that it has recognized before. It’s not one of those things where we take a picture or several pictures of an item and then use the app to go back and recognize that same item. It’s actually taking pictures and then narrowing them down into categories and further narrowing them down into particular kinds of objects. Do I have that the right way?

JOSEPH COHEN: It’s not as clean as you are saying. There’s a lot of entanglement inside the network as far as how the concepts are represented. It’s trained on just seeing images. It runs these images through the network of transformations, gets to the end, and looks at how good a prediction it was. If it’s not a correct prediction, then it updates the network so next time it will perform a little bit better. It does this in a manner that takes all these concepts and just merges them inside the network, to the point where it’s hard for us to understand where the concepts are inside all these transformations. We are learning the stuff automatically, which is the goal of machine learning. Inside, you don’t really know where the concept of a chair is versus the concept of a cat. It’s really hard to tease out where that actual knowledge is inside the network. We can still look at the results there. It deals with concepts. On the image, it can notice features that represent parts of a chair. These features will only be visible on a chair. If it sees something that looks like the legs of a chair and the back of a chair, it will think that it’s a chair. It’s not trying to match everything. It doesn’t really know that it’s a chair, but it looks and finds features that represent what a chair looks like in the images that we used to train it. If you could somehow look into the network and ask what it thinks a chair looks like, it wouldn’t look like a chair to us at all. It would look like something that kind of looks like part of a chair. There have been some studies on what the network thinks candles look like. With a candle, the network sees tongues of flame everywhere. It doesn’t actually see the flame as part of the candle. It doesn’t think of it as a stem or a single candle. It just sees anything that has a flame on it and calls it a candle. Its idea of a candle is these tongues of flame all over the field of view. What it is recognizing is not what we think of.
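
[Editor’s illustration: the predict, check, and update cycle described above can be sketched with a very small model. This is a minimal sketch under stated assumptions, not how Blind Tool is trained. It uses a single-layer logistic classifier instead of a deep network, and the three hand-made features (has legs, has backrest, has flame) and the example data are hypothetical.]

```python
import math

# Hypothetical hand-made features: (has legs, has backrest, has flame).
# Label 1 means "chair", label 0 means "candle".
EXAMPLES = [
    ((1.0, 1.0, 0.0), 1),
    ((1.0, 0.0, 0.0), 1),
    ((0.0, 0.0, 1.0), 0),
]


def predict(weights, bias, features):
    """Return the model's estimated probability that the input is a chair."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, learning_rate=0.5):
    """Repeat: predict, measure the error, nudge the weights to reduce it."""
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = predict(weights, bias, features) - label
            # Move each weight a small step in the direction that shrinks the error.
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(EXAMPLES)
print("chair-like input:", round(predict(weights, bias, (1.0, 1.0, 0.0)), 3))
print("candle-like input:", round(predict(weights, bias, (0.0, 0.0, 1.0)), 3))
```

[In this toy model the learned weights are easy to read: the flame weight ends up negative and the legs weight positive. In a deep network the same kind of knowledge is spread across many layers, which is why it is hard to point to where "chair" lives.]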

WADE WINGLER: It sort of makes sense. It sounds more like you said, like how the brain works, how neural networks work, how eyes work, and how you learn about those larger categories of things, so that if I’m looking at a photograph of a chair or a drawing of a chair, those common elements are going to be there. That’s starting to make sense for me a little bit. I saw a video of the app being demonstrated and it was recognizing a cup of coffee and a banana and those kinds of things. Is that what the user experience is like at this point?

JOSEPH COHEN: It depends. If you have a clean table, like in the demo video, it works like that. It does work that well in that setting [Inaudible]. In noisy environments, it doesn’t perform as well. Those labels were specifically in the data set, so it knows about them. The network can actually predict those. Many other things will not be predicted. This has led to frustration where it won’t recognize things that people want it to, or they are using it in a setting where the object is surrounded by a bunch of other objects, so it’s confusing the network. I believe in a kitchen or office or on a desk, it’s best in these noise-free environments.

WADE WINGLER: That makes sense. I know that with audio, it’s hard to sort out noise, and visuals are the same kind of thing. It sounds like we are at the beginning of this process, and it sounds like there’s a lot of promise for how this technology might grow and change and do a better job of that. Tell me, what is it doing now in terms of platform and availability and cost? It’s an app. You can get this in its current state, right?

JOSEPH COHEN: It’s a free app. There are specific differences between this app and other apps that make it possible for this one to be free. There’s one called Be My Eyes. It’s similar in its goal to work for the user, but Be My Eyes is [inaudible] for one reason: it’s not using a computer to detect the images. It’s using people, which means you are talking to someone and they are telling you what they are seeing, which is significant. It’s different and way more powerful. No matter how good this network can be, a human will always be better. With this app, it’s actually all processing on the phone. All of the prediction is occurring on the device, so there is no need for a server somewhere that is going to cost money to run continuously. Once someone downloads this app, I don’t incur any more cost. They can use it forever. You can leave it on and it will be constantly predicting things and doesn’t actually [Inaudible]. That makes it able to be free.

There is another app called TapTapSee. You hold it over something and you tap the screen. [Inaudible] went to a server somewhere and then there was a prediction that happened and it came back with the result. There is a Google Goggles app that does something similar: when it takes the picture, it sends it to a server where it gets a prediction of what it is and sends the results back. It tells the user what it sees there. Whenever you have a server somewhere, there is a cost that makes it hard to be free. Currently this is free. I plan for it to always be free. Even if I do upgrade in the future, this part of it, the tool to aid blind people, will always be free. I have some other ideas where I want to reach out to businesses and have them use this technology to solve their problems. That will be separate from this app.

WADE WINGLER: That’s good. I think that is altruistic. As I’m sitting here thinking, I can envision other commercial applications that probably would have a revenue line associated with them that would be great. I know you have a Kickstarter campaign out there. Tell me a little bit about your current Kickstarter campaign and what is on the horizon for blind tool and this product you are working on.

JOSEPH COHEN: The biggest issue with Blind Tool right now, because there has been a lot of user feedback, is that people don’t like how accurately it predicts. It’s not as amazing for everyday use. You walk around your room, you might want to know that there is a window there, but it’s not telling you it’s a window because we didn’t train on windows. You really want it to tell you the window is over there so you can open it. Specifically, the Kickstarter talked about a jar of Nutella. I want to open a jar, but the network as it is doesn’t know anything about the shape of a Nutella jar. I want to put in labels that are relevant to people who want to use this app. I used random labels from the [Inaudible] data set. Those labels have nothing to do with the day-to-day use of blind users. I want to get a collection of labels that people care about. To do that, I can have people take pictures, or get pictures from the Internet, and have them labeled with a specific description of what each one is. If I have all of that labeled imagery, I can train on it to generate a new network, which can go into version 2 and will have labels that are more specific to the day-to-day uses that people want. The concepts that can be learned by this don’t have to be objects. They can be walls, shelves. A picture of almost anything with features can be learned. I want to get all of the concepts that people want Blind Tool [Inaudible]. The imagery that is closest to what people are using will make the most accurate prediction. That’s the goal for the next version, set servers, monetize the apps.

I want to make it a daily driver. I envision that you could use this throughout the day and it would really help you with the vision problem. It can be your vision around your household or wherever you’re going that day. Whatever you need this to do, I want it to be able to do that for recognizing objects.

There is an idea of reading text. In the next version I don’t plan on doing text. It’s a very hard problem to combine both views together and have it run in real time. The horizon I see for this technology is pretty great. If I don’t get to it, someone else will get to it. There are so many people working on this. A lot of people are way further ahead than me.

Having full scene recognition is in the near future. All Blind Tool does now is take a specific image and give a prediction. It only thinks there is one object or one thing in the image. What a scene recognition solution could do is look at some image and tell you everything that is in the image, which is maybe more relevant to what people want the technology to do. You take a picture of a room and it would say there is a chair to the right, a table directly in front of you. These are things people want this technology to do. I think that is on the horizon very soon.
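
[Editor’s illustration: the difference between a single prediction per image and a full scene description can be sketched as two small functions. This is a minimal sketch, not Blind Tool’s code or any planned design. The function names, the detection format (label, confidence, horizontal position), and the confidence cutoff are assumptions made for illustration; the detections would come from a separate object detector that is not shown.]

```python
def top_prediction(scores):
    """Single-label output: return only the highest-scoring label."""
    return max(scores, key=scores.get)


def describe_scene(detections, image_width, min_confidence=0.5):
    """Scene-style output: report every confident detection and where it is.

    Each detection is a hypothetical (label, confidence, x_center) tuple.
    """
    third = image_width / 3
    phrases = []
    for label, confidence, x_center in detections:
        if confidence < min_confidence:
            continue  # Skip detections the model is unsure about.
        if x_center < third:
            where = "to the left"
        elif x_center > 2 * third:
            where = "to the right"
        else:
            where = "directly in front of you"
        phrases.append(f"a {label} {where}")
    return phrases


# Hypothetical outputs for one 600-pixel-wide photo of a room.
scores = {"chair": 0.6, "table": 0.3, "cat": 0.1}
detections = [("chair", 0.9, 550), ("table", 0.8, 300), ("cat", 0.2, 100)]

print(top_prediction(scores))
print(describe_scene(detections, image_width=600))
```

[The first function mirrors the one-object-per-image behavior described above; the second reports several objects with rough positions, which is closer to "a chair to the right, a table directly in front of you".]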

WADE WINGLER: It sounds like there are infinite possibilities there. I’m excited about what it might do. Joseph, we are out of time for the interview today, but before we jump off the line here, will you tell people where they can reach out to you and learn more and keep up with the story, what’s happening with Blind Tool?

JOSEPH COHEN: My contact information is at The app can be downloaded from the play store for Android. It is only Android for now. I have a Twitter, which is @josephpaulcohen. Feel free to send me an email and I can get back to you.

WADE WINGLER: Joseph Paul Cohen is the creator of Blind Tool and has been our guest today, painting a picture about what this technology might do now and in the future for folks who are blind or visually impaired. Joseph, thank you so much for being with us today.

JOSEPH COHEN: Thank you for having me.

WADE WINGLER: Do you have a question about assistive technology? Do you have a suggestion for someone we should interview on Assistive Technology Update? Call our listener line at 317-721-7124. Looking for show notes from today’s show? Head on over to Shoot us a note on Twitter @INDATAProject, or check us out on Facebook. That was your Assistive Technology Update. I’m Wade Wingler with the INDATA Project at Easter Seals Crossroads in Indiana.