Session 1: Introduction to AI for Accessibility

Session overview

Whether you are comfortable with technology or less familiar with it, this series of sessions should have something for everyone. In this first session, we focus on the basics of Artificial Intelligence, or AI, and how it can be useful for the blind and low vision community. We give you information and examples that you can read or listen to. You can also complete quizzes and questions to think more deeply about what you have learned. Hands-on exercises will help you use AI for accessibility. Let’s get started!

Introduction to AI 

AI is already helping people who are blind or low vision to make the world more accessible. Smartphone apps that use AI can help you enter text without typing, navigate your environment, and recognise currency, text and objects and tell you what they are. For example, apps like Seeing AI use AI to automatically recognise things in pictures you take with your phone camera.

Quiz: Can you spot the AI?

Take our quiz to spot which systems use AI

What is AI?

After these examples, you might wonder what AI actually is. For this we need a clear definition. One way of defining AI is to think of it as a machine doing things that humans can do, such as perceiving, learning, reasoning and solving problems. A different way, and the one we will use, is to define AI by the way it works.

According to the MIT Open Learning lecture “What is AI?”, all AI systems contain three parts that work together:

  1. a dataset 
  2. an algorithm 
  3. a prediction 

The algorithm learns patterns from the dataset, and it uses those patterns to make a prediction when it sees a new piece of data. In other words, AI tries to predict something: something in the future, or something about what the data says.
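To make these three parts concrete, here is a minimal sketch in Python, assuming the scikit-learn library. The flower dataset and the decision tree algorithm are just illustrative choices, not part of any system described in this course.

```python
# A minimal sketch of the three parts of an AI system:
# a dataset, an algorithm and a prediction (assumes scikit-learn).
from sklearn.datasets import load_iris           # a small built-in example dataset
from sklearn.tree import DecisionTreeClassifier  # a simple learning algorithm

# 1. The dataset: measurements of flowers, each labelled with its species.
dataset = load_iris()
measurements, species = dataset.data, dataset.target

# 2. The algorithm: it learns the patterns linking measurements to species.
algorithm = DecisionTreeClassifier()
algorithm.fit(measurements, species)

# 3. The prediction: given a new, unseen flower, predict its species.
new_flower = [[5.1, 3.5, 1.4, 0.2]]
prediction = algorithm.predict(new_flower)
print(dataset.target_names[prediction[0]])
```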

An example AI – Covid-19 infections 

Let’s discuss this definition using the example of an AI model that detects Covid-19 infections from coughs recorded on a cell phone. Researchers from MIT have found that people who have the virus may differ from healthy individuals in the way that they cough. These differences cannot be picked out by the human ear, but it turns out that they can be detected by AI.

Covid-19 infections: How does it work?

Apply the definition to the example

Let’s now think about how we can apply the definition of AI presented above to this example; a short code sketch follows the list below.

  • What will the dataset be? The dataset will be recordings of coughs collected from both healthy and infected individuals.
  • What about the learning algorithm? The algorithm learns how coughs from infected people differ from coughs from healthy people.
  • What does it try to predict? It predicts whether the person who is coughing has Covid-19 or not.
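As a rough illustration of these three parts (not the actual MIT model), here is a sketch in Python, assuming the librosa and scikit-learn libraries; the recording file names and labels are hypothetical placeholders.

```python
# A rough sketch of a cough classifier, assuming librosa and scikit-learn.
# The recording file names and labels are hypothetical placeholders.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def cough_features(path):
    """Turn a cough recording into a vector of numbers (average MFCCs)."""
    audio, sample_rate = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate)
    return mfcc.mean(axis=1)

# 1. The dataset: cough recordings labelled healthy (0) or infected (1).
recordings = ["cough_01.wav", "cough_02.wav", "cough_03.wav", "cough_04.wav"]
labels = [0, 0, 1, 1]
features = np.array([cough_features(path) for path in recordings])

# 2. The algorithm: it learns how infected coughs differ from healthy ones.
algorithm = LogisticRegression(max_iter=1000)
algorithm.fit(features, labels)

# 3. The prediction: does a new, unheard cough sound healthy or infected?
new_cough = cough_features("new_cough.wav").reshape(1, -1)
print("infected" if algorithm.predict(new_cough)[0] == 1 else "healthy")
```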

How can AI help with accessibility?

For blind and low vision people, there are many everyday tasks that can be challenging and where AI could help. For example, how can you get from your home to a place you have never been before, without help? Or maybe you want to read a sign that someone has put up in a hallway? Or quickly send a text message without typing?

There are AI technologies that can improve accessibility. For example: 

  • computer vision might help people who are blind to recognise elements of their environment  
  • speech recognition and translation technologies might offer real-time captioning for people who are deaf or hard of hearing 
  • new robotic systems might augment the capabilities of people with limited mobility

AI offers many possibilities for helping people with specific needs. However, it also raises many ethical questions: how to include people in developing this technology, who owns the data that people provide, how the technology might be biased against people with disabilities, and who gets the benefit of these technologies. Some of these ethical challenges will be discussed in the third session of this course.

In the following sections we will introduce three areas of AI for accessibility for blind and low vision people.

  • Natural Language Processing and speech recognition
  • Text recognition
  • Navigation and getting around

Natural Language Processing and speech recognition

We previously discussed Alexa. Alexa is able to interpret what you say and also reply; in other words, it recognises and analyses your speech so that it can give an answer. This is an area of AI called Natural Language Processing, or NLP. We will talk about how this works in more detail in Session 6, but here we’ll just introduce it.

Dictation is an accessibility feature that most of you are familiar with: instead of typing a message to your friend, you just speak into your phone and it transcribes your spoken words into text. You might have used Siri, Cortana or Google Assistant before. These assistants recognise your speech and then turn it into text. However, bear in mind that everyone speaks slightly differently or might have an accent. Speech recognition therefore uses AI to predict the text from the speech sounds.
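As a small illustration of speech-to-text (not how Siri, Cortana or Google Assistant are actually built), here is a sketch in Python, assuming the SpeechRecognition package; the audio file name is a hypothetical placeholder.

```python
# A small speech-to-text sketch, assuming the SpeechRecognition package.
# The audio file name is a hypothetical placeholder.
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load a short recording of someone dictating a message.
with sr.AudioFile("dictated_message.wav") as source:
    audio = recognizer.record(source)

# The recogniser predicts the most likely text from the speech sounds.
try:
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Sorry, the speech could not be understood.")
```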

Other systems, like Alexa, might have a conversation with you. That means the system needs to learn not only how to recognise your speech but also what your words mean, so that it can answer you correctly. NLP and speech recognition are hot areas in many accessibility projects being developed today. They are useful for the blind and low vision community, but really for anyone who cannot use a keyboard.

Text recognition

The second example we are going to talk about is text recognition. Recognising text is useful for daily activities like reading a letter, identifying medicines, distinguishing objects when shopping, finding the right bus stop or reading signage in the street. The most common tools, which some of you might be familiar with, are optical character recognition, or OCR, systems.

These systems can scan printed text and then have it spoken out loud or saved to a computer file. Most of them only work well if the scanned document has a plain or simple background and well-organised characters in neat, straight lines. They do not work so well on a packaging box with decorative patterns or a medicine bottle covered in dense writing. There is ongoing work to improve OCR tools using AI.
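As a simple illustration of this kind of OCR pipeline, here is a sketch in Python, assuming the pytesseract (Tesseract OCR) and pyttsx3 text-to-speech libraries; the image file name is a hypothetical placeholder.

```python
# A simple OCR sketch: read printed text from an image and speak it aloud.
# Assumes pytesseract (with Tesseract installed) and pyttsx3; the image
# file name is a hypothetical placeholder.
from PIL import Image
import pytesseract
import pyttsx3

# Extract the printed text from a scanned letter.
text = pytesseract.image_to_string(Image.open("scanned_letter.png"))
print(text)

# Speak the recognised text out loud.
engine = pyttsx3.init()
engine.say(text)
engine.runAndWait()
```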

A lot of progress has been made in using smartphone cameras to extract text from objects with complex backgrounds and multiple text patterns. Mobile applications like Seeing AI have a text recognition feature that can read documents and recognise their structure, such as headings and paragraphs, allowing you to skip rapidly through a document using VoiceOver. Text recognition is an area of research in computer vision, a field of AI we will discuss in more detail in Session 7.

Navigation and getting around

The last example we are going to discuss is navigation. Instead of using visual information about locations and obstacles, people who are blind or low vision might use white canes or guide dogs to help them get around. However, AI can help you explore your surroundings even further.

A lot of research nowadays focuses on designing automated navigation systems using AI. But what exactly does that mean? Imagine being able to carry with you a system that could see your surroundings and give you instructions about them. This would be a complex AI system that uses multiple algorithms from computer vision and natural language processing to see and describe your environment, as well as to give you navigation instructions. A system like this would be able to help you perceive and navigate indoor or outdoor places.

Many navigation systems have been proposed for blind and low vision people, but only a few can provide dynamic interaction and adapt to different environments. The system we just described is called a visual imagery system: it uses a camera to see and analyse your surroundings. There are also non-visual systems, which use other kinds of sensors instead of a camera to detect whether you are standing in front of an obstacle, and map-based systems, which mainly use tactile or auditory feedback to give you information about a map.
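As a very simplified sketch of the “see and describe” step of a visual imagery system (real systems are far more sophisticated), here is an example in Python, assuming the torchvision and pyttsx3 libraries; the photo file name is a hypothetical placeholder.

```python
# A very simplified sketch of the "see and describe" step of a visual
# navigation aid, assuming torchvision and pyttsx3. The photo file name
# is a hypothetical placeholder.
import torch
import pyttsx3
from PIL import Image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)

# Load a detection model pre-trained to recognise everyday objects.
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights)
model.eval()
categories = weights.meta["categories"]

# Detect objects in a single photo of the surroundings.
photo = Image.open("surroundings.jpg")
with torch.no_grad():
    detection = model([weights.transforms()(photo)])[0]

# Announce the objects the model is reasonably confident about.
found = {categories[label] for label, score
         in zip(detection["labels"], detection["scores"]) if score > 0.8}
message = "I can see " + ", ".join(sorted(found)) if found else "I cannot detect anything."
engine = pyttsx3.init()
engine.say(message)
engine.runAndWait()
```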

In Session 10, we will discuss in more detail the many developments in navigation aids for blind and low vision people, what kinds of AI are used, and how these aids could be improved in the future.

Question: Other uses of AI?

We have now covered different areas in which AI can help accessibility. Can you think of any other uses that would make your life easier? What applications do you think should be developed to help people who are blind or have low vision?

Click here to add your answer 

Smartphone AI for accessibility apps: An example

In the last part of this session, we will cover smartphone AI apps for accessibility. Smartphones have been revolutionary in bringing accessibility to blind and low vision people. Some of these apps use AI and have significantly enhanced the lives of people who are blind or low vision. Let’s explore an app together!

What is TapTapSee?

TapTapSee is a free mobile app that can make visual information available to you. TapTapSee uses AI technology to automatically identify what is in a photo. All you have to do is take a photo of something you wish to identify. The application then processes the photo and tells you what you have taken a picture of. While the application does not process the image fast enough for numbers on a moving bus, it is good for items around the house, as well as for reading the print on signs.

It works with anything, whether you have taken a picture of a shirt, a bottle, or a car. Are you looking for a can of soup in your cupboard? Which is the can of soup and which is the refried beans?  Or how about money? Are you holding a five-dollar bill or a twenty?  You can use TapTapSee to find out!

In order to identify the object photographed, TapTapSee uses a field of AI called computer vision. With computer vision we can train computers to interpret and understand the visual world. Using digital images from cameras and videos, and learning algorithms like the ones we discussed above, machines can accurately identify and classify objects, and then react to what they “see”. Let’s now try to apply the definition of AI we talked about before to TapTapSee; a short code sketch follows the list below.

TapTapSee and AI

  • What will the dataset be for TapTapSee? Like many image recognition apps, TapTapSee is trained on an image database called ImageNet. ImageNet includes a collection of nearly 14 million images, but the part used for training contains about 1.2 million images covering 1,000 object categories, each labelled with its name.
  • What about the learning algorithm? The algorithm learns to tell apart the different object categories.
  • And lastly, what is the prediction? It tries to predict what is in the camera view, so when it sees a new picture of, say, a cat, it is able to recognise it.
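As an illustration of this kind of image recognition (not TapTapSee’s actual code), here is a sketch in Python, assuming the torchvision library and a model pre-trained on ImageNet; the photo file name is a hypothetical placeholder.

```python
# A sketch of ImageNet-style image recognition, assuming torchvision.
# This illustrates the idea, not TapTapSee's actual code; the photo file
# name is a hypothetical placeholder.
import torch
from PIL import Image
from torchvision.models import resnet18, ResNet18_Weights

# The algorithm: a neural network already trained on the ImageNet dataset.
weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights)
model.eval()

# A new photo the model has never seen before.
photo = Image.open("my_photo.jpg")
batch = weights.transforms()(photo).unsqueeze(0)

# The prediction: which of the 1,000 object categories is in the photo?
with torch.no_grad():
    scores = model(batch)[0]
best = scores.argmax().item()
print("This looks like:", weights.meta["categories"][best])
```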

TapTapSee: Demo

Let’s now download TapTapSee and try it out. TapTapSee can be used on both iPhone and Android. Search for it on the App Store or Google Play.

Let’s watch a brief demo of how to use it:

Try out TapTapSee

When you are set up, click here to try out TapTapSee

What’s in the next session?

In the next session we will explain in more detail how AI works. 

Additional resources

MIT Open Learning: What is AI?

The 2020 Agenda – Sight Tech Global

Microsoft AI for Accessibility projects