Whether you are very comfortable with technology or less familiar with it, this series of sessions should be easy for everyone to follow. In this first session, we focus on the basics of Artificial Intelligence, or AI, and how it can be useful for the blind and low vision community. We give you information and examples that you can read or listen to, and you can complete quizzes and write answers to think more deeply about what you know or have found out. We also give you hands-on exercises to use AI for accessibility. Let’s get started!
Introduction to AI
AI is already helping people who are blind or low vision to make the world more accessible. Smartphone apps that use AI can help you enter text without typing, navigate your environment, and recognise currency, text and objects and tell you what they are. For example, apps like Seeing AI use AI to automatically recognise things in pictures you take with your phone camera.
2.2 What is AI?
After these examples you might wonder what AI actually is. For this we need a clear definition.
One way of defining it is to say that AI does things that humans can do, such as perceiving, learning, reasoning and solving problems. A different way – and the one we will use – is to describe AI by how it works. All AI systems contain three parts that work together:
- a dataset
- an algorithm
- a prediction
The algorithm learns patterns from a set of data, and it uses those patterns to make a prediction when it sees a new piece of data. In other words, AI tries to predict something in the future, or something that the data can tell us.
Let’s discuss this definition using an example of an AI model that detects Covid-19 infections from coughs recorded on a cell phone. Researchers from MIT have found that people who have the virus may cough differently from healthy individuals. These differences are not detectable by the human ear, but it turns out that they can be picked up by AI.
Click on the link below to listen to how the system works in more detail:
Let’s now think about how we can apply the definition of AI we presented above to this example.
What will the dataset be in this case? The dataset will be recordings of coughs collected from both healthy and infected individuals.
What about the learning algorithm? The algorithm learns how coughs from infected people differ from coughs from healthy people.
What does it try to predict? It predicts whether the person who is coughing has Covid-19 or not.
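The three parts – dataset, algorithm, prediction – can be sketched in a few lines of Python. This is a toy illustration we made up for this session, not how the MIT system actually works: each cough is reduced to a single made-up number, and “learning” is just comparing a new number to labelled examples. A real system learns from thousands of audio recordings.

```python
# 1) The dataset: example cough measurements, each labelled by a human.
#    (The numbers are invented; a real dataset holds audio recordings.)
dataset = [
    (1.0, "healthy"),
    (1.2, "healthy"),
    (3.8, "infected"),
    (4.1, "infected"),
]

# 2) The algorithm: here, "learning" simply means remembering the
#    labelled examples and comparing a new measurement to each of them.
def predict(new_value):
    # Find the labelled example closest to the new measurement...
    closest = min(dataset, key=lambda pair: abs(pair[0] - new_value))
    # 3) ...and the prediction is that example's label.
    return closest[1]

print(predict(1.1))  # close to the "healthy" examples
print(predict(4.0))  # close to the "infected" examples
```

Even in this tiny sketch you can see the pattern: the system never sees the values 1.1 or 4.0 during learning, yet it can still make a prediction about them based on the data it has seen.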
2.3 How can AI help with accessibility?
Until now we have only discussed general technologies that use AI. There are also AI technologies that can improve accessibility. These technologies can benefit blind and low vision communities around the world.
AI technologies offer the possibility of removing many accessibility barriers. For example:
- computer vision might help people who are blind to sense their surroundings
- speech recognition and translation technologies might offer real-time captioning for people who are hard of hearing
- new robotic systems might augment the capabilities of people with limited mobility
There are many possibilities for helping people with specific needs with AI. However, this also raises many questions about ethics, for example, how to include people in developing this technology, who owns the data that people provide, how this technology might be biased against people with disabilities, and who gets the benefit of these technologies. Some of these ethical challenges will be discussed in the third session of this course.
For blind and low vision people, there are still many everyday tasks that can be challenging, and AI could help with them. For example, how can you get from your home to a place you have never been before, without help? Or maybe you want to read a sign that someone has put up in a hallway? Or quickly send a text message without typing? In the following sections we will introduce some technologies for blind and low vision people that use AI.
Natural Language processing and speech recognition
We previously discussed Alexa – Alexa is able to interpret what you say and also reply. In other words, it recognises and analyses your speech so that it can give an answer to it. This is an area of AI called Natural Language Processing, or NLP. We will talk about how this works in more detail in session 6, but here we’ll just introduce it.
Dictation is an accessibility feature that most of you are familiar with – instead of typing a message to your friend, you just speak to your phone and it transcribes your spoken words into text. You might have used Siri, Cortana or Google Assistant before. These assistants recognise your speech and then translate it into text. However, bear in mind that everyone speaks slightly differently or might have an accent, so there is no single fixed mapping from sounds to words. Speech recognition therefore uses AI to predict the text from the speech sounds.
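The idea that the system must predict the intended word even when everyone says it a little differently can be sketched with a toy example. Here the “speech sounds” are just slightly misspelled strings standing in for accents, and the matching is done with Python’s built-in difflib module. Real speech recognition works on audio signals with far more sophisticated models; the vocabulary below is invented for illustration.

```python
import difflib

# A toy sketch of the prediction step in speech recognition.
# Each "utterance" is a string; misspellings stand in for accents.
vocabulary = ["hello", "weather", "message", "navigate"]

def recognise(utterance):
    # Predict the known word closest to what was "heard".
    matches = difflib.get_close_matches(utterance, vocabulary, n=1)
    return matches[0] if matches else None

print(recognise("helo"))    # a slightly different "pronunciation"
print(recognise("wether"))  # still close enough to predict correctly
```

The point of the sketch is the same as in real dictation: the system was never shown “helo” or “wether”, but it can still predict which word in its vocabulary was most likely intended.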
Other systems, like Alexa, might have a conversation with you. That means the system needs to learn not only how to recognise your speech but also what your words mean, so that it can answer you correctly.
NLP and speech recognition are active areas in many accessibility projects today. They are useful for the blind and low vision community, but also for anyone who cannot use a keyboard.
The second example we are going to talk about is text recognition. Recognising text is useful for daily activities like reading a letter, identifying everyday medicines, distinguishing objects when shopping, finding the right bus stop or reading the signage in a street. The most common tools, which some of you might be familiar with, are optical character recognition, or OCR, systems. These systems can scan printed text and then have it spoken out loud or saved to a computer file. Most of these systems only work well if the scan is of a document with a plain or simple background and well-organised characters in nice straight lines. They don’t work so well with a packaging box covered in decorative patterns, or a medicine bottle with a lot of dense writing on it.

There is ongoing work on OCR tools today using AI. A lot of improvements have been made so that smartphone cameras can extract significant text information from objects with complex backgrounds and multiple text patterns. Mobile applications like Seeing AI have a text recognition feature that is able to read documents and recognise their structure, such as headings and paragraphs, allowing you to rapidly skip through a document using VoiceOver. Text recognition is an area of research in computer vision, a field of AI we will discuss in more detail in session 7.
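The core idea behind OCR – matching shapes in an image against known character shapes – can be sketched with a toy example. Here each “image” is a tiny 3×3 grid of 0s and 1s, and only two characters exist. Real OCR works on photographs with millions of pixels and uses learned models rather than fixed templates, so treat this purely as an illustration of the matching idea.

```python
# A toy sketch of OCR: match a "scanned" character against templates.
# Each image is a 3x3 grid of pixels, written as a flat tuple of 0s and 1s.
templates = {
    "I": (0, 1, 0,
          0, 1, 0,
          0, 1, 0),
    "L": (1, 0, 0,
          1, 0, 0,
          1, 1, 1),
}

def recognise_character(image):
    # Count how many pixels agree with each template, and return
    # the character whose template matches the image best.
    def score(char):
        return sum(a == b for a, b in zip(templates[char], image))
    return max(templates, key=score)

# A slightly noisy scan of an "L": one pixel differs from the template,
# just as a real photo of a letter is never a perfect match.
noisy_L = (1, 0, 0,
           1, 0, 1,
           1, 1, 1)
print(recognise_character(noisy_L))
```

This also hints at why plain backgrounds and straight lines matter so much for classic OCR: the more the image deviates from the stored templates, the harder the match becomes, which is exactly where modern AI-based approaches improve on older systems.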
Navigation and getting around
The last example we are going to discuss is navigation. Instead of relying on visual information about locations and obstacles, people who are blind or low vision might use white canes or guide dogs to help them get around. However, AI can help you explore your surroundings in much richer ways.
There is a lot of research nowadays that focuses on designing automated navigation systems using AI. But what exactly is this? Imagine if you could carry with you a system that was able to see your surroundings and give you instructions about them. This would be a complicated AI system that uses multiple algorithms from computer vision and natural language processing to see and describe your environment, but also to give you navigation instructions. A system like this would be able to help you perceive and navigate indoor or outdoor places. Many navigation systems have been proposed for blind and low vision people, but only a few can provide dynamic interactions and adapt to different environments. The system we just described is a visual imagery system, which uses a camera to see and analyse your surroundings. There are also other kinds of navigation systems:

- non-visual systems, which instead of a camera use different kinds of sensors to detect whether you are standing in front of an obstacle
- map-based systems, which mainly use tactile or audio feedback to give you information about a map

In session 10, we will discuss in more detail the many developments in navigation aids for blind and low vision people, what kind of AI is used, and how this can be improved in the future.
2.4 Smartphone AI for accessibility apps: An example
In the last part of this session, we will cover smartphone AI for accessibility apps. Smartphones have revolutionised accessibility for blind and low vision people. Some accessibility apps use AI and have significantly enhanced the lives of people who are blind or low vision. Let’s explore an app together!
TapTapSee is a free mobile app that can make visual information available to you. TapTapSee uses AI technology to automatically identify anything in a photo. All you have to do is take a photo of something you wish to identify. The application then processes the photo and tells you what it is you have taken a picture of. While the application does not process images fast enough to catch the number on a passing bus, it is good for items around the house, as well as reading the print on signs. It works with anything, whether you have taken a picture of a shirt, a bottle or a car. Are you looking for a can of soup in your cupboard? Which is the can of soup and which is the refried beans? Or how about money? Are you holding a five-dollar bill or a twenty? You can use TapTapSee to find out!
In order to identify the object photographed, TapTapSee uses a field of AI called computer vision. With computer vision we can train computers to interpret and understand the visual world. Using digital images from cameras and videos, together with learning algorithms like the ones we discussed above, machines can accurately identify and classify objects – and then react to what they “see.” Let’s now try to apply the definition of AI we talked about before to TapTapSee.
What will the dataset be for TapTapSee? Like many image recognition apps, TapTapSee is trained on an image database called ImageNet. ImageNet includes a collection of nearly 14 million images, but the part that is used for training covers 1,000 object categories. Each object category has over 1,000 images of different objects, together with their names.
What about the learning algorithm? The algorithm learns to tell apart different objects.
And lastly what is the prediction? It tries to predict what is in the camera view so when it sees a new picture of a cat, it is able to recognise it.
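At a very high level, this is the same three-part recipe as before, now applied to images. In the toy sketch below, each “photo” is reduced to its average red, green and blue colour, and the prediction is the label of the closest example in the dataset. The example items and colours are invented; a real system like TapTapSee learns far richer visual features from millions of labelled images.

```python
import math

# A toy sketch of image classification.
# 1) The dataset: each "photo" is an average (red, green, blue) colour
#    plus a human-written label. The values here are invented.
dataset = [
    ((200, 40, 30), "tomato soup can"),
    ((180, 120, 70), "refried beans can"),
    ((90, 150, 60), "pea soup can"),
]

# 2) The algorithm: compare a new photo's colour to every labelled example.
def classify(photo_colour):
    def distance(example):
        colour, _ = example
        return math.dist(colour, photo_colour)
    # 3) The prediction: the label of the closest example.
    return min(dataset, key=distance)[1]

# A new photo whose colour is close to the tomato soup example.
print(classify((195, 50, 35)))
```

Just as in the cough example, the system has never seen this exact photo before, but it can still predict what is in it by comparing it to the labelled examples it learned from.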
Let’s now try to use TapTapSee and discuss the experience.
First, let me show you how to use it. Make sure you have VoiceOver on, then double-tap the right side of the screen to take a picture, or double-tap the left side to take a video. There is also a button to repeat the last result, and the app activates auto flash in low lighting.
Now try to download TapTapSee to your phone.
TapTapSee can be used on both iPhone and Android.
If you have an Android phone you can use the following link or search for TapTapSee on Google Play.
TapTapSee link for Android: https://play.google.com/store/apps/details?id=com.msearcher.taptapsee.android
If you have an iPhone you can use the following link or search for TapTapSee on the App Store.
TapTapSee link for iPhone: https://apps.apple.com/us/app/taptapsee/id567635020
OK, when you are set up, try the app out on all kinds of things. Let us know how you got on with using the app. Did it work well? Did you have any problems using the app? What things can be improved?
What’s in the next session?
In the next session we will explain the definition of AI in more detail, focusing on each of the three elements – dataset, learning algorithm and prediction – separately. We will talk about the importance of data and why it is needed, as well as how algorithms are developed and the different tasks they perform in order to give us the right predictions.