Each week we find a new topic for our readers to learn about in our AI Education column.
This week on AI Education we’re going to discuss one of the most promising fields in artificial intelligence, one that helps bridge the digital and physical worlds: Computer vision. Computer vision is a form of artificial intelligence that allows computers to read and extract information from visual media like video and static images in a manner similar to human vision.
While computers have long been able to process and produce images by assigning numerical values to color and intensity, computer vision specifically aims to help machines see the world like humans do. This is accomplished by using artificial intelligence to extract information from images and make sense of what they contain. Per IBM, “If AI enables computers to think, (then) computer vision enables them to see, observe and understand.”
The theory behind computer vision began to emerge in the mid-20th century, just as computers themselves were being developed and our knowledge of the brain and eye was expanding rapidly. Over the latter decades of the 20th century, biologists and neurologists unlocked many of the secrets of human vision; notably, many of the earliest studies contributing to this body of knowledge involved not human, but feline vision.
Computer vision, however, would have to wait for the development of deep learning algorithms and neural networks, which made it possible for machines not just to view an image, but to understand what they were seeing and translate information from that image into other formats. We’ll have more on what makes computer vision tick later.
How Did We Get Here
AI news in 2024 has been full of headlines about advancing computer vision capabilities, and this most recent week was no exception. An Imaging & Machine Vision Europe article described a combination of camera technology from Intel and vision processing capabilities from Geek+, a computer vision company, that allows machines to see with better depth perception, increasing the accuracy of autonomous robots and vehicles.
In similar news, Live Science published an article this week about Spot, Boston Dynamics’ rather alarming-looking AI dog, which is now able to play the game “fetch” when interacting with humans or other machines. This is made possible via technology from MIT that allows Spot to identify the parts of its vision most relevant to the task it has been assigned, enabling it to follow complex instructions even amid visually distracting environments.
Then there was this tidbit from Quality Magazine in a story about how advanced deep learning is enhancing computer vision-guided robotics applications: “Unlike their historical counterparts, vision-guided industrial robots with AI can adjust their own performance. For example, robotic welders and adhesive dispensers not only become more accurate with the addition of AI-enabled vision guidance, but they can also inspect the quality of their welds or beading and make corrections to ensure that quality specifications are met.”
Finally, just this week, we have another market analyst report, “AI in Computer Vision” from Markets and Markets. The AI in computer vision market is expected to grow from $17.2 billion in 2023 to $45.8 billion in 2028, at a 21.5% compound annual growth rate.
How Computer Vision Works
According to IBM, computer vision works much like human vision: once researchers understood how images travel from the eye to the brain, and how the brain reads and interprets them, they knew how to make computer vision a reality. The eye takes in visual input, but it is the brain that converts that input into meaningful information about what the eye is seeing. The big difference is that humans train over a lifetime to understand their vision, learning how to tell one object from another, how far away something is, how fast it is moving, and how large it is compared to something else. Our eyes and brain do this seemingly effortlessly.
Yet, as it turns out, today’s computer vision is capable of seeing many things more accurately and efficiently than human vision. This is because computer vision applications are usually trained for very specific tasks: while human vision is forced to be a generalist as we live out our lives, computer vision can be trained to do the same things repeatedly, and to do them very quickly and very well. Computer vision is also trained on tremendous volumes of data, from hundreds of thousands to billions of images and videos, that it would be inefficient and inconvenient, if not impossible, to train a human brain on.
Computer vision uses machine learning to understand the context of visual data. These thousands—or billions upon billions—of images help a computer learn distinctions and, eventually, recognize images or parts of images. A type of neural network called a convolutional neural network, or CNN, breaks down images into parts that are given values or labels, then attempts to predict what it is being shown using mathematical calculations called convolutions. The CNN discerns the shapes and edges within an image and tries to identify its constituent parts. As it continues, the CNN identifies colors, forms and textures, checking its work as it proceeds until its predictions are accurate—which assures the computer that it is seeing correctly.
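The convolution operation described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular library implements it: the tiny “image,” the hand-coded kernel, and the `convolve2d` helper are all invented for the example, and a real CNN learns its kernel values from training data rather than using a hand-written edge detector.

```python
# A minimal sketch of the core CNN operation: sliding a small kernel
# (filter) over an image and summing the element-wise products. The
# 3x3 kernel below is a simple vertical-edge detector; real CNNs learn
# thousands of such kernels from data instead of hand-coding them.

def convolve2d(image, kernel):
    """Valid-mode 2D convolution (no padding, stride 1) over nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# A tiny grayscale "image": dark pixels on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-coded vertical-edge kernel (a Sobel-style filter).
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

result = convolve2d(image, edge_kernel)
# High values appear only where the dark/bright boundary sits, which is
# how early CNN layers pick out the shapes and edges within an image.
print(result)  # prints [[0, 27, 27, 0], [0, 27, 27, 0]]
```

Stacking many such filters, and learning their values by checking predictions against labeled examples, is what lets the network progress from edges to colors, forms and textures.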
Why Is Computer Vision Important
Computer vision is already widely implemented, and as businesses and researchers discover new use cases for computer vision, it will become even more prevalent. Generally speaking, computer vision enables machines to classify images and detect objects within images, track objects across multiple images or video, and to retrieve images based on content instead of metadata.
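The last capability, retrieving images by content rather than metadata, can be sketched very simply: each image is reduced to a feature vector (in practice produced by a neural network), and a query returns the stored image whose vector is most similar. The file names and three-number vectors below are invented purely for illustration; real systems use vectors with hundreds or thousands of dimensions.

```python
# A minimal sketch of content-based image retrieval: compare a query
# image's feature vector against a small "library" of stored vectors
# and return the closest match by cosine similarity.
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors extracted from an image library.
library = {
    "beach.jpg":  [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg":   [0.0, 0.2, 0.9],
}

def retrieve(query_vector):
    """Return the stored image most similar to the query, by content."""
    return max(library, key=lambda name: cosine_similarity(query_vector, library[name]))

# A query vector resembling a shoreline scene matches the beach photo,
# even though no metadata or file name was consulted.
print(retrieve([0.8, 0.2, 0.1]))  # prints beach.jpg
```

Note that nothing here looks at file names or tags; the match is made entirely on the content-derived vectors, which is the point of the technique.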
More specifically, computer vision is already being used by applications like Google Translate—according to IBM, one can already use a phone’s camera to snap an image of a street sign and translate it immediately into another language. In the industrial world, as we have already mentioned, computer vision is helping to power next-generation robotics. In healthcare, computer vision is sifting through mountains of diagnostic data, including X-rays and CT scans, to detect anomalies like cancer.
Autonomous vehicles use computer vision to “see” the world around them and react accordingly. Computer vision is what allows a self-driving car to see other cars and pedestrians in its vicinity, as well as read traffic signs and lane markers, and respond to traffic signals.
Computer vision is also being used in the security realm to automate many guard functions. Rather than rely on inaccurate and easily tired human eyes to watch monitors all night, computer vision can alert security personnel if something out of the ordinary is detected by on-site cameras.
In other words, computer vision is not the next development to come in years or decades ahead—it’s already here, and we’re already reaping the benefits.