AI EDUCATION: What Is Embodied AI?


Each week we find a new topic for our readers to learn about in our AI Education column. 

It’s October, the month when everything starts to get creepy, so why not talk about bodies? No, that’s not a comment on our physical appearance, and we’re not taking a trip to the morgue—we’re curious about these mounds of flesh that we all animate with thoughts and feelings.  

Well, what if we were to animate something body-like with artificial intelligence? 

Welcome to another AI Education, where today, we’re going to talk about embodied AI, which is not exactly what you might expect from our introduction. To us, at least, and at first blush, embodied AI sounds like we’re putting artificial intelligence into some kind of humanoid robot so that it can walk around and interact with us. 

As it turns out, that’s not too far from the truth. Embodied AI is actually an extension of physical AI, a topic we tackled not so long ago in AI Education, and it comes in all shapes and sizes, and in an already large and quickly growing array of functions. 

Embodied AI versus Physical AI 

Most of the AI we’re currently exposed to is informational AI. Informational AI works purely with recorded data: it was trained on huge sets of textual, audio and visual information, and most of the output it generates is in the form of text, audio and graphics. Maybe we should think of informational AI as the liberal arts campus version of artificial intelligence applications. Physical AI and embodied AI would be more like the trade school version of AI applications. 

On the most basic level, we could think about putting large language models or foundation models into the software of already existing and operational robots to help them interact with humans in the workplace, or to perform some AI inference tasks. This superficial application of AI falls short of the definition of embodied AI, not to mention physical AI. 

Physical AI, when used properly, describes the integration of AI with real-world physical systems, like robotics, to enable them to operate autonomously in the physical world—to sense and respond to their environment. Embodied AI, on the other hand, is when we embed AI into physical systems to allow them to learn from interaction with their environment. Think of embodied AI as a subtype of physical AI—physical AI is more concerned with the output of AI within a physical system, while embodied AI is more concerned with learning through physical experience. 

More importantly, embodied AI represents the idea that artificial intelligence doesn’t necessarily just come from algorithms, but can also be created via physical interactions with an environment. 

So, like physical AI, embodied AI can include not just general-purpose humanoid robots, like those in our imagination, but also robot arms and other robotic configurations, autonomous vehicles and smart spaces like rooms, factories and warehouses. And, like physical AI, embodied AI relies on machine learning and AI technologies as well as sensors and computer vision to perceive, reason and act in a physical environment. 

Yeah, but What (Specifically) Is Embodied AI? 

The Lamarr Institute for Machine Learning and Artificial Intelligence, a German AI research institute, offers us five fundamentals of embodied AI that we will adapt here for our purposes: 

  1. Interaction with the physical world—as we’ve said, embodied AI is physical AI.
  2. Perception and Action Coupling—embodied AI connects the processes of perception and action, because an intelligent device that senses an obstacle must be able to decide how to navigate around it nearly immediately. Thus, one aim of embodied AI is to shorten the gap between sensing and acting as much as possible.
  3. Learning through Experience—we’ll get to this more in our next section, but embodied AI is intended to learn as it goes and improve over time just as human beings do: through a process of trial and error.
  4. Contextual Understanding—embodied AI is usually designed to operate within a specific context or environment. Being aware of this environment allows the technology to make decisions informed by what is happening in its surroundings.
  5. Sensory Integration—while many physical AI systems may operate using one sense—like computer vision or touch—to gather information, embodied AI systems integrate multiple senses so that they may function within a physical environment with more precision and effectiveness.
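The perception-action coupling described in point 2 can be pictured as a tight loop: each sensor reading maps almost directly to an action, with no long deliberation step in between. Here is a toy sketch of that idea in Python; the function names and distance thresholds are purely hypothetical illustrations, not any real robotics API.

```python
# Toy perception-action loop: the agent senses an obstacle distance
# and immediately chooses an action. All names and thresholds here
# are hypothetical illustrations.

def choose_action(obstacle_distance: float) -> str:
    """Map a single sensor reading directly to an action."""
    if obstacle_distance < 0.5:    # too close: steer away
        return "turn"
    elif obstacle_distance < 2.0:  # getting close: slow down
        return "slow"
    return "forward"               # path is clear

def control_loop(sensor_readings):
    """One action per reading -- sensing and acting stay coupled."""
    return [choose_action(d) for d in sensor_readings]

print(control_loop([5.0, 1.5, 0.3]))  # ['forward', 'slow', 'turn']
```

Real systems are vastly more sophisticated, of course, but the design goal is the same: keep the distance between "I sensed something" and "I did something about it" as short as possible.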

How Do We Make an Embodied AI? 

Obviously, there is a physical manufacturing component to creating an embodied AI system—some physical robotic element, complete with sensors, has to exist to be operated or manipulated by our artificial intelligence system. It is the combination of this physical element with the AI software that makes an embodied AI. We’re not going to concern ourselves today with the physical side of embodied AI, as it will be very different for each application. Instead, we’ll consider the underlying software operating the robotics. 

On the software side, embodied AI is built from a combination of technologies, but like any modern artificial intelligence, its building blocks are data. An embodied AI is first trained on data similar to that used to train informational AI—very large datasets from the web, for example. In addition, embodied AI may also be trained on data collected by sensors on non-AI robotic devices doing the kind of work the AI is being trained to do, and on synthetic data created via simulations in computer-generated 3-D environments and digital twins. 
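To make the synthetic-data idea concrete, here is a toy sketch of how a simulation can mass-produce training examples: a simulated range sensor sweeps across a simple 2-D "room" and records noisy distance readings, the same kind of record a real robot's sensor log would contain. Everything here, from the room geometry to the noise level, is a hypothetical illustration, not a real digital-twin toolkit.

```python
import math
import random

# Toy synthetic-data generator: a simulated range sensor measures
# the distance to a wall at x = ROOM_WIDTH, with a little Gaussian
# sensor noise added. All values are hypothetical illustrations.
random.seed(42)  # fixed seed so the synthetic dataset is repeatable

ROOM_WIDTH = 10.0  # the simulated wall sits this far along the x-axis

def simulated_range_reading(angle_rad: float) -> float:
    """Distance from the origin to the wall, seen at a given angle,
    plus simulated sensor noise."""
    true_distance = ROOM_WIDTH / math.cos(angle_rad)
    noise = random.gauss(0.0, 0.05)
    return true_distance + noise

# Sweep the sensor and collect (angle, reading) training pairs.
dataset = [(a, simulated_range_reading(a))
           for a in [i * 0.1 for i in range(-5, 6)]]
print(len(dataset))  # 11 synthetic training examples
```

Because the simulation knows the ground truth (the wall really is 10 units away), every noisy reading comes with a correct answer attached, which is exactly what makes simulated environments so valuable for training.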

Remember the metaverse and all of that hype about a potentially hyper-realistic, 3-D, persistent, massive simulation of the real world? Well, it turns out that Mark Zuckerberg isn’t whistling “Dixie.” Embodied AI is one good reason why it might still be relevant and important to humanity’s future. 

So-called synthetic data, collected within these simulations, is also used to further refine an embodied AI model before it is deployed in a physical environment. Eventually, however, the software must be deployed so that it can learn within an actual physical environment. Reinforcement learning—a system of rewards and penalties for making the right or wrong decisions—is used to further refine the model. As the model proceeds on to later stages of training, it might also follow and imitate human workers within its environment.
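The reward-and-penalty loop at the heart of reinforcement learning can be sketched in a few lines. In this toy example (a bare-bones illustration, not a production RL library), an agent repeatedly picks between two made-up actions, receives a reward or penalty, and nudges its estimate of each action's value until it reliably prefers the one that works.

```python
import random

# Toy reinforcement learning: trial and error with rewards and
# penalties. The actions, reward function, and learning rate are
# all hypothetical illustrations.
random.seed(0)

actions = ["left", "right"]
values = {a: 0.0 for a in actions}  # learned value estimate per action
alpha = 0.1                         # learning rate

def reward(action: str) -> float:
    # Hypothetical environment: "right" is the correct choice, so it
    # earns a reward while "left" earns a penalty (negative reward).
    return 1.0 if action == "right" else -1.0

for _ in range(200):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < 0.1:
        a = random.choice(actions)
    else:
        a = max(values, key=values.get)
    # Nudge the estimate toward the observed reward.
    values[a] += alpha * (reward(a) - values[a])

print(max(values, key=values.get))  # the agent learns to go "right"
```

Real embodied AI systems face far richer environments and reward signals, but the principle is the same trial-and-error loop the article describes: act, get feedback, adjust, repeat.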