Each week we find a new topic for our readers to learn about in our AI Education column.
Welcome to another AI Education column. This time, we’re changing directions again to discuss a specific AI application and company: Google DeepMind.
Here is where I point out that Google AI was covered in this column a little more than a year ago, before I took over the writing duties. If you’re looking for a more general take on Google’s AI efforts, that piece is a good place to start; among other things, it notes that Google purchased DeepMind back in 2014.
So why are we talking about it now?
This week, Google DeepMind introduced an AI model that runs on robotic devices without requiring them to access data on a network or distributed computing power. In other words, the model can run locally on the device itself. This is an example of edge computing, in which some or all of the “work” of computing, and the data involved, resides with the entity using the data. Typically, when we discuss edge computing as technology consumers, we’re talking about our phones, tablets and laptops or our office PCs, not a robot in situ in a workplace, but here we are.

The new model should make possible more responsive, dexterous robots, primarily by eliminating latency: the delays that accumulate as a device senses its environment or takes natural language input, sends that information off for processing, and waits for a response.
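For readers who want to see the arithmetic behind that latency argument, here is a minimal Python sketch. The timing figures are hypothetical, chosen only to illustrate the idea; they do not come from Google’s announcement, and real latencies vary widely with network conditions and hardware.

```python
# Purely illustrative: why removing the network round trip lowers response time.
# All figures are made up for illustration only.

NETWORK_ROUND_TRIP_S = 0.08  # hypothetical time to reach a cloud model and back
MODEL_INFERENCE_S = 0.05     # hypothetical time for the model to produce an answer

def response_time(inference_s: float, transport_s: float) -> float:
    """Total time from sensing to action: transport plus inference."""
    return transport_s + inference_s

cloud_latency = response_time(MODEL_INFERENCE_S, NETWORK_ROUND_TRIP_S)
edge_latency = response_time(MODEL_INFERENCE_S, 0.0)  # on-device: no network hop

print(f"cloud: {cloud_latency * 1000:.0f} ms, on-device: {edge_latency * 1000:.0f} ms")
```

Under these made-up numbers, the on-device path answers in 50 milliseconds versus 130 for the round trip to the cloud; the only difference is the transport term, which is exactly what running the model locally eliminates.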
What Is DeepMind?
My apologies, as I’m cribbing a lot of this from the DeepMind site itself rather than doing a deeper dive, but DeepMind was founded in 2010 by three computer scientists, the British Demis Hassabis and Mustafa Suleyman, and New Zealand’s Shane Legg, as an interdisciplinary AI developer, combining practical machine learning, engineering, neuroscience and computer science expertise with more theoretical thinking, aiming to create a breakthrough in artificial intelligence. And they did, starting by teaching AI to play old Atari games.
Greater renown followed those relatively humble beginnings. In the process of building better game-playing machines, DeepMind helped advance the deep learning and reinforcement learning techniques that paved the way for modern generative AI and large language models.
In 2015, DeepMind’s AlphaGo became the first computer program to defeat a professional human Go player, and in 2016 it beat world champion Lee Sedol, following in the footsteps of IBM’s chess-playing Deep Blue and Jeopardy-playing Watson. Google DeepMind notes that Go was a long-standing AI puzzle, and cracking it was a landmark achievement; it is just one of the many games that AlphaGo and its successors mastered. DeepMind continues to train AI to play increasingly difficult and complex games, but it has moved far beyond games into fields including biology, where AlphaFold has predicted the structures of hundreds of millions of proteins; mathematics and geometry; and software development, where the company’s AlphaCode is an AI-powered coding engine that performs at roughly the level of an average competitor in programming competitions.
Today DeepMind’s technology underpins many of Google’s generative AI tools, including Gemini and Veo.
What Makes DeepMind’s Technology Special?
Like most approaches to generative AI, DeepMind uses complex artificial neural networks to sift through tremendous amounts of data—like the kind encountered by game players—to find patterns, extract information and help make decisions. DeepMind uses deep learning processes to train its AI models.
DeepMind was founded with a mission to use AI for the benefit of humanity and an optimistic approach to the technology: not only did AI represent a potential good to the company’s founders, but more advanced, human-like AI, known as artificial general intelligence, was something to be pursued and embraced rather than feared and questioned. When that service-driven mission and positive vision combined with the company’s practical and theoretical approaches to AI, DeepMind discovered a galaxy of applications for its technology beyond playing games.
Since the Google acquisition more than a decade ago, Google DeepMind evidently has preserved the mission and vision of DeepMind’s origins.
What Else Did Google Do With DeepMind?
Google acquired DeepMind in 2014, and in 2023 it merged DeepMind with Google Brain to form Google DeepMind. Google Brain was Google’s experimental AI developer, founded as part of Google’s “X–The Moonshot Factory” division. In 2016, DeepMind rolled out WaveNet, a text-to-speech tool that has been offered via Google Assistant, Cloud Text-to-Speech and Google Duo. In 2021, DeepMind contributed to Google’s launch of LaMDA (Language Model for Dialogue Applications), a family of conversational large language models. In 2022, DeepMind released Gato, a versatile, general-purpose AI model. Subsequent releases include the AI chatbot Sparrow, the large language model Chinchilla and the visual language model Flamingo.
In addition to Gemini, Google used DeepMind’s technology to create Gemma, its family of open-weight large language models. In April of this year, the company launched DolphinGemma, turning the model’s power toward decoding the clicks, squeaks and other vocalizations of dolphins.
In recent years, Google DeepMind’s applications have also expanded to text-to-video generation (Veo), text-to-music generation (Lyria), 3D environment generation (Genie), ancient text restoration (Ithaca) and AI chip design (AlphaChip). DeepMind’s capabilities also clearly underpin Google’s recent decision to make Gemini a multimodal model.
Robotics On the Edge
Back to the robotics topic: in 2023, DeepMind released RoboCat, an AI model that could control robotic arms. Last year, it released SIMA, or Scalable Instructable Multiworld Agent, which can follow instructions to perform tasks in 3D virtual environments. This March, DeepMind launched Gemini Robotics and Gemini Robotics-ER, AI models designed to improve how robots interact with real-world environments.
The most recent news, announced on Tuesday, is Gemini Robotics On-Device, a foundation model that gives robotic arms near-human dexterity and responsiveness. According to recent reports, the model can power robotic arms to accurately fold clothes, zip and unzip bags and lunchboxes, pour salad dressing and draw playing cards from a deck on demand, and it can quickly learn how to perform new tasks. It is essentially a smaller, more efficient version of Gemini Robotics.
While the new model was trained on Google’s two-armed ALOHA robot, it can be adapted to other robot types. Gemini Robotics On-Device is an advance that moves the world closer to humanoid robots with the skills to learn and perform general tasks.