AI EDUCATION: What Is a Transformer (and Is It More Than Meets the Eye)?


Each week we find a new topic for our readers to learn about in our AI Education column. 

Full confession mode here: I am a child of the 1980s. In my childhood, I was encouraged to play in the real world, or, if seeking escape, to read a book. My parents tried their hardest to steer me towards educational television, or at the very least, towards more creative and performing arts oriented broadcasts, with occasional forays into sports as a treat. They, of course, failed, and I was exposed to the heaps of animated crap foisted upon Gen X and millennial children, in particular the Transformers. Yes, I eventually got the full-sized Optimus Prime action figure, and for a full two weeks, I was the coolest six-year-old boy on my block.

But in AI, a transformer is very different from robots that can change from humanoid to vehicle form at will—or the devices in our electrical grid that pass energy between circuits. A transformer is a type of neural network. Transformers are used to understand and organize complex data using a very large number of variables. While most software to date has been developed to perform calculations, transformers can understand interrelationships and contexts within data—especially sequential data—and create nuanced hierarchies with human-like ability. 

In other words, an AI transformer can make a computer capable of human-level learning and discretion. Transformers are the technology behind large language models and modern natural language processing and translation; next-generation search capabilities; drug formulation and gene editing; autonomous vehicles and robotics; and fraud prevention and anomaly detection.

How We Got Here 

Well, for starters, transformer is the “t” in GPT (generative pre-trained transformer). We’ve mentioned transformers in recent AI Education articles on large language models (LLMs) and neural networks. Increasingly, in artificial intelligence news, transformers have been put forth as the most promising path towards an artificial general intelligence, that is, an AI that is capable of performing any cognitive task at a human level. 

While GPT is probably the most familiar AI model series using transformers today, it's not alone. Meta's LLaMA (Large Language Model Meta AI), GitHub's Copilot and Google's PaLM (Pathways Language Model) are other broadly applied examples. As these transformer-based models are retrained and fine-tuned over time, they become better readers and writers. Through successive generations, they are also becoming better predictors, reading, writing and performing other tasks more accurately.

All things being equal, given that trajectory of improvement, transformers should eventually reach the artificial general intelligence threshold, and indeed, some companies and researchers claim that certain contemporary models have already achieved general intelligence. In recent months, however, the AI discussion seems to have pivoted towards asking whether transformers represent the best path towards more sophisticated artificial intelligence, or whether there might be more promising alternatives.

How Do Transformers Work 

Transformers were first proposed in 2017 by a research group at Google in a paper, “Attention Is All You Need.” A transformer is a multi-layered neural network: a stack of layers acts as an encoder, reading or ingesting information and converting each bit of data into a numerical representation of its meaning in context, and a second stack of layers acts as a decoder, generating an output sequence based on the model’s understanding of the data.
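For readers who want to peek under the hood, here is a minimal sketch of that encoder-decoder stack, written in Python with the open-source PyTorch library (my choice for illustration, not necessarily what the Google team used). The layer counts and sizes below match the base configuration described in “Attention Is All You Need,” and the random numbers stand in for real, embedded text.

import torch
import torch.nn as nn

# Six encoder layers to read the input and six decoder layers to generate
# the output: the base configuration from the 2017 paper.
model = nn.Transformer(
    d_model=512,            # size of each token's internal representation
    nhead=8,                # parallel attention "heads" per layer
    num_encoder_layers=6,   # the reading/ingesting stack
    num_decoder_layers=6,   # the generating stack
)

src = torch.rand(10, 1, 512)  # a 10-token input sequence (already embedded)
tgt = torch.rand(7, 1, 512)   # the 7 output tokens generated so far
out = model(src, tgt)         # representations used to pick the next output token
print(out.shape)              # torch.Size([7, 1, 512])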

Data moves through the neural network not only through the work of the individual nodes, or neurons, within the network, but also via the guidance of an attention mechanism built into every layer. The attention mechanism is an algorithm, or set of algorithms, that helps the transformer understand the relationships of different, and sometimes distant, bits of data within a large data set, like a finely detailed, multidimensional map. The attention mechanism enables a transformer to, in some ways, know an entire set of data all at once.
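The math behind that map is surprisingly compact. Below is the simplified, single-head version of the attention formula from the original paper, again sketched in Python with PyTorch; real models layer multiple heads, masking and learned projection matrices on top of this core idea.

import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Score every position against every other position, scale the scores,
    # then normalize them into weights that sum to 1 for each position.
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)
    # Each position's output is a weighted blend of all positions' values,
    # which is how the model "sees" the whole sequence at once.
    return weights @ V

tokens = torch.rand(5, 4)                # five tokens, four features each
out = attention(tokens, tokens, tokens)  # self-attention: the sequence attends to itself
print(out.shape)                         # torch.Size([5, 4])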

If I were a transformer and my data set were a novel, the attention mechanism would be more than an index or a table of contents: it would help me instantly understand how each letter is part of a word composed of interrelated letters, each word is part of a sentence, and each sentence is part of a paragraph, page, chapter, book, volume, and so on. The caveat is that a transformer requires a lot of computing power. The rise of transformers, however, has luckily coincided with the rise of powerful, efficient GPUs capable of supporting the AI boom. Of course, in an even more blissful coincidence, the timely advance in GPUs was driven in part by the proliferation of cryptocurrencies, but that’s a story for a different day.

What Do Transformers Do 

Transformers are already powering nearly every modern chatbot and image generation tool. They’re helping people translate between languages in real time. They can already be taught to read and write proficiently in almost any code, from programming languages to DNA sequences and chemical structures. A transformer may be the next pre-eminent genius chemist, biochemist and physician. They’re already formulating drugs, cures and other products. Transformers are guiding autonomous driving and driver-assist technology.

In the future, transformers have the power to do even more. Rather than just power one person’s autonomous vehicle, transformers could power an entire multi-modal transportation system on a local, regional or global scale. Transformers will also help staff our factories, as they can be brought to bear on entire automated manufacturing processes.

Transformers, in the near future, will be applied to music, medical devices, even financial markets. Their greatest limitation, however, is the amount of computing power required to support them as they grow in complexity. The GPU gold rush set off by cryptocurrency mining operations may prove a drop in the bucket compared to the oncoming AI-related demand for processing power, which is one reason ongoing research into quantum computing and other alternative computational methods, as well as other AI architectures, is so important.

What’s Next 

One common prediction, as already outlined, is that transformers will continue to improve until they reach and surpass human-level cognitive abilities. Ever-more complex artificial intelligence models will be built as transformers of transformers, until an AI thinks as well as, or better than, a person. Our financial sector readers would be fascinated to learn that there are already compound, transformer-based AI models being built on Daniel Kahneman and Amos Tversky’s two-system model of the human mind.

So, if extant transformer technology is the long-term future of AI, then influence and economic power should begin to shift away from the hard scientists and technicians designing and building the technology and back towards creatives and social scientists who find new and more nuanced ways of using, querying and prompting different models. 

But there are also some promising alternatives to transformers worth discussing, some of which have been created by former MIT researchers at Liquid AI, a tech startup. Liquid’s models are designed to perform similarly to transformer-based models but much more efficiently by using differently structured neural networks, called liquid neural networks.