AI EDUCATION: What Is an Encoder? What is a Decoder?


Each week we find a new topic for our readers to learn about in our AI Education column. 

Last week we introduced a novel AI model type, the diffusion model, which is used to power a lot of the image (and video) generation AI available to consumers. Today we’re returning to more familiar ground while staying on the topic of AI models. 

Welcome to another AI Education, where we’re going to define and discuss encoders, decoders and encoder-decoder architecture in hopes of translating some confusing technology-speak for our readers. We’ve already broached this topic in some detail in our discussions of large language models and transformers. Indeed, devoted readers of AI Education should already know that encoders and decoders are the two key components of transformer technology, the AI that powers large language models. 

The encoder-decoder concept comes from communications. Think of a telephone—a telephone has both a microphone, or transmitter, that one holds near the mouth, and a speaker, or receiver, that one holds near the ear. The transmitter transforms sound into electric signals, and the receiver transforms electric signals back into sound. The encoder operates like the transmitter—it digests input, be it data or a user query, and translates it into language or instructions for software or a machine. The decoder operates more like the receiver—it takes the instructions produced by the encoder and turns them into further instructions for the software or machine to manipulate data, or into a response for the user. 

What Are Encoding and Decoding? 

Encoding is a computing concept that pre-dates AI. Encoding, put simply, is a process by which software transforms data into a form in which it can be used. So an encoder might take whatever programming language we’re writing in and convert it into a binary form that a computer’s hardware can execute, or take plain text or spoken words and turn them into a machine-readable digital form. Most of us encounter encoding in video and audio media, where it usually entails converting very large raw recordings into compressed digital files that are easier to use, transport and manipulate. 

Decoding is the reverse move: taking encoded data and returning it to a more usable raw format, be that amplified sound, a projected image or natural speech—turning binary code back into a programming language, say, or that language back into audio or plain text. 
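The round trip described above can be sketched in a few lines of Python, using the standard library's zlib compression as a stand-in for a media codec. This is an analogy for classical encoding and decoding, not an AI model:

```python
import zlib

def encode(text):
    # Encoder: turn human-readable text into a compact binary form.
    return zlib.compress(text.encode("utf-8"))

def decode(blob):
    # Decoder: reverse the transformation back into the raw text.
    return zlib.decompress(blob).decode("utf-8")

original = "raw recordings become compressed digital files " * 4
packed = encode(original)
restored = decode(packed)  # identical to the original, but packed is smaller
```

As with compressed media files, the encoded form is both smaller and lossless: decoding recovers the input exactly.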

In AI, encoding and decoding imply a blend of hardware and software—the actions of encoding and decoding in artificial intelligence cannot be separated from the complex, multi-layered neural networks that power them. When we speak of AI encoding and decoding, then, we’re not only talking about actions and processes, but also encoders and decoders, the software architecture and hardware enabling those processes. 

Encoders and decoders are the building blocks of transformers—a transformer model may use encoders, decoders or both. Transformers are neural networks used to process sequential data. Crucially, transformers include a self-attention mechanism, which allows AI models to weigh pieces of information by importance or priority. Self-attention leads to AI that can understand nuance and context in language, and that can use the predictive and generative capabilities of encoders and decoders to plan and strategize. 
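For the curious, here is a minimal sketch of the scaled dot-product self-attention at the heart of transformers, stripped of the learned projection matrices a real model would use; the tiny token vectors are made-up numbers, not real embeddings:

```python
import math

def softmax(scores):
    # Normalize raw scores into positive weights that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Scaled dot-product attention with queries = keys = values:
    # each token scores itself against every token, and those scores
    # become weights for blending all the token vectors together.
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)  # how much each other token matters
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens))
                        for i in range(d)])
    return outputs

# Three toy 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(tokens)  # each output blends all tokens by relevance
```

Each output vector is a weighted average of all the inputs, which is how a token's representation comes to reflect its context.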

What Is an Encoder? 

An encoder is a part of a network or software that changes data from one form to another. In software, encoders most often simplify data so that it can be more easily understood and manipulated, while retaining important features of the original input. 

Encoders are like translators. In an AI model, encoders translate input by first tokenizing and embedding the text—this helps the technology find the relationships between paragraphs, sentences, words and characters. The encoder then interprets each token in the context of every other token in the input. The process is repeated across many layers, even for a basic user query, so that the technology can iron out ambiguities in the input. 
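The tokenize-and-embed step might look like this toy sketch. The word-level vocabulary and two-dimensional vectors here are hypothetical; real models learn subword vocabularies and high-dimensional embeddings from data:

```python
# Toy word-level tokenizer and embedding table (invented for illustration).
vocab = {"what": 0, "is": 1, "an": 2, "encoder": 3, "<unk>": 4}
embeddings = {
    0: [0.1, 0.3], 1: [0.0, 0.2], 2: [0.4, 0.1],
    3: [0.9, 0.7], 4: [0.0, 0.0],
}

def tokenize(text):
    # Map each word to an integer id, falling back to the unknown token.
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def embed(token_ids):
    # Look up the vector for each token id; these vectors are what the
    # encoder's attention layers then refine in context.
    return [embeddings[t] for t in token_ids]

ids = tokenize("What is an encoder")
vectors = embed(ids)
```

Words outside the vocabulary fall back to `<unk>`, which is one reason production systems prefer subword tokenizers: they can represent words they have never seen.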

Standalone encoders can understand and answer questions and classify text—BERT is an example of a commonly used standalone encoder. An encoder by itself can generate insights from raw text and documents, and to some extent can provide feedback and support functions for businesses. But often, encoders are used to put data into a format that’s usable by computers—other software, other machines, or maybe the same software. Quite likely, another kind of machine or software known as a decoder. 

What is a Decoder? 

Decoders are like generators. They study the tokens produced by the encoder and the contexts those tokens are found in, and use that knowledge to predict what token should come next in a sequence. Even more simply, they take encoded data and convert it into usable content. 
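That predict-the-next-token loop can be sketched with a toy probability table; the probabilities and the `greedy_decode` helper are invented for illustration, standing in for the output layer of a trained decoder:

```python
# Made-up next-token probabilities a decoder might produce at each step.
next_token_probs = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<end>": 1.0},
}

def greedy_decode(start="<start>", max_len=10):
    # Repeatedly pick the most probable next token until <end> appears.
    token, generated = start, []
    while len(generated) < max_len:
        dist = next_token_probs.get(token)
        if not dist:
            break
        token = max(dist, key=dist.get)
        if token == "<end>":
            break
        generated.append(token)
    return generated

sentence = greedy_decode()
```

Real decoders condition on the entire sequence so far rather than just the last token, and often sample from the distribution instead of always taking the most probable word, but the loop is the same: predict, append, repeat.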

Decoders differ under the hood from encoders in how they use the self-attention mechanism. While encoders use self-attention to understand how a segment of data—or token—relates to all other tokens in a given document, decoders use masked self-attention: each token may only take into account the tokens that come before it in the sequence, so the model cannot peek ahead at words it has yet to generate. In a full encoder-decoder model, decoders also have an additional cross-attention layer that attends to the encoder’s output. 
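The difference in visibility can be illustrated as an attention mask, a small sketch under the usual convention that position i may only attend to position j where the mask is True:

```python
def attention_mask(n, causal):
    # mask[i][j] is True when position i is allowed to attend to position j.
    # Encoder-style (causal=False): full visibility in both directions.
    # Decoder-style (causal=True): only positions j <= i, no peeking ahead.
    return [[j <= i if causal else True for j in range(n)] for i in range(n)]

encoder_mask = attention_mask(3, causal=False)  # every position sees all others
decoder_mask = attention_mask(3, causal=True)   # lower-triangular pattern
```

The lower-triangular decoder mask is what makes generation work: at each step the model can only condition on what has already been written.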

Standalone decoders can generate text content—so they’re particularly useful as writing assistants and storytelling aids. Many autocomplete tools are now powered by decoders. Decoders can power chatbots, too—GPT, for example, is a decoder-only model. 

What Happens When We Combine Encoders and Decoders? 

Well, technologists were combining encoders and decoders in software before either was used to build AI models, particularly in the realm of language translation. An encoder would take in a phrase in the source language and translate it into an intermediate representation, while a decoder would take that representation and generate an output in the target language. 
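That two-stage pipeline can be sketched with a toy phrase table. The vocabularies and integer ids here are invented, and a real neural translator learns continuous representations rather than word-for-word lookup tables:

```python
# A deliberately naive word-for-word phrase table (invented for illustration).
source_vocab = {"hello": 0, "world": 1}
target_words = {0: "bonjour", 1: "monde"}

def encode_phrase(phrase):
    # Encoder stage: source-language words -> intermediate representation.
    return [source_vocab[word] for word in phrase.lower().split()]

def decode_phrase(code):
    # Decoder stage: intermediate representation -> target-language output.
    return " ".join(target_words[i] for i in code)

translated = decode_phrase(encode_phrase("Hello world"))
```

The key idea survives in modern systems: the intermediate representation is language-neutral enough that the encoder and decoder can be developed, and even swapped, independently.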

The encoder-decoder combination creates a powerful tool for translating and summarizing large, complex volumes of text or data across languages. Encoder-decoder models are often used in speech recognition, question answering and customer service, video and image analysis (video-to-text or image-to-text), image captioning and data-to-text tasks. Google Translate is a well-known example of encoder-decoder architecture.