Each week we find a new topic for our readers to learn about in our AI Education column.
Last week in AI Education we introduced the broad concept of artificial intelligence models. We briefly discussed a few different types of AI models, including the type of artificial intelligence we’re going to focus on this week: Large language models.
A large language model, which we’ll call an “LLM” for brevity’s sake, is an AI model built to read and understand text; the technology was built around the ability to read. Today’s widely used LLMs perform natural-language processing, giving them the ability to generate and translate text. The most sophisticated LLMs can read and write much like a human.
LLMs are trained on very large data sets. Today’s LLMs are capable of unsupervised training, meaning they can learn from unlabeled data without a human marking up every example. LLMs typically ingest data from the internet at large, or from broad but well-defined corners of it, like Wikipedia. LLMs include OpenAI’s GPT-3 and ChatGPT, Cohere’s Command, Google’s Gemini and Microsoft’s Orca.
What Can an LLM Do?
Large language models can:
- Summarize content and extract data, including performing academic research and analyzing scientific data to determine results and draw conclusions.
- Organize and classify text, including sorting text and documents into different classes based on meaning or sentiment. Sentiment analysis can also be used to read a person’s tone in speech or writing (see the short code sketch after this list).
- Complete text and make predictions, even from a small amount of input or a few prompts.
- Translate text between languages—from one language to another, from one language to several languages, or from several languages into one language—so fast that it appears close to real time for the end user. This use case can be applied to text-to-speech and other accessibility formats.
- Answer questions based on content they’ve read and trained on, up to and including general knowledge questions. LLMs power chatbots and virtual assistants.
- Write content: many AI models, including ChatGPT, GPT-3 and Cohere’s Command, write original copy. I use ChatGPT to help write content for AI & Finance. Some LLMs, like Amazon Q Developer, are proficient at coding in different programming languages and are capable of building websites from scratch. Many others can suggest style changes to text.
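For readers who want to see what a couple of these capabilities look like in practice, here is a minimal sketch using the open-source Hugging Face transformers library. The library and task names are real, but which models get downloaded by default, and the exact scores and summaries returned, will vary by version; the example text is invented for illustration.

```python
# A minimal sketch of two capabilities from the list above:
# sentiment classification and summarization.
# Assumes the Hugging Face "transformers" package (and a backend such
# as PyTorch) is installed; default models and outputs vary by version.
from transformers import pipeline

# Classify the tone of a sentence (positive vs. negative).
classifier = pipeline("sentiment-analysis")
print(classifier("The quarterly results exceeded every forecast."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# Condense a longer passage into a short summary.
summarizer = pipeline("summarization")
article = (
    "Large language models are trained on very large data sets and can "
    "summarize content, classify text, translate between languages, "
    "answer questions, and write original copy."
)
print(summarizer(article, max_length=30, min_length=10))
# e.g. [{'summary_text': 'Large language models can summarize content ...'}]
```

Behind those two short calls, the library downloads a pretrained language model and runs the text through it, which is why a task that once required custom software can now be a few lines of code.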
So what, right?
Well, LLMs are being used to augment search engines like Bing and Google. They’re being used to analyze, and even help write, molecular and genetic code, with applications in pharmaceutical formulation and gene editing. They’re being used for a number of customer service and customer experience applications beyond the chatbot. They’re being used to craft marketing campaigns and materials. Beyond academic research, LLMs are being applied to the time-consuming and laborious tasks of legal research. LLMs are already being used in the financial sector for fraud detection, on top of their gradual adoption as customer experience enhancers.
How They Work
LLMs are based on a deep learning architecture called the transformer. They use immensely large neural networks to recognize words, understand the relationships between words and phrases, and find meaning within a text, discovering the kind of intricate, overlapping patterns found in language. An attention mechanism directs the model to focus on the most relevant parts of the text it is processing. This attention mechanism is the secret sauce of today’s sophisticated LLMs, enabling them to read and write faster by prioritizing information. Just as neural networks are computer systems designed to mimic the human brain, the attention mechanism was inspired by human attention.
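To make the attention mechanism a little less abstract, here is a minimal sketch of the scaled dot-product attention calculation at the heart of the transformer, written with NumPy. This is an illustration only, not the code inside any particular commercial model, and the tiny random matrices stand in for the query, key and value vectors a real model would learn.

```python
# A minimal sketch of scaled dot-product attention, the core of the
# transformer architecture. Real LLMs run many such "heads" in parallel
# over thousands of learned dimensions; the toy sizes here are for
# illustration only.
import numpy as np

def attention(Q, K, V):
    # Compare every query against every key to get raw relevance scores.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns scores into weights that sum to 1 per query; this is
    # how the model "prioritizes" some tokens over others.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted blend of the value vectors.
    return weights @ V

# Three tokens, each represented by a 4-dimensional vector (toy sizes).
Q = np.random.rand(3, 4)
K = np.random.rand(3, 4)
V = np.random.rand(3, 4)
print(attention(Q, K, V).shape)  # (3, 4): one blended vector per token
```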
Training an LLM involves teaching context by instructing it to accurately predict the next word in a sentence based on the preceding words. The model learns by assigning probability scores to candidate next words and adjusting those scores as it sees how words actually follow one another in its training text. It’s a painstaking process made faster by humanity’s immense growth in processing power; LLMs are being trained on billions of pages of text.
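To make “assigning probability scores” concrete, here is a toy sketch with an invented four-word vocabulary and made-up scores. It shows how raw scores for candidate next words become probabilities, and how training measures the error when the word that actually came next receives a low probability.

```python
# A toy sketch of next-word prediction: invented scores over a tiny
# vocabulary, converted to probabilities, then scored against the word
# that actually came next. Real models do this over vocabularies of tens
# of thousands of tokens, billions of times during training.
import numpy as np

vocab = ["bank", "river", "money", "tree"]
# Hypothetical raw scores the model produces after reading
# "She deposited her paycheck at the ..."
logits = np.array([4.1, 0.3, 2.2, -1.0])

# Softmax turns raw scores into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.2f}")

# Training compares the prediction with the word that really followed
# ("bank") and nudges the model to raise that word's probability.
true_next = vocab.index("bank")
loss = -np.log(probs[true_next])  # cross-entropy: lower is better
print("loss:", round(float(loss), 3))
```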
Simply put, machines learn words and languages by turning them into math. In traditional machine learning, words are given firm, unique numerical values, making it difficult for the computer to understand the relationships between words or to distinguish between words with similar meanings. Large language models work, in part, by assigning multi-faceted values to words to allow the machine to understand the nuanced relationships in written language.
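The contrast between “firm, unique numerical values” and “multi-faceted values” is easiest to see side by side. The three-dimensional vectors below are made-up toy embeddings; real models learn vectors with hundreds or thousands of dimensions from data.

```python
# A toy contrast between two ways of turning words into math. The
# embedding values are invented for illustration.
import numpy as np

# Old approach: each word gets one arbitrary ID. Nothing about these
# numbers says that "cat" and "kitten" are related.
word_ids = {"cat": 1, "kitten": 2, "economy": 3}

# Embedding approach: each word becomes a vector, and related words end
# up pointing in similar directions.
embeddings = {
    "cat":     np.array([0.90, 0.80, 0.10]),
    "kitten":  np.array([0.85, 0.75, 0.20]),
    "economy": np.array([0.10, 0.20, 0.95]),
}

def similarity(a, b):
    # Cosine similarity: close to 1 means "pointing the same way".
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(similarity(embeddings["cat"], embeddings["kitten"]))   # near 1
print(similarity(embeddings["cat"], embeddings["economy"]))  # much lower
```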
Today, large language models are also being trained on graphic, video and audio input, enabling them to see images and video and listen to speech or music, and interpret those inputs for users.
Are They Reliable?
For the most part, LLMs are accurate. But like any generative AI, they can hallucinate, creating output that does not match the user’s intent. An LLM can predict the words or phrases we want, and it can predict which word or phrase is best suited to answer a question or to follow another word or phrase, but that doesn’t mean it can interpret our feelings, our desires, or our biases (or our lack thereof). Moreover, a user can rather easily prompt an LLM to lie to them, and that user may also be primed to believe those lies.
With today’s LLMs, a lot of utility and reliability depends on the skill of the person querying the AI—do they know how to input a prompt just so to get the model to deliver an optimal result, and do they have the ability to read responses closely to determine if and when the artificial intelligence is hallucinating?
Like most emerging artificial intelligence, large language models are not only becoming more sophisticated over time, but also more accurate and reliable. The more they are used, the better they become. Gradually, large language models are expected to approach, and perhaps surpass, human-like performance; some argue that will happen sooner rather than later. OpenAI has claimed that GPT-4 already delivers “human-level performance” on some tasks. Certainly there will be even more sophisticated LLMs to follow in GPT-4’s wake.