AI EDUCATION: What Is a Foundation Model?


Each week we find a new topic for our readers to learn about in our AI Education column. 

We love a good multi-tool. 

That makes us sound handier than we are (actually, our wife is the “Mrs. Fix-It” in the relationship), but multi-tools, which combine two or more tools into one, are undeniably cool: think of a Leatherman or the more widely known Swiss Army Knife. 

Welcome to AI Education, where we’re going to discuss how gigantic AI multi-tools called foundation models not only offer us general-purpose artificial intelligence, but are also being used to help build and train more specialized, task-specific tools. As we’ve said before, most AI deployed to this point has been narrow AI, trained on specific data to perform specific tasks. Even a couple of years after it burst into the zeitgeist, generative AI, a category that includes both foundation models and large language models, presents something different: a highly functional, general-purpose artificial intelligence.

While today’s foundation models are not artificial general intelligence, which would be AI capable of reasoning at a human level, their general-purpose functionality is a key stepping stone toward achieving it. 

Foundation models are trained on very large datasets in order to accomplish a broad range of tasks: they’re not trained to perform just one function extremely well, but to perform many functions pretty well. Foundation models are also capable of transfer learning, meaning they can apply knowledge gained from performing one task to subsequent, different tasks. They are not necessarily large language models (LLMs); the AI applications commonly referred to as large language models are either developed from foundation models or are themselves foundation models that deal primarily with text. 

If we think of sophisticated AI chatbots like ChatGPT, both a large language model and a broader foundation model are likely being employed. While an LLM might be responsible for a chatbot’s ability to understand our natural language input and its ability to output a human-like response, under the hood a foundation model is probably retrieving the information used to generate that response. Unbeknownst to us, our request is translated by an LLM, handed off to a foundation model that sifts through the data to put together the right answer, and then handed back to the LLM to be rendered in natural language for the chatbot’s response. 

Yeah, But What Is a Foundation Model? 

So they use a lot of data and they are generalists, but what the heck are they? Don’t you love it when we tell you about something but don’t really explain what it is? Before it is placed in the context of the data and tasks it handles, a foundation model is software built on a deep learning architecture, a multi-layered artificial neural network. Distilled to its most basic concept, the software is designed to make predictions from a given set of information and then test whether those predictions were accurate. 

Like a large language model, a foundation model usually relies on transformer technology to digest data. Inputs and outputs can be of a particular modality, like images or audio, or they can be multimodal. Foundation models are usually trained using self-supervised learning, in which the model is repeatedly challenged to find relationships within sets of unlabeled data. 
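
For readers who like to see the nuts and bolts, here’s a deliberately tiny Python sketch of the self-supervised idea: no human-written labels anywhere, because the “answer” for each training example is simply a piece of the data we hid. The stand-in “model” below is just a word-frequency counter, not a transformer; it’s there only to illustrate where the training signal comes from.

```python
# A toy illustration of self-supervised learning: hide part of the
# unlabeled data and ask the "model" to predict it back. No human-written
# labels are involved; the label for each example is the hidden word itself.
# (The stand-in model is a word-frequency counter, not a transformer.)
from collections import Counter
import random

corpus = "the cat sat on the mat while the dog slept on the rug".split()
counts = Counter(corpus)

def make_example(tokens):
    """Mask one token; the original token becomes the training label."""
    i = random.randrange(len(tokens))
    masked = tokens.copy()
    label = masked[i]
    masked[i] = "[MASK]"
    return masked, label

def predict(masked_tokens):
    """A trivially weak guesser: always predict the most frequent word.
    A real foundation model would use the surrounding context instead."""
    return counts.most_common(1)[0][0]

correct = 0
trials = 1_000
for _ in range(trials):
    masked, label = make_example(corpus)
    if predict(masked) == label:
        correct += 1

print(f"masked-word accuracy over {trials} trials: {correct / trials:.2f}")
```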

Designing and training a foundation model has, to this point, been prohibitively expensive and skill-intensive for most individuals and enterprises. Lucky for us, many foundation models are available in an open-source format: they can be accessed through an online interface, or downloaded and run locally on our personal or business machines. 
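
As a rough sketch of what “downloaded and run locally” can look like, here’s how we might load an openly available model with the Hugging Face transformers library. The specific model name is only an example we’ve picked for illustration; any open checkpoint that fits your hardware would do.

```python
# A minimal sketch of running an openly available foundation model locally
# with the Hugging Face `transformers` library (pip install transformers torch).
# The model name below is only an example; swap in whatever open checkpoint
# your hardware can handle.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example open checkpoint (assumption)
)

result = generator(
    "Explain what a foundation model is in one sentence.",
    max_new_tokens=60,
)
print(result[0]["generated_text"])
```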

Properties of Foundation Models 

According to Splunk, foundation models have five necessary properties: 

  1. They are scalable and can efficiently process large volumes of data. 
  2. They are multimodal—this is a property that we do not deem necessary for a foundation model, but Splunk argues that they should be able to access information from across multiple domains. 
  3. They are expressive—foundation models are capable of representing knowledge clearly and accurately. 
  4. They are compositional—foundation models can generalize information to be used in downstream tasks. 
  5. They have a high memory capacity—foundation models can accumulate a vast amount of knowledge and learn on new data without forgetting previously learned knowledge. 

Examples of Popular Foundation Models 

  • GPT – Probably best known from the large language model chatbot ChatGPT, OpenAI’s GPT-4 is a multimodal foundation model. 
  • Gemini – Google’s family of multimodal foundation models. 
  • Claude – Anthropic’s multimodal foundation model. 
  • Llama – Meta’s open-source foundation model series. 
  • Grok – xAI’s foundation model. 
  • BLOOM – Open-source foundation model developed via the BigScience research collaboration. 
  • Mistral Large – Foundation model from French provider Mistral AI. 
  • Granite – IBM’s flagship decoder-only foundation model. 
  • Command – Foundation model from Cohere. 
  • Amazon Titan – Foundation model family from Amazon Web Services. 

Great, We Have a Foundation Model. Now What? 

Foundation models come pre-trained; that is, they’ve already been forced to ingest a huge pile of training data and are ready to help users with a number of tasks. But they’re not done learning at that point. These models are capable of continuing to learn and refine their abilities during the inference phase of their existence (AI inference is the stage at which a trained AI is put to work performing as intended). 

In other words, when we encounter a foundation model at this point, it’s like a lump of clay ready to be shaped by our wants, needs and experiences, and we can direct it to build skill in a number of areas, especially if we’re working with a multimodal foundation model. This process is considered distinct from training the model and is called refining, adapting or fine-tuning it. 

Through adaptation and fine-tuning, we take our generalist foundation model and turn it into a specialist. While prohibitively huge datasets are needed to build a foundation model, relatively small and simple datasets can be used for fine-tuning. Foundation models like GPT and Claude can be acquired as a baseline, and then adapted to meet specific needs. 
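
To give a feel for that generalist-to-specialist step, here’s a toy PyTorch sketch of the fine-tuning idea: the “pre-trained” base is frozen and only a small, task-specific layer is trained on a small labeled dataset. Everything in it (the stand-in base, the random data) is invented for illustration; a real workflow would start from an actual downloaded foundation model.

```python
# A toy PyTorch sketch of fine-tuning: freeze the (stand-in) pre-trained base
# and train only a small task-specific head on a small labeled dataset.
# Everything here is invented for illustration; a real workflow would load an
# actual pre-trained foundation model as the base.
import torch
import torch.nn as nn

torch.manual_seed(0)

base = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # pretend this is the pre-trained model
for param in base.parameters():
    param.requires_grad = False                      # freeze its general-purpose knowledge

head = nn.Linear(32, 2)                              # new layer for our specific task

# The "relatively small and simple" fine-tuning dataset (random stand-in data).
x = torch.randn(64, 16)
y = torch.randint(0, 2, (64,))

optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    logits = head(base(x))
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final fine-tuning loss: {loss.item():.3f}")
```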

Applications of Foundation Models 

Some common applications of foundation models include: 

  • Natural language processing, or NLP—including speech-to-text. Refining a foundation model to perform NLP can lead to the development of a large language model.  
  • Computer vision, particularly in applications like robotics and autonomous driving. Foundation models can also be trained to look through specific types of visual data, like medical diagnostic imagery, to find anomalies and aid in the diagnosis process. 
  • Software development—foundation models can be trained to code in various languages, and can be trained to do so using natural language inputs. 
  • Materials science, biotechnology and genetics—foundation models are used at the molecular level to create new substances, analyze protein structure, discover new therapies and analyze and edit DNA. 
  • Earth science—foundation models are helping to analyze and predict the weather and track long-term changes to the climate. 
  • Decision-making support—foundation models can be trained to sift through particular types of data to surface recommendations, next-best actions and behavioral nudges (see the sketch after this list).
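
As a small, hedged illustration of that last item, here’s how a general-purpose model could rank candidate next-best actions with no task-specific training at all, using the zero-shot classification pipeline from Hugging Face transformers. The client note and the candidate actions are made up for the example.

```python
# A hedged sketch of decision-making support: rank candidate "next-best actions"
# with a general-purpose model via zero-shot classification (Hugging Face
# `transformers`). The client note and candidate actions are invented examples.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")  # downloads a default model

note = "Client emailed twice this week asking about rebalancing before year-end."
actions = ["schedule a call", "send an educational article", "no action needed"]

result = classifier(note, candidate_labels=actions)
for action, score in zip(result["labels"], result["scores"]):
    print(f"{action}: {score:.2f}")
```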