Each week we find a new topic for our readers to learn about in our AI Education column.
How did we get here—and so quickly?
Seriously, folks, we remember the days when personal hard drives and color monitors were considered cutting-edge—there are many still alive who remember the days of UNIVAC and ENIAC—and here we are talking about software that can learn, read, write and speak as if it were a person. That’s a lot of ground covered in 80 years.
And that’s why we’re looking into some of the conceptual foundations for the generative artificial intelligence we’re using today. For the last two weeks, we’ve looked at different types of AI model architecture, starting with diffusion models and then moving to encoders, decoders and the combination of the two. Now it’s time to reach back into artificial intelligence’s past, nearly four decades, to uncover an early type of generative AI.
This week on AI Education, we’re going to discuss Boltzmann machines, another type of machine-learning architecture crucial in the development of artificial intelligence technologies. We should keep in mind that Boltzmann machines are not physical devices—rather, they are virtual machines, or software. Boltzmann machines are named after Ludwig Boltzmann, an Austrian physicist and mathematician who lived most of his life in the 19th century and never saw or touched an electronic computer, let alone software.
What Is a Boltzmann Machine?
We’ve given some breadcrumbs already—a type of machine learning or AI model architecture, not a physical machine, an early variety of generative AI that was developed in the 1980s. A Boltzmann machine is a precursor to today’s deep learning technologies and generative AI—in many ways, today’s GenAI boom was made possible by the development of Boltzmann machines.
Here we ask you to recall our discussion on diffusion models, which were designed to virtually mimic the behavior of real-world physical systems for the purpose of physics experiments, operating by adding “noise” to data until it is rendered unreadable, then learning that process in reverse. Boltzmann machines are also designed to mimic real-world physical systems: energy systems. In fact, that’s where Ludwig Boltzmann’s greatest contributions to science and mathematics lie, in attempting to explain mathematically how energy moves through matter and space over time.
We’re not going to give a full particle physics lesson this week (the technology we’re discussing is quite enough to digest); suffice it to say that energy, on a subatomic level, behaves in a predictable manner. Like water flowing through plumbing, energy tends to settle at its lowest possible state. In a Boltzmann machine, each configuration of the network is assigned an “energy,” and as the neural network ingests (or learns) data, the machine adjusts the connections between its nodes to drive that overall energy toward a minimum. This process enables the software to learn from sets of ingested data.
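To make that concrete, the “energy” of a Boltzmann machine is just a number computed from the network’s current on/off states and connection weights; learning means nudging the weights so that the training data ends up in low-energy configurations. Here is a minimal sketch of that energy function, with made-up toy weights rather than anything from a real trained model:

```python
import numpy as np

# Toy network: sizes and weights are arbitrary illustrations.
rng = np.random.default_rng(0)
n = 5                               # number of nodes
W = rng.normal(0, 0.1, (n, n))
W = (W + W.T) / 2                   # connections are symmetric
np.fill_diagonal(W, 0.0)            # no node connects to itself
b = np.zeros(n)                     # per-node biases

def energy(s, W, b):
    """Energy of a binary state vector s (0/1 entries).
    Lower energy means a more probable configuration."""
    return -0.5 * s @ W @ s - b @ s

s = rng.integers(0, 2, n)           # a random on/off configuration
print(energy(s, W, b))
```

The 0.5 factor simply keeps each symmetric connection from being counted twice.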
What Does That Have to Do With AI?
A Boltzmann machine contains two or more layers of neuron-like nodes, of two types: hidden and visible. Unlike most deep learning networks, Boltzmann machines have no output nodes, and their nodes behave stochastically (non-deterministically): every node is connected to every other node, and each can be in one of two states, “on” or “off,” switching between them probabilistically. When the software ingests data, it adjusts the weights on its connections and then tries to settle the so-called energy of its system into a virtual thermal equilibrium. That process lets the machine manipulate and “learn” the data it has been given, even unlabeled data, without being directed or supervised. That’s right: a Boltzmann machine is one of the earliest and simplest neural architectures capable of unsupervised learning.
Boltzmann machines can use this process to find relationships and features within a body of data represented as binary (two-valued) vectors. By repeating the process, Boltzmann machines can learn the data, and eventually generate new information based on the data they were trained on.
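The probabilistic on/off switching described above is usually implemented as Gibbs sampling: each node computes how much turning on would lower the network’s energy, then flips on with a corresponding probability. Repeated sweeps drive the network toward the thermal equilibrium the text mentions. A toy sketch, with an arbitrary illustrative network rather than a trained one:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy fully connected network with symmetric weights (illustrative values).
n = 5
W = rng.normal(0, 0.5, (n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
b = np.zeros(n)

def gibbs_step(s, T=1.0):
    """One sweep of stochastic updates: each node turns 'on' with a
    probability set by its energy gap at temperature T."""
    s = s.copy()
    for i in range(n):
        gap = W[i] @ s + b[i]            # energy lowered by turning node i on
        s[i] = 1 if rng.random() < sigmoid(gap / T) else 0
    return s

s = rng.integers(0, 2, n)
s = gibbs_step(s)
```

Lowering the temperature `T` over time makes the sampling increasingly greedy, which is how the physical analogy of “cooling” toward a low-energy state shows up in the math.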
Types of Boltzmann Machines
General
A general Boltzmann machine has direct connections between all of its nodes. While effective, it is far from efficient, requiring large amounts of computing power and energy; as a result, general Boltzmann machines, while not purely theoretical, are rarely used in practice.
Restricted
In a restricted Boltzmann machine (RBM), nodes within the same layer are not connected to one another; connections run only between the visible layer and the hidden layer. RBMs always have exactly two layers. As a result, restricted machines can work considerably faster than general Boltzmann machines and with lower energy demand. A continuous RBM is a variant built to accept continuous (rather than binary) input.
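The layer restriction is what makes RBMs practical to train: because hidden nodes are independent of each other given the visible layer (and vice versa), a whole layer can be updated at once. RBMs are commonly trained with contrastive divergence, which compares hidden activations produced by real data against those produced by a one-step reconstruction. A toy sketch of a single CD-1 update, with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical toy sizes; a real RBM would be far larger.
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.01, (n_visible, n_hidden))  # visible-to-hidden weights
a = np.zeros(n_visible)                          # visible biases
b = np.zeros(n_hidden)                           # hidden biases

def cd1_step(v0, W, a, b, lr=0.1):
    """One contrastive-divergence (CD-1) update on a single 0/1 vector."""
    ph0 = sigmoid(v0 @ W + b)                    # hidden activations from data
    h0 = (rng.random(n_hidden) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)                  # one-step reconstruction
    v1 = (rng.random(n_visible) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    W = W + lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a = a + lr * (v0 - v1)
    b = b + lr * (ph0 - ph1)
    return W, a, b

v = np.array([1., 0., 1., 1., 0., 1.])
W, a, b = cd1_step(v, W, a, b)
```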
Deep
A deep Boltzmann machine (DBM) is similar to an RBM, but comprises more than two layers, with additional hidden layers and connections between nodes.
Deep Belief Networks (DBNs)
The easiest way for us to think of DBNs is as a combination of multiple RBMs, stacked into layers. In traditional deep learning, layers are trained simultaneously, but in DBNs they are trained sequentially, one layer at a time. This results in lower layers being responsible for simpler features within the presented data, while higher layers account for more complex and abstract features.
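The sequential recipe can be sketched as: train the first RBM on the raw data, feed its hidden activations to the second RBM as though they were data, and so on up the stack. A toy illustration (the trainer, sizes, and training data here are all made up for demonstration):

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=3):
    """Minimal CD-1 trainer for one RBM layer (toy scale)."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            v1 = (rng.random(n_visible) < sigmoid(h0 @ W.T + a)).astype(float)
            ph1 = sigmoid(v1 @ W + b)
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            a += lr * (v0 - v1)
            b += lr * (ph0 - ph1)
    return W, b

# Greedy, layer-by-layer stacking: each RBM trains on the hidden
# activations produced by the RBM below it.
data = rng.integers(0, 2, (20, 8)).astype(float)
stack, layer_input = [], data
for n_hidden in (6, 4):
    W, b = train_rbm(layer_input, n_hidden)
    stack.append((W, b))
    layer_input = sigmoid(layer_input @ W + b)  # pass features upward
```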
What Do Boltzmann Machines Do?
By virtue of their structure, Boltzmann machines are natural optimization engines. There are three types of problems that Boltzmann machines are commonly applied to:
Hand-written Digit Recognition
Boltzmann machines help power the technology that reads and understands legacy financial documents like personal checks—the next deposit we make at an ATM might be analyzed by a Boltzmann machine.
Search Problems
Boltzmann machines can be applied to pattern recognition tasks like fraud detection.
They can also be used to power recommendation engines: the dataset ingested by the Boltzmann machine could be a user’s interactions over time, from which the machine learns a predictable pattern of behavior, enabling it to generate personalized next steps and recommendations.
An RBM can even be used to enhance radar-guided targeting in defense applications.
Learning Problems
Boltzmann engines can be deployed to recognize patterns in images or sets of images, or, to analyze images in real time as part of a computer vision system.
Limited Generative Capabilities
Once an RBM is trained on a dataset, it can be asked to produce similar data drawn from the same probability distribution. That means that if an RBM has ingested a set of images, a user could ask the machine to generate a new image similar to the ones it has seen. In this way, RBMs can be used to generate text, images, code, or synthetic training data for other AI models.
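Generation from a trained RBM is typically done by alternating Gibbs sampling: start from a random visible vector and bounce between the hidden and visible layers until the sample settles into the learned distribution. A sketch, with random stand-in weights where a trained model’s parameters would go:

```python
import numpy as np

rng = np.random.default_rng(4)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Stand-in parameters; in practice W, a, b come from a trained RBM.
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.5, (n_visible, n_hidden))
a = np.zeros(n_visible)   # visible biases
b = np.zeros(n_hidden)    # hidden biases

def sample(n_steps=100):
    """Generate a new visible vector by alternating Gibbs sampling."""
    v = rng.integers(0, 2, n_visible).astype(float)
    for _ in range(n_steps):
        h = (rng.random(n_hidden) < sigmoid(v @ W + b)).astype(float)
        v = (rng.random(n_visible) < sigmoid(h @ W.T + a)).astype(float)
    return v

new_sample = sample()
```

More sampling steps give the chain more time to forget its random starting point, at the cost of compute; this is one reason RBM generation is slow next to modern generative models.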