Each week we find a new topic for our readers to learn about in our AI Education column.
Jargon. Lingo. Slang. Idioms.
It seems like a lot of AI Education ink is being spilled demystifying or defining the shorthand terms technologists use to discuss and describe artificial intelligence, and this week is no different, as we’ll be discussing AI inference. Inference is a word we see all over AI articles, but it is rarely explicitly defined for readers, and rarely is enough context given for an unacquainted reader to divine its meaning. Yet, for once, it is clear and simple to understand, because AI inference is exactly what it sounds like.
When we discuss artificial intelligence as software, we may divide its lifecycle into distinct phases. There are periods of development and training involved in creating the software and teaching—or training—it to behave and act in the manner that we want it to. Sometimes included in that training, sometimes standing alone as its own distinct phase, is a period of fine-tuning to “teach” the software specific tasks or workflows.
AI inference is pretty much everything we do with an AI model after that development, training and fine-tuning. It is the action, deployment and implementation phase of AI. It is best understood as a simple process of just a few steps: first, data is provided by the user and prepared for the model (either by the user or, in most current contexts, by the model itself); then the model runs a forward pass over the data, looking for patterns that match what it learned in training; finally, the model produces an actionable result and reports that result to the user.
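For readers who like to see things in code, here is a minimal sketch of those three steps in Python, using the scikit-learn library; the model, the numbers and the labels are all invented stand-ins rather than any particular production system:

```python
# A minimal sketch of the three inference steps described above.
# The model and data here are invented toy stand-ins.
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# (Training phase, shown only so the example runs: fit a toy model.)
X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y_train = [0, 1, 0, 1]
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Step 1: new data is provided and prepared for the model.
new_input = [[2.5, 2.5]]
prepared = scaler.transform(new_input)

# Step 2: the model runs a forward pass over the prepared data.
prediction = model.predict(prepared)

# Step 3: the model reports an actionable result to the user.
print(f"Inferred class for new input: {prediction[0]}")
```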
What AI Inference Is
AI inference is when we use a trained AI model to make predictions on new data, according to IBM. It is so named because an AI model uses information it has gained from processing its training data to infer the correct output for a given input. When our AI models are working with new data and new inputs, they don’t really know whether the predictions or decisions they make are true. They are inferring the truth (which, in part, is why they sometimes hallucinate, presenting objective falsehoods as truths).
For example, the spam-blocking technology in our email inboxes doesn’t really know for sure whether an incoming message is spam; it infers whether a message is spam based on patterns it learned from examples it was exposed to in training, according to IBM. In the financial industry, an AI model forecasting stock prices doesn’t really know what a company’s stock is going to do in the next five minutes, tomorrow, next week or in five years; it infers what will happen next based on past trends. Even a large language model generates text by inferring what the next word will be based on the patterns it learned from the text samples it was trained on.
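To make the spam example concrete, here is a toy sketch in Python (not IBM’s, nor any real filter’s, implementation; the messages and labels are invented). Note that the model reports inferred probabilities, not certainty:

```python
# A toy spam filter: the model never "knows" a message is spam; it
# infers a label from patterns seen in training. All data is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_messages = [
    "win a free prize now", "claim your free money",      # spam
    "meeting moved to 3pm", "see you at lunch tomorrow",  # not spam
]
train_labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(train_messages)
model = MultinomialNB().fit(features, train_labels)

# Inference: the filter reports inferred probabilities, not certainty.
incoming = ["claim your free prize"]
probabilities = model.predict_proba(vectorizer.transform(incoming))[0]
print(dict(zip(model.classes_, probabilities)))
```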
Machine learning, IBM points out, is essentially pattern recognition: models are trained to optimize performance on a dataset, their parameters adjusted until their decision-making fits the training data. As long as the training data is relevant and realistic, the model should be able to perform as expected when it enters the inference phase, making accurate predictions, generating the right text, and diverting and blocking the right messages.
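To see what “parameters adjusted until the decision-making fits the training data” looks like in practice, here is a stripped-down training loop in plain Python, fitting a single parameter by gradient descent; everything in it is a toy assumption:

```python
# A stripped-down view of training: nudge a parameter until the model's
# outputs fit the training data. Everything here is a toy illustration.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # input x, target y (here, y = 2x)
w = 0.0              # the model's single adjustable parameter
learning_rate = 0.01

for step in range(1000):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # adjust the parameter toward a better fit

print(f"Learned parameter w = {w:.3f}")  # converges toward 2.0

# Inference then simply reuses the fitted parameter on new input:
print(f"Inferred output for x = 5: {w * 5:.2f}")
```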
Training Versus Inference
Several sources compare and contrast AI training with inference and ask which is the greater draw on computational power and energy. The answer is actually inference. If we think about it, we only train AI during a set period of time, within established parameters, with a finite amount of data. The inference phase of an AI model, by contrast, may be perpetual and continuous, and the amount of data it is designed to take in as input might be, for all intents and purposes, infinite. Right now, with today’s generative AI models, around 90% of an AI’s lifecycle is spent in the inference phase.
Some Types of AI Inference
Cribbing from IBM and around the web, AI inference can be split into different forms depending on where the computation takes place and how the computational tasks are distributed. Recall that in computing, work can be done locally (on-site) or remotely; it can be done on the edge (again, ostensibly on-site) or in the cloud (distributed to one or more data centers); and it can be done immediately, as it were, in real time, or it can be scheduled and delayed.
IBM splits AI inference into online inference, which is immediate, sequential inference appropriate for real-time or time-sensitive applications of AI, like electric vehicles or large language model chatbots; and batch inference, in which a large volume of inputs is processed asynchronously in batches, which is significantly more resource-efficient.
Red Hat and Oracle add a third category to IBM’s two: streaming inference, which applies to non-human sources of input, like other artificial intelligences or sensors, that feed a model a constant flow of data on which it may need to base decisions or predictions.
Google divides AI inference into cloud inference, where computational work is distributed to remote data centers, with online- and batch-inference subcategories; and edge inference, where the computational work is mostly done where the data input is collected, affording users greater privacy and lower bandwidth costs.
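In code, these categories differ mainly in how inputs arrive and are handed to the model. The sketch below, built around an invented predict() stand-in rather than any real model, shows the three call patterns:

```python
# The inference categories above differ mainly in how inputs reach the
# model. predict() here is an invented stand-in for any trained model.
import time
from typing import Iterator, List

def predict(batch: List[float]) -> List[float]:
    return [x * 2 for x in batch]  # placeholder for a real forward pass

# Online inference: one input at a time, answered immediately.
print(predict([3.0]))

# Batch inference: a large volume of inputs processed together,
# which amortizes overhead and is more resource-efficient.
accumulated_inputs = [1.0, 2.0, 3.0, 4.0]
print(predict(accumulated_inputs))

# Streaming inference: a constant flow of data, e.g. from a sensor.
def sensor_stream() -> Iterator[float]:
    for reading in [0.1, 0.5, 0.9]:
        yield reading
        time.sleep(0.01)  # stand-in for waiting on the next reading

for reading in sensor_stream():
    print(predict([reading]))
```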
Is It Just That Simple?
When an organization deploys an AI technology in its operations, it is in the inference phase of AI. It’s not as simple, however, as moving an appropriately trained AI model into an operational environment. Successful implementation and use of trained AI models still requires organizations to consider some or all of the following:
The quality of the input data—Garbage in, garbage out is still the rule in computing.
Model selection—Choose the right model for the needs and tasks at hand.
Hardware—Do you need to run an AI model on an edge device? Then the edge device must be AI-capable.
Explainability—Do you—or other users—understand how and why a deployed AI model comes to its conclusions? This is particularly important in closely regulated industries. Speaking of which…
Compliance—Regulations and compliance concerns expand beyond explainability to issues like data privacy and security.
User skill and compatibility—Do workers know how to use the technology, and are they willing to implement it into their workflows? Without that buy-in, deployment becomes an uphill battle.