A

Agent Memory
How agents retain information within and across sessions.

Agentic Workflow
A multi-step pipeline in which an agent (or several) chains tools and decisions together.

AI Agent
A model that takes actions in a loop until a goal is met, rather than producing a single reply.

AI Copilot
An AI assistant embedded in an existing workflow to augment a human worker.

AI Cost (Per-Token Pricing)
You pay per million input and output tokens. Output tokens typically cost 3-5× more than input tokens.

AI Governance
The policies, processes, and roles that manage responsible AI use in an organization.

AI Readiness
How prepared an organization is to deploy and benefit from AI — technically and culturally.

AI Search (Retrieval + Generation)
A search experience that returns generated answers grounded in retrieved sources, not just a list of links.

AI Strategy
A plan for where and how AI creates value in your business — and how to get there.

Alignment
Ensuring AI behavior actually matches what humans intend — technically and ethically.

Attention Mechanism
How transformers decide which tokens to focus on when generating each output token.
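To make the per-token pricing entry concrete, here is a minimal cost calculation. The dollar rates below are made-up placeholders, not any provider's real prices:

```python
# Illustrative per-token cost calculation. The prices are placeholders
# chosen to show the typical input/output asymmetry, not real rates.
INPUT_PRICE_PER_M = 3.00    # $ per million input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.00  # $ per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request at the placeholder rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A typical retrieval-heavy request: large input, small output.
print(round(request_cost(10_000, 500), 4))  # 0.0375
```

Note that even with output priced 5× higher, a context-heavy request's cost is dominated by input tokens — which is why prompt caching and context trimming matter.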
C

Chain-of-Thought (CoT)
Asking the model to reason step by step before answering.

Change Management (AI Adoption)
The human side of AI rollout — getting teams to actually use and trust new AI tools.

Chunking
Splitting documents into smaller pieces before embedding and indexing them.

Claude Sonnet (Anthropic)
Anthropic's primary workhorse model — strong writing, long context, and reliable tool use.

Computer Use
AI agents that control a computer — browser, desktop, or OS — to complete tasks.

Context Window
How much text (in tokens) you can feed the model in one request.
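The chunking entry can be sketched in a few lines. This is a naive fixed-size character splitter with overlap — real pipelines usually split on sentence or section boundaries and count tokens, not characters:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    a minimal sketch of the chunking step before embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each chunk starts this far after the last
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "word " * 100  # a 500-character stand-in document
pieces = chunk_text(doc, chunk_size=200, overlap=50)
print(len(pieces), len(pieces[0]))  # 4 200
```

The overlap exists so a sentence cut at a chunk boundary still appears whole in at least one chunk.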
D

Diffusion Model
The architecture behind image generators like DALL-E, Midjourney, and Stable Diffusion.

Distillation
Training a smaller, cheaper model to mimic a larger one's outputs.

Document Parsing
Extracting clean, structured text from PDFs, Word files, HTML, and other formats.
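As a toy version of the document parsing entry, here is an HTML-to-text extractor using only Python's standard library. Production parsers handle tables, encodings, PDFs, and far more edge cases:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Minimal HTML-to-text sketch: keep visible text, drop markup
    and anything inside <script> or <style>."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> blocks

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

    def text(self) -> str:
        return " ".join(self.parts)

p = TextExtractor()
p.feed("<html><style>body{}</style><h1>Title</h1><p>Hello <b>world</b>.</p></html>")
print(p.text())
```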
E

Embedding Model
A model whose only job is to turn text into vectors for semantic search.

Embeddings
Numerical representations of text so a computer can measure meaning by distance.

Evals
A test suite for AI features. Required before anything goes to production.
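"Measure meaning by distance" in the embeddings entry usually means cosine similarity. The 3-dimensional vectors below are invented for illustration — real embedding models output hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Vectors pointing the same way score near 1; unrelated ones near 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (values are made up).
invoice = [0.9, 0.1, 0.0]
bill    = [0.8, 0.2, 0.1]
kitten  = [0.0, 0.1, 0.9]

print(round(cosine_similarity(invoice, bill), 3))    # high: similar meaning
print(round(cosine_similarity(invoice, kitten), 3))  # low: unrelated
```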
F

Faithfulness
Whether a summary or answer accurately reflects the source without distorting it.

Few-Shot Learning
Showing the model 2-5 examples of the task in the prompt so it learns the pattern.

Fine-Tuning
Continuing to train a base model on your own examples to specialize its behavior.

Function Calling
Same idea as tool use — the model returns a structured call for your code to execute.
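The few-shot entry is easiest to see as a prompt. The classification task and labels below are invented for illustration — the point is that the examples teach the format with no training involved:

```python
# A minimal few-shot prompt builder. Task, tickets, and labels are
# made up; any 2-5 examples of your own task work the same way.
examples = [
    ("The checkout page crashes on submit", "bug"),
    ("Please add dark mode", "feature_request"),
    ("How do I reset my password?", "question"),
]

def build_few_shot_prompt(ticket: str) -> str:
    lines = ["Classify each support ticket as bug, feature_request, or question.", ""]
    for text, label in examples:
        lines.append(f"Ticket: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # End with the new ticket and a dangling "Label:" for the model to complete.
    lines.append(f"Ticket: {ticket}")
    lines.append("Label:")
    return "\n".join(lines)

print(build_few_shot_prompt("Export to CSV fails with large files"))
```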
G

Gemini (Google)
Google's frontier LLM family — notable for its 2M-token context window and Google ecosystem integration.

GPT-4o
OpenAI's flagship multimodal model — fast, cheap relative to predecessors, and supports vision and voice.

Groundedness
Whether a model's answer is supported by the provided source documents.

Guardrails
Programmatic checks that catch unsafe or off-spec model output.
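A guardrail in the sense of the entry above can be as simple as validating model output before it reaches users. The checks and field names here are illustrative, not a standard:

```python
import json
import re

def guard_output(raw: str) -> dict:
    """Minimal output guardrail sketch: the model was asked for JSON
    with a 'reply' field; reject malformed output or output that
    leaks an email address. Both checks are examples, not a spec."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("output is not valid JSON")
    if "reply" not in data:
        raise ValueError("missing required 'reply' field")
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", data["reply"]):
        raise ValueError("reply leaks an email address")
    return data

print(guard_output('{"reply": "Your order has shipped."}')["reply"])
```

Real guardrail stacks layer many such checks — schema validation, PII filters, topic classifiers — before and after the model call.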
I

In-Context Learning (ICL)
How models adapt to new tasks from examples in the prompt, with no weight updates.

Inference
Running a trained model to generate output. The expensive part of AI in production.

Instruct Model
A base model fine-tuned to follow instructions — the "chat" version you actually use.
L

Latency
How long a model takes to respond. Measured as time-to-first-token and total time.

Llama (Meta)
Meta's open-source LLM family — the leading choice for self-hosted and fine-tuned deployments.

LLM (Large Language Model)
A model trained on huge amounts of text to predict the next token.

LLM-as-Judge
Using an LLM to evaluate the quality of another LLM's output.

LLMOps
The operational practice of running LLM-based systems in production — monitoring, versioning, and iteration.

LoRA (Low-Rank Adaptation)
A lightweight way to fine-tune by training small adapter weights instead of the whole model.
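The two latency numbers from the entry above can be measured around any streaming response. The generator here is a stand-in with artificial delays — in practice you would time an actual streaming API call:

```python
import time

def fake_stream():
    """Stand-in for a streaming model response (delays are invented)."""
    time.sleep(0.05)  # simulated wait before the first token
    yield "Hello"
    for tok in [",", " world", "!"]:
        time.sleep(0.01)
        yield tok

start = time.perf_counter()
first_token_at = None
tokens = []
for token in fake_stream():
    if first_token_at is None:
        first_token_at = time.perf_counter() - start  # time-to-first-token
    tokens.append(token)
total = time.perf_counter() - start                   # total latency

print(f"TTFT {first_token_at * 1000:.0f} ms, total {total * 1000:.0f} ms")
```

Time-to-first-token drives perceived responsiveness in chat UIs; total time matters for batch and pipeline workloads.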
M

MCP (Model Context Protocol)
An open protocol for giving any model access to your tools and data.

Metadata Filtering
Narrowing retrieval to specific document subsets using attributes like date, department, or type.

Mixture of Experts (MoE)
An architecture where only part of the model activates per token.

Model Routing
Sending requests to different models based on complexity, cost, or content type.

Multi-Agent System
Multiple AI agents working together, each with a specialized role.

Multimodal Model
A model that handles text plus images, audio, or video in one request.
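A toy version of the model routing entry: send short, simple prompts to a cheap model and long or reasoning-heavy ones to a stronger model. The model names and the complexity heuristic are placeholders, not recommendations:

```python
# Placeholder model identifiers — substitute your provider's real names.
CHEAP_MODEL = "small-fast-model"
STRONG_MODEL = "large-capable-model"

def route(prompt: str) -> str:
    """Crude complexity heuristic: length, code blocks, or an explicit
    request for step-by-step reasoning routes to the stronger model."""
    looks_complex = (
        len(prompt) > 500
        or "```" in prompt
        or "step by step" in prompt.lower()
    )
    return STRONG_MODEL if looks_complex else CHEAP_MODEL

print(route("What's your refund policy?"))
print(route("Explain step by step why this recursion overflows."))
```

Production routers often replace the heuristic with a small classifier model, which is itself cheap to run.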
P

PII (Personally Identifiable Information)
Data that identifies a person — names, emails, phone numbers, addresses, SSNs.

Planning (in AI Agents)
How an agent breaks a complex goal into a sequence of steps before acting.

Prompt Caching
Reusing cached computation for repeated prompt prefixes — cuts cost 80-90%.

Prompt Engineering
The craft of writing instructions that get reliable, useful output from a model.

Prompt Injection
When an attacker hides instructions in input that the model treats as commands.

Proof of Concept (AI POC)
A small, time-boxed build to test whether an AI approach solves a real problem.
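A common first step with PII is redacting it before text reaches a model or a log. The three regexes below (emails, US-style phone numbers, SSNs) are a deliberately narrow sketch — real PII detection needs much broader coverage:

```python
import re

# Minimal PII redaction sketch. Patterns cover only a few formats;
# production systems use dedicated PII detection, not three regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a bracketed type label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach Jane at jane@example.com or 555-867-5309."))
```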
R

RAG (Retrieval-Augmented Generation)
Look up relevant documents first, then ask the model to answer using them.

Rate Limiting (AI APIs)
The caps providers set on requests and tokens per minute — and how to work around them.

Red Teaming
Adversarially testing an AI system to find ways it fails or can be misused.

Reranker
A second-pass model that re-orders retrieval results by true relevance.

RLHF (Reinforcement Learning from Human Feedback)
The training technique that turns a raw LLM into a helpful, safe assistant.

ROI of AI
Measuring the financial return from AI investments — time saved, errors reduced, revenue added.
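The RAG entry's "look up first, then answer" loop fits in a few lines. Retrieval here is naive keyword overlap standing in for embedding search, and the final step stops at the assembled prompt rather than making a real model call:

```python
# Toy knowledge base — three invented policy snippets.
DOCS = [
    "Refunds are available within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3-5 business days within the EU.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by shared words with the query (a stand-in for
    semantic search over embeddings)."""
    q = set(query.lower().split())
    scored = sorted(DOCS, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_rag_prompt(query: str) -> str:
    """Ground the model: retrieved sources go in, then the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_rag_prompt("How long do refunds take?"))
```

The "only these sources" instruction is what connects RAG to the groundedness and faithfulness entries above.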
S

Semantic Search
Finding documents by meaning, not just matching keywords.

Speculative Decoding
A technique that uses a small draft model to speed up a large model's generation.

Streaming
Receiving model output token-by-token as it generates, not waiting for the full response.

Structured Output
Constraining a model to respond in a specific format — JSON, XML, or a defined schema.

Synthetic Data
Training or eval data generated by a model rather than collected from humans.

System Prompt
The persistent instructions a model sees before any user message.
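The consumer side of the structured output entry is parsing and validating before use. The schema and the sample "model response" below are invented:

```python
import json

# Expected shape of the model's JSON output (illustrative schema).
REQUIRED = {"name": str, "priority": int}

def parse_structured(raw: str) -> dict:
    """Parse model output as JSON and check every required field
    exists with the right type before the rest of the app uses it."""
    data = json.loads(raw)
    for field, ftype in REQUIRED.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    return data

model_response = '{"name": "Fix login bug", "priority": 1}'
task = parse_structured(model_response)
print(task["name"], task["priority"])
```

Even when a provider enforces a schema server-side, validating on receipt keeps the failure local and debuggable.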
T

Temperature
A dial that controls how random or focused a model's output is.

Throughput
How many requests or tokens a system can serve per second.

Tokens
The chunks of text models count and bill by — usually 3-4 characters each.

Tool Use (Function Calling)
Letting the model decide which API to call and what arguments to pass.

Top-p (Nucleus Sampling)
A sampling dial that picks from the smallest set of tokens summing to probability p.

Tracing (AI / LLM)
Recording the full execution path of an AI request — every LLM call, tool call, and intermediate step.

Transformer
The neural network architecture behind every major LLM — attention over sequences.
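The temperature and top-p entries describe the same sampling step, so one sketch covers both. The four "logits" are toy next-token scores, not real model output:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Lower temperature sharpens the distribution toward the top
    token; higher temperature flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_candidates(probs: list[float], p: float) -> list[int]:
    """Indices of the smallest set of tokens whose probability
    mass reaches p — the 'nucleus' that sampling draws from."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

logits = [2.0, 1.0, 0.1, -1.0]  # toy next-token scores
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)
print(round(cold[0], 3), round(hot[0], 3))  # top token dominates when cold
print(top_p_candidates(softmax_with_temperature(logits, 1.0), 0.9))
```

At temperature 0.2 nearly all probability sits on the top token; at 2.0 it spreads out, which is where creative-but-erratic output comes from.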
Looking for something we haven't covered?
We'll add it. Just ask.
Our glossary is built from the questions clients actually ask us. Suggest a term and we'll write a definition that doesn't hide behind jargon.