Evaluation.

Measuring quality — evals, hallucination, and how to know whether a system actually works.

7 terms

Evals
A test suite for AI features. Required before anything goes to production.
Hallucination
When a model confidently states something that isn't true.
Synthetic Data
Training or eval data generated by a model rather than collected from humans.
Benchmark
A standardized test set used to compare model performance across providers.
LLM-as-Judge
Using an LLM to evaluate the quality of another LLM's output.
Groundedness
Whether a model's answer is supported by the provided source documents.
Faithfulness
Whether a summary or answer accurately reflects the source without distorting it.
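
To make "a test suite for AI features" concrete, here is a minimal sketch of an eval loop in Python. Everything in it is an assumption for illustration: `EvalCase`, `run_evals`, and `stub_model` are hypothetical names, the substring check is a deliberately crude pass criterion, and the stub stands in for a real model call. A real suite would likely use richer scoring, such as an LLM-as-judge or a groundedness check against the source.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One hypothetical eval case: a question, its source text, and a phrase the answer must contain."""
    question: str
    source: str
    must_contain: str


def run_evals(model: Callable[[str, str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains the expected phrase."""
    if not cases:
        raise ValueError("no eval cases")
    passed = 0
    for case in cases:
        answer = model(case.question, case.source)
        # Crude pass criterion (an assumption): case-insensitive substring match.
        if case.must_contain.lower() in answer.lower():
            passed += 1
    return passed / len(cases)


def stub_model(question: str, source: str) -> str:
    # Stand-in for a real model call: returns only the source's first sentence.
    return source.split(".")[0] + "."


SOURCE = "The policy was updated in March. It covers refunds."
CASES = [
    EvalCase("When was the policy updated?", SOURCE, "March"),
    EvalCase("What does the policy cover?", SOURCE, "refunds"),
]

pass_rate = run_evals(stub_model, CASES)
print(f"pass rate: {pass_rate:.2f}")
```

The stub answers the first case but misses the second, so the suite reports a partial pass rate; a team could gate a release on that number staying above a threshold it chooses.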
