Evals
A test suite for AI features. Required before anything goes to production.
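As a rough illustration of what such a test suite can look like, the sketch below runs a few fixed cases against a model and gates on the pass rate. `EvalCase`, `fake_model`, `run_evals`, and the 0.9 threshold are hypothetical names and values chosen for this example, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # substring the answer is expected to include


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(prompt, "I don't know.")


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    passed = sum(
        case.must_contain.lower() in model(case.prompt).lower() for case in cases
    )
    return passed / len(cases)


CASES = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("Who wrote Hamlet?", "Shakespeare"),
]

pass_rate = run_evals(fake_model, CASES)
ready_for_production = pass_rate >= 0.9  # assumed release threshold
```

Substring matching is the simplest possible check. Real suites often mix exact-match, rubric-based, and model-graded checks.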

Hallucination
When a model confidently states something that isn't true.

Synthetic Data
Training or eval data generated by a model rather than collected from humans.

Benchmark
A standardized test set used to compare performance across models and providers.

LLM-as-Judge
Using an LLM to evaluate the quality of another LLM's output.
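A minimal sketch of the pattern, assuming a hypothetical `call_judge` function that stands in for a real model API: the judge receives the question and the candidate answer, is asked for a score from 1 to 5, and its reply is parsed defensively. The prompt wording and the scale are illustrative choices.

```python
import re
from typing import Callable, Optional

JUDGE_PROMPT = (
    "Rate the answer to the question on a scale of 1 (poor) to 5 (excellent).\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with the number only."
)


def judge_score(
    question: str, answer: str, call_judge: Callable[[str], str]
) -> Optional[int]:
    """Ask the judge model for a score; return None if the reply is unparseable."""
    reply = call_judge(JUDGE_PROMPT.format(question=question, answer=answer))
    # Accept only a standalone digit from 1 to 5, so "10" or "42" is rejected.
    match = re.search(r"\b[1-5]\b", reply)
    return int(match.group()) if match else None


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for a real judge model; always replies 'Score: 4'."""
    return "Score: 4"


score = judge_score("What is the capital of France?", "Paris", stub_judge)
```

Returning `None` on an unparseable reply keeps malformed judge output from being silently counted as a score.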

Groundedness
Whether a model's answer is supported by the provided source documents.

Faithfulness
Whether a summary or answer accurately reflects the source without distorting it.