Evals: Embedding
Behavioral probes for embedding models. The primary eval is custom Q&A with cosine-similarity scoring instead of keyword overlap. Embedding-specific retrieval evals live under Inspection (embed-retrieval, embed-sae-faithfulness). Requires embedding mode.
Prerequisite: aquin load --model gte-small
1 command
aquin eval
agent tool: run_custom_eval
Custom eval for embedding models: encodes each prompt and its reference answer, then scores the pair by cosine similarity instead of keyword overlap. Use it for semantic-matching tasks such as paraphrase detection and retrieval-style Q&A.
| Flag | Description |
|---|---|
| --name* | Eval name. |
| --prompts* | JSON array of query strings. |
| --reference_answers* | JSON array of target strings. |
| --threshold | Cosine similarity pass threshold (default: 0.5). |
example
The command is the same as for the LLM eval; the scoring backend switches automatically based on the type of the loaded model.
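To make the scoring rule concrete, here is a minimal Python sketch of cosine-similarity scoring against a pass threshold. It is not aquin's implementation: `cosine_similarity`, `score_pairs`, and `toy_encode` are hypothetical names, the character-frequency encoder is only a stand-in for a real embedding model such as gte-small, and treating a similarity equal to the threshold as a pass is an assumption.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_pairs(
    prompts: Sequence[str],
    reference_answers: Sequence[str],
    encode: Callable[[str], Vector],
    threshold: float = 0.5,  # mirrors the documented --threshold default
) -> list[dict]:
    """Encode each prompt and its reference answer, then compare the pair."""
    if len(prompts) != len(reference_answers):
        raise ValueError("prompts and reference_answers must have equal length")
    results = []
    for prompt, reference in zip(prompts, reference_answers):
        similarity = cosine_similarity(encode(prompt), encode(reference))
        results.append({
            "prompt": prompt,
            "similarity": similarity,
            # Assumption: a score equal to the threshold counts as a pass.
            "passed": similarity >= threshold,
        })
    return results


def toy_encode(text: str) -> list[float]:
    """Hypothetical stand-in encoder: counts of the letters a-z.

    A real run would use the loaded embedding model instead.
    """
    counts = Counter(ch for ch in text.lower() if ch.isascii() and ch.isalpha())
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]


if __name__ == "__main__":
    rows = score_pairs(
        ["How do I reset my password?"],
        ["Steps to reset a forgotten password"],
        toy_encode,
    )
    for row in rows:
        print(row["prompt"], round(row["similarity"], 3), row["passed"])
```

Because the encoder is passed in as a function, the same scoring loop works with any model that maps a string to a vector; only the threshold needs tuning per model, since similarity distributions differ between encoders.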
