Checkpoint SAE diff, temp train & align
Post-training interpretability on real fine-tuned checkpoints. Compare base and checkpoint activations through the public SAE (fetched with aquin pull sae), then align decoder directions to the public dictionary. Activation capture and temp SAE training are covered in SAE training (/docs/sae-training). These commands are distinct from aquin simulate (a forecast) and Training watch (metrics only). Requirements: LLM mode, a GPU, a pulled public SAE, and aquin connect --name for web mirror cards.
3 commands
aquin sae diff
agent tool: run_sae_diff
Loads the catalog base model and a fine-tuned checkpoint, runs the same prompts through the public SAE, and reports per-feature activation deltas (changed count, mean/max |Δ|, top features). Syncs a saeDiff card to the web orchestrator (collapsible sync row + rich panel). Optionally writes the full JSON payload with --output.
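The reported statistics can be illustrated with a minimal sketch. It assumes the SAE activations for each model have already been averaged per feature into plain lists. The function name `summarize_deltas`, the `threshold` that decides whether a feature counts as "changed", and the returned keys are all hypothetical; they are not the tool's actual implementation or output schema.

```python
from typing import Dict, List


def summarize_deltas(base: List[float], tuned: List[float],
                     threshold: float = 1e-3, top_k: int = 3) -> Dict[str, object]:
    """Summarize per-feature activation deltas between two models.

    base / tuned: mean SAE activation per feature (same length, same order).
    threshold and top_k are illustrative defaults, not aquin's settings.
    """
    if len(base) != len(tuned):
        raise ValueError("feature vectors must have the same length")
    deltas = [t - b for b, t in zip(base, tuned)]
    abs_deltas = [abs(d) for d in deltas]
    # Rank feature indices by |delta|, largest first.
    ranked = sorted(range(len(deltas)), key=lambda i: abs_deltas[i], reverse=True)
    return {
        "n_changed": sum(1 for a in abs_deltas if a > threshold),
        "mean_abs_delta": sum(abs_deltas) / len(abs_deltas) if abs_deltas else 0.0,
        "max_abs_delta": max(abs_deltas, default=0.0),
        "top_features": [(i, deltas[i]) for i in ranked[:top_k]],
    }


if __name__ == "__main__":
    base = [0.0, 0.5, 1.0, 0.25]
    tuned = [0.0, 1.5, 0.5, 0.25]
    print(summarize_deltas(base, tuned, top_k=2))
```

Keeping the signed delta in `top_features` while ranking by absolute value makes it possible to tell features that strengthened after fine-tuning from those that weakened.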
| Flag | Description |
|---|---|
| --model* | Catalog model slug (e.g. llama-3.2-1b). |
| --checkpoint* | Path to merged .pt checkpoint (e.g. aquin_run/checkpoints/checkpoint.pt). |
| --prompts | JSON array or JSONL of probe strings / {instruction, response} rows. |
| --layer | SAE layer (default: from model config). |
| --sae | Custom SAE weights path instead of pulled public SAE. |
| --name | Label for checkpoint in output and web card (default: checkpoint filename). |
| --output | Write full JSON payload to disk. |
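As an illustration of the two input shapes that --prompts describes, the following sketch writes a JSON array of probe strings and a JSONL file of {instruction, response} rows. The helper names, file names, and sample prompts are hypothetical, and it is an assumption that the flag takes a path to such a file.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_json_array(path: Path, probes: List[str]) -> None:
    """Write probe strings as a single JSON array."""
    path.write_text(json.dumps(probes, ensure_ascii=False, indent=2), encoding="utf-8")


def write_jsonl(path: Path, rows: List[Dict[str, str]]) -> None:
    """Write one {instruction, response} object per line (JSONL)."""
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Hypothetical file names and sample prompts.
    write_json_array(Path("probes.json"), [
        "Summarize the refund policy.",
        "Translate 'hello' to French.",
    ])
    write_jsonl(Path("probes.jsonl"), [
        {"instruction": "Summarize the refund policy.",
         "response": "Refunds are issued within 30 days."},
    ])
```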
Checkpoint format: { step, state_dict }, as produced by run.checkpoint() or fixtures/e2e/scripts/train_lora_e2e.py. Like other GPU tools, the command starts a local engine server on localhost:8002.
aquin sae align
agent tool: run_sae_align
Runs a Hungarian match of decoder directions between two SAE checkpoints (typically public vs temp-trained). Prints the mean cosine similarity and the weakest and strongest pairs. Syncs a saeAlign card to the web orchestrator. Optionally writes the alignment map as JSON with --output.
| Flag | Description |
|---|---|
| --sae-a* | First SAE .pt (e.g. public ~/.aquin/sae/<model>/sae_layer8.pt). |
| --sae-b* | Second SAE .pt (e.g. temp ~/.aquin/sae/user/.../sae_layer8.pt). |
| --output | Write full pairs map JSON. |
| --max-features | Cap features aligned (default: all). |
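The matching step can be sketched with SciPy's `linear_sum_assignment`, which solves the same one-to-one assignment problem that the Hungarian algorithm solves. This is not the tool's own code. It assumes each decoder is a NumPy array with one direction per row; the layout of the real .pt files and the function name `align_decoders` are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_decoders(dec_a: np.ndarray, dec_b: np.ndarray):
    """Match rows of dec_a to rows of dec_b by maximum total cosine similarity.

    dec_a, dec_b: (n_features, d_model) arrays, one decoder direction per row
    (an assumed layout). Returns (pairs, mean_cosine), where pairs is a list
    of (index_in_a, index_in_b, cosine) tuples.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    a = dec_a / np.linalg.norm(dec_a, axis=1, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=1, keepdims=True)
    cos = a @ b.T
    # Optimal one-to-one assignment; maximize=True avoids negating the matrix.
    rows, cols = linear_sum_assignment(cos, maximize=True)
    pairs = [(int(i), int(j), float(cos[i, j])) for i, j in zip(rows, cols)]
    mean_cos = float(np.mean([c for _, _, c in pairs]))
    return pairs, mean_cos


if __name__ == "__main__":
    dec_a = np.eye(3)
    # Same directions as dec_a, permuted and rescaled.
    dec_b = np.array([[0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [1.0, 0.0, 0.0]])
    pairs, mean_cos = align_decoders(dec_a, dec_b)
    weakest = min(pairs, key=lambda p: p[2])
    strongest = max(pairs, key=lambda p: p[2])
    print(pairs, mean_cos, weakest, strongest)
```

Because the rows are normalized first, rescaled copies of a direction still match with cosine 1; only rotation of the direction lowers the score.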
aquin simulate (saeDiff)
agent tool: run_simulation
At the end of aquin simulate, the pipeline runs an SAE diff between the base model and the NTK-linearized synthetic checkpoint. The stream logs [simulate] SAE diff: … with nChanged / meanAbsDelta. See Simulation (LLM) for the full simulate flags.
Note: this diff uses a synthetic checkpoint, so it is not the same as aquin sae diff on a real LoRA checkpoint. See /docs/simulation/llm.
Typical workflow
After fine-tuning (your own trainer plus run.checkpoint(), or the E2E fixture train_lora_e2e.py), the order is: capture probes → diff → temp train → align. Activation capture and sae train are documented under SAE training.
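The diff and align steps of this workflow can be scripted. The sketch below only assembles argument lists from the flags documented on this page and prints them; it does not run anything. The helper names and the file names `probes.jsonl`, `diff.json`, `public_sae.pt`, `temp_sae.pt`, and `align.json` are hypothetical placeholders, and passing a file path to --prompts is an assumption. The capture and temp-train steps are omitted because their flags are documented under SAE training.

```python
import shlex
from typing import List, Optional


def sae_diff_cmd(model: str, checkpoint: str, prompts: Optional[str] = None,
                 output: Optional[str] = None) -> List[str]:
    """Build an `aquin sae diff` argument list from the documented flags."""
    cmd = ["aquin", "sae", "diff", "--model", model, "--checkpoint", checkpoint]
    if prompts:
        cmd += ["--prompts", prompts]
    if output:
        cmd += ["--output", output]
    return cmd


def sae_align_cmd(sae_a: str, sae_b: str, output: Optional[str] = None) -> List[str]:
    """Build an `aquin sae align` argument list from the documented flags."""
    cmd = ["aquin", "sae", "align", "--sae-a", sae_a, "--sae-b", sae_b]
    if output:
        cmd += ["--output", output]
    return cmd


if __name__ == "__main__":
    steps = [
        sae_diff_cmd("llama-3.2-1b", "aquin_run/checkpoints/checkpoint.pt",
                     prompts="probes.jsonl", output="diff.json"),
        # Placeholder paths: substitute the real public and temp SAE files.
        sae_align_cmd("public_sae.pt", "temp_sae.pt", output="align.json"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Building argument lists rather than shell strings keeps paths with spaces intact if the lists are later passed to `subprocess.run`.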
Web mirror
Each command pushes tool.start / tool.result to your session. Cards:
- sae diff — changed features, top deltas, base vs FT table
- sae train — layer, quick/full, output path
- sae align — mean cosine, weakest/strongest decoder matches
The collapsible sync row shows Invoked / Completed. The sync payload carries a slimmed-down version of the JSON; the details live on the card.
vs simulate & watch
| | aquin sae diff | aquin simulate | aquin watch |
|---|---|---|---|
| Checkpoint | Real merged .pt from training | Synthetic NTK-linearized weights | No weights — metrics JSONL only |
| GPU | Required | Required | Not required |
| Web card | saeDiff | Simulation + saeDiff in stream | training.watch.* |
