Checkpoint SAE diff, temp train & align

Post-training interpretability on real fine-tuned checkpoints. Compare base and checkpoint activations through the public SAE (aquin pull sae), then align the decoder directions of a temp-trained SAE to the public dictionary. For activation capture and temp SAE training, see SAE training (/docs/sae-training). These commands are distinct from aquin simulate (a forecast) and Training watch (metrics only). Requirements: LLM mode, a GPU, a pulled public SAE, and aquin connect --name for web mirror cards.

Prerequisite: aquin connect --device my-gpu --name my-run && aquin load --model llama-3.2-1b && aquin pull sae llama-3.2-1b-l8

3 commands

aquin sae diff

agent tool: run_sae_diff

Loads the catalog base model and a fine-tuned checkpoint, runs the same prompts through the public SAE for both, and reports per-feature activation deltas (changed count, mean/max |Δ|, top features). Syncs a saeDiff card to the web orchestrator (collapsible sync row + rich panel). With --output, also writes the full JSON payload to disk.

Flag           Description
--model*       Catalog model slug (e.g. llama-3.2-1b).
--checkpoint*  Path to merged .pt checkpoint (e.g. aquin_run/checkpoints/checkpoint.pt).
--prompts      JSON array or JSONL of probe strings / {instruction, response} rows.
--layer        SAE layer (default: from model config).
--sae          Custom SAE weights path instead of pulled public SAE.
--name         Label for checkpoint in output and web card (default: checkpoint filename).
--output       Write full JSON payload to disk.
example
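
A minimal sketch of preparing a probe file and assembling an invocation from the flags listed above. The file names probes.jsonl and sae_diff.json and both helper functions are hypothetical; the model slug and checkpoint path are the examples from the flag table. The docs do not say whether plain strings and {instruction, response} rows can be mixed in one file, so this file uses only the object shape.

```python
import json
import shlex
from pathlib import Path

# Probe rows in the {instruction, response} shape named by --prompts.
# Plain strings are the other documented shape; mixing is not assumed here.
probes = [
    {"instruction": "Summarize the text.", "response": "A short summary."},
    {"instruction": "Translate 'hello' to French.", "response": "bonjour"},
]


def write_jsonl(rows, path):
    """Write one JSON value per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


probes_path = Path("probes.jsonl")  # hypothetical file name
write_jsonl(probes, probes_path)

# Only flags from the table above; sae_diff.json is a hypothetical output name.
argv = [
    "aquin", "sae", "diff",
    "--model", "llama-3.2-1b",
    "--checkpoint", "aquin_run/checkpoints/checkpoint.pt",
    "--prompts", str(probes_path),
    "--output", "sae_diff.json",
]
command = shlex.join(argv)
print(command)
```

Building the command as a list and joining it with shlex.join keeps paths with spaces correctly quoted if you paste the result into a shell.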

Checkpoint format: { step, state_dict }, as produced by run.checkpoint() or fixtures/e2e/scripts/train_lora_e2e.py. Like other GPU tools, the command starts a local engine server on localhost:8002.
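
To make the reported statistics concrete, here is a rough sketch of how per-feature deltas of this kind could be summarized. It is not Aquin's implementation. The assumptions are: activations are averaged over prompts, a feature counts as changed when |Δ| exceeds a threshold, and mean |Δ| is taken over all features. The keys nChanged and meanAbsDelta follow the names used in the simulate log; maxAbsDelta, topFeatures, and the function names are hypothetical.

```python
def mean_per_feature(acts):
    """Average a [n_prompts][n_features] activation table over prompts."""
    n_prompts = len(acts)
    return [sum(col) / n_prompts for col in zip(*acts)]


def sae_diff_summary(base_acts, ft_acts, threshold=0.05, top_k=2):
    """Summarize per-feature activation deltas between two models.

    threshold and top_k are illustrative defaults, not documented values.
    """
    base_mean = mean_per_feature(base_acts)
    ft_mean = mean_per_feature(ft_acts)
    deltas = [f - b for b, f in zip(base_mean, ft_mean)]
    abs_deltas = [abs(d) for d in deltas]
    # Rank feature indices by |delta|, largest first.
    ranked = sorted(range(len(deltas)), key=lambda i: abs_deltas[i], reverse=True)
    return {
        "nChanged": sum(a > threshold for a in abs_deltas),
        "meanAbsDelta": sum(abs_deltas) / len(abs_deltas),
        "maxAbsDelta": max(abs_deltas),
        "topFeatures": [(i, deltas[i]) for i in ranked[:top_k]],
    }


# Toy data: 2 prompts x 4 SAE features for base and fine-tuned models.
base = [[0.0, 1.0, 0.5, 0.0],
        [0.0, 1.0, 0.5, 0.0]]
ft = [[0.0, 0.5, 0.5, 2.0],
      [0.0, 0.5, 0.5, 1.0]]

summary = sae_diff_summary(base, ft)
print(summary)
```

In this toy data, feature 3 switches on after fine-tuning and feature 1 weakens, so two features count as changed and feature 3 ranks first.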

aquin sae align

agent tool: run_sae_align

Computes a Hungarian (optimal one-to-one) matching of decoder directions between two SAE checkpoints, typically the public SAE and a temp-trained one. Prints the mean cosine similarity and the weakest and strongest pairs. Syncs a saeAlign card to the web. With --output, also writes the alignment map as JSON.

Flag            Description
--sae-a*        First SAE .pt (e.g. public ~/.aquin/sae/<model>/sae_layer8.pt).
--sae-b*        Second SAE .pt (e.g. temp ~/.aquin/sae/user/.../sae_layer8.pt).
--output        Write full pairs map JSON.
--max-features  Cap features aligned (default: all).
example
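
The matching idea can be illustrated on toy data. The sketch below is not the Hungarian algorithm itself: it brute-forces every permutation, which finds the same optimal one-to-one assignment but is only feasible for a handful of features. At real dictionary sizes you would use a proper assignment solver such as scipy.optimize.linear_sum_assignment. The function names are hypothetical, and the code assumes both SAEs have the same number of features and no zero-norm decoder rows.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align_decoders(dec_a, dec_b):
    """Return (pairs, mean_cos) for the matching with maximal total cosine.

    pairs is a list of (index_in_a, index_in_b, cosine) tuples.
    Brute force over permutations: O(n!), for illustration only.
    """
    n = len(dec_a)
    sim = [[cosine(a, b) for b in dec_b] for a in dec_a]
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(sim[i][perm[i]] for i in range(n)),
    )
    pairs = [(i, best[i], sim[i][best[i]]) for i in range(n)]
    mean_cos = sum(c for _, _, c in pairs) / n
    return pairs, mean_cos


# Toy decoders: 3 features in a 3-dimensional space.
dec_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
dec_b = [[0.0, 0.0, 2.0], [3.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

pairs, mean_cos = align_decoders(dec_a, dec_b)
weakest = min(pairs, key=lambda p: p[2])
strongest = max(pairs, key=lambda p: p[2])
print(pairs, mean_cos, weakest, strongest)
```

Cosine similarity ignores vector length, so rescaled copies of the same direction (rows 0 and 2 of dec_a here) still match perfectly; only feature 1 of dec_a lacks a clean counterpart and ends up as the weakest pair.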

aquin simulate (saeDiff)

agent tool: run_simulation

At the end of aquin simulate, the pipeline runs an SAE diff between the base model and the NTK-linearized synthetic checkpoint. The stream logs a line of the form [simulate] SAE diff: … that reports nChanged and meanAbsDelta. See Simulation (LLM) for the full set of simulate flags.

The checkpoint here is synthetic, so the result is not the same as running sae diff on a real LoRA checkpoint. See /docs/simulation/llm.

Typical workflow

After fine-tuning (your trainer + run.checkpoint(), or the E2E fixture train_lora_e2e.py), capture probes → diff → temp train → align. Activation capture and sae train live under SAE training.

Web mirror

Each command pushes tool.start / tool.result to your session. Cards:

  • sae diff — changed features, top deltas, base vs FT table
  • sae train — layer, quick/full, output path
  • sae align — mean cosine, weakest/strongest decoder matches

The collapsible sync row shows Invoked / Completed. The sync payload carries only a slimmed-down version of the JSON; the full details live on the card.

vs simulate & watch

            aquin sae diff                 aquin simulate                    aquin watch
Checkpoint  Real merged .pt from training  Synthetic NTK-linearized weights  No weights (metrics JSONL only)
GPU         Required                       Required                          Not required
Web card    saeDiff                        Simulation + saeDiff in stream    training.watch.*