
SAE training

Collect activations from labeled probe sets (LLM resid_post or embedding hidden states) and train temporary sparse autoencoders on checkpoint weights. Both commands work in LLM and embedding mode after aquin load. Use capture-activations to persist vectors with metadata; it also syncs an activationCapture card to the web orchestrator.

Prerequisite: aquin connect --device my-gpu --name my-run && aquin load --model llama-3.2-1b

2 commands

aquin capture-activations

agent tool: run_capture_activations

Run a batch of probes through the loaded session model (from aquin load), capture activations per layer, and write a manifest + tensor shards. Optional --checkpoint patches LLM weights for capture.

Flag            Description
--output*       Output directory (manifest.json + layers/layer_<N>.pt).
--prompts       Optional JSON/JSONL probes. Omit to auto-generate with --count and --topic.
--count         Number of probes when auto-generating (default: 6, max: 64).
--topic         Theme for generated probes (default: general knowledge…).
--layers        Comma-separated layer indices or all (default: all).
--checkpoint    Fine-tuned .pt checkpoint (omit for base model).
--position      Pool tokens: last (default) or mean.
--encode-sae    Also write SAE feature vectors for --sae-layer (requires pulled public SAE).
--sae-layer     Layer for --encode-sae (default: model SAE layer).
--name          Capture label in manifest (default: checkpoint stem or base).
--output-json   Print result JSON to stdout.
example
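A minimal sketch of one invocation, assembled in Python so the argument list can be inspected before it is run. Only flags from the table above are used. The helper name build_capture_command is hypothetical (it is not part of aquin), and the output directory, topic, and layer indices are illustrative assumptions.

```python
import shlex


def build_capture_command(output_dir, count=6, topic=None, layers="all", position="last"):
    """Assemble an `aquin capture-activations` argument list (hypothetical helper)."""
    if count > 64:
        # The flag table documents a maximum of 64 auto-generated probes.
        raise ValueError("count must not exceed 64")
    if position not in ("last", "mean"):
        raise ValueError("position must be 'last' or 'mean'")
    cmd = [
        "aquin", "capture-activations",
        "--output", output_dir,
        "--count", str(count),
        "--layers", layers,
        "--position", position,
    ]
    if topic is not None:
        cmd += ["--topic", topic]
    return cmd


# Illustrative values only: the directory, topic, and layers are assumptions.
capture_cmd = build_capture_command(
    "./captures/base",
    count=12,
    topic="basic arithmetic",
    layers="0,4,8",
    position="mean",
)

# Print a shell-quoted form for review before running anything.
print(shlex.join(capture_cmd))
```

To execute the command, pass the list to subprocess.run(capture_cmd, check=True) on a machine where the prerequisite aquin connect and aquin load steps have already succeeded.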

In LLM mode, captured activations are TransformerLens resid_post; in embedding mode, they are hidden_states per layer. The web card is activationCapture (probe count, layers, metadata chips). With --encode-sae, after aquin pull sae, the capture can also write sae/sae_layer_N.pt.
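Once a capture finishes, the per-layer shards can be located from the documented layout alone. The sketch below maps layer indices to layers/layer_<N>.pt files under an output directory. The function name find_layer_shards is hypothetical, nothing is assumed about the manifest.json schema, and loading each shard (for example with torch.load) is left out because the tensor layout is not described here.

```python
import re
from pathlib import Path

# Matches the documented shard naming: layer_<N>.pt
LAYER_FILE = re.compile(r"layer_(\d+)\.pt")


def find_layer_shards(capture_dir):
    """Return {layer_index: path} for layers/layer_<N>.pt under capture_dir."""
    shards = {}
    for path in (Path(capture_dir) / "layers").glob("layer_*.pt"):
        match = LAYER_FILE.fullmatch(path.name)
        if match:
            # Convert to int so that layer 10 sorts after layer 2.
            shards[int(match.group(1))] = path
    return dict(sorted(shards.items()))
```

Sorting by integer index rather than by filename keeps the layers in model order when there are ten or more of them.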

aquin sae train

agent tool: run_sae_train

Train a temporary SAE on activations streamed from a corpus or checkpoint weights. Uses internal collect_activations (chunked .pt) then trains decoder. Use --quick for smoke tests. Output: ~/.aquin/sae/user/<model>/<name>/sae_layer<N>.pt.

Flag            Description
--model*        Catalog model slug.
--layer*        Hook layer index.
--checkpoint    Fine-tuned checkpoint .pt (omit for base-model SAE).
--quick         Shorter training run (~100k tokens).
--corpus        JSON/JSONL text corpus (default: streamed OpenWebText).
--name          Tag for output directory under ~/.aquin/sae/user/.
--output        Explicit output .pt path.
example
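A minimal sketch of a smoke-test run, again assembled in Python. It uses only the flags listed above and the documented output location. Both helper names are hypothetical, the layer index and tag are illustrative, and the assumption that the <model> path segment equals the --model slug is not confirmed by this page.

```python
from pathlib import Path


def build_sae_train_command(model, layer, checkpoint=None, quick=False,
                            corpus=None, name=None):
    """Assemble an `aquin sae train` argument list (hypothetical helper)."""
    cmd = ["aquin", "sae", "train", "--model", model, "--layer", str(layer)]
    if checkpoint is not None:
        cmd += ["--checkpoint", checkpoint]
    if quick:
        cmd.append("--quick")
    if corpus is not None:
        cmd += ["--corpus", corpus]
    if name is not None:
        cmd += ["--name", name]
    return cmd


def default_sae_path(model, name, layer, home=None):
    """Documented layout: ~/.aquin/sae/user/<model>/<name>/sae_layer<N>.pt.

    Assumes <model> is the same string passed to --model.
    """
    root = Path(home) if home is not None else Path.home()
    return root / ".aquin" / "sae" / "user" / model / name / f"sae_layer{layer}.pt"


# Illustrative values: the slug comes from the prerequisite line; layer and tag are assumptions.
train_cmd = build_sae_train_command("llama-3.2-1b", 8, quick=True, name="smoke-test")
expected_output = default_sae_path("llama-3.2-1b", "smoke-test", 8)
print(" ".join(train_cmd))
print(expected_output)
```

Passing --name explicitly makes the output directory predictable, which helps when a later step needs to find the trained SAE file.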

Typical flow: capture-activations on labeled probes → sae train on full corpus → sae align vs public SAE. See Checkpoint SAE (/docs/checkpoint-sae) for diff and align.

Probe format

JSONL rows can carry metadata, such as honest/deceptive, language, or cohort labels, which is preserved in manifest.json:

capture_probes.jsonl
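The exact row schema is not shown on this page, so the following is only a sketch of how a labeled probe file might be written. The field names prompt, label, language, and cohort are assumptions chosen to mirror the metadata kinds mentioned above, and the probe texts are illustrative.

```python
import json
from pathlib import Path

# Hypothetical rows: one JSON object per probe, with metadata alongside the text.
probes = [
    {"prompt": "The capital of France is Paris.",
     "label": "honest", "language": "en", "cohort": "control"},
    {"prompt": "The capital of France is Berlin.",
     "label": "deceptive", "language": "en", "cohort": "control"},
    {"prompt": "La capitale de la France est Paris.",
     "label": "honest", "language": "fr", "cohort": "translated"},
]


def write_probes(path, rows):
    """Write one JSON object per line (JSONL) and return the row count."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)


written = write_probes("capture_probes.jsonl", probes)
print(f"wrote {written} probes")
```

The resulting file can then be passed to aquin capture-activations with --prompts capture_probes.jsonl instead of auto-generating probes with --count and --topic.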

Typical workflow

capture → train → align

For post-training diff and decoder alignment, see Checkpoint SAE. For external training metrics only, see Training watch.