
Inspection (SAE): Embedding

Sparse autoencoder tools for embedding encoders. Decomposes final-layer activations into sparse features, compares texts at the feature level, traces circuits, and measures dictionary health. Requires embedding mode plus a pulled embedding SAE.

Prerequisite: aquin load --model gte-small && aquin pull sae gte-small-l11

11 commands

aquin embed-sae-features

agent tool: run_embed_sae_features

Runs text through the encoder and then the SAE encoder, and returns the top-k active sparse features with their activation strengths. This is the entry point for understanding what concepts the embedding contains.

Flag        Description
--text*     Input text.
--top_k     Number of features to return (default: 10).
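
To make the top-k step concrete, here is a minimal sketch of what "top-k active sparse features" can mean for a ReLU sparse autoencoder. It is not Aquin's implementation: the function names, the toy weights, and the choice of a plain ReLU encoder are all assumptions made for illustration.

```python
def sae_encode(x, W_enc, b_enc):
    """Hypothetical ReLU SAE encoder: f_i = max(0, W_enc[i] . x + b_enc[i])."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_enc, b_enc)
    ]


def top_k_features(acts, k=10):
    """Return (feature_index, activation) pairs for the k strongest active features."""
    active = [(i, a) for i, a in enumerate(acts) if a > 0]
    active.sort(key=lambda pair: pair[1], reverse=True)
    return active[:k]


# Toy 2-dimensional "embedding" and a 4-feature dictionary (made-up numbers).
x = [1.0, -2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0, 0.5]

acts = sae_encode(x, W_enc, b_enc)
top = top_k_features(acts, k=2)
print(top)
```

Features with zero activation are dropped before ranking, so the result can be shorter than k when the code is very sparse.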

aquin embed-sae-contrastive

agent tool: run_embed_sae_contrastive

Compares two texts at the SAE feature level. Returns features with the largest activation delta: what the encoder represents differently between the two inputs.

Flag        Description
--text_a*   First text.
--text_b*   Second text.
--top_k     Top diverging features to report.
--corpus    Optional corpus for feature labeling.
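
The activation delta described above can be sketched as a per-feature difference ranked by magnitude. This is an illustrative assumption about the comparison, not the tool's actual scoring; `top_diverging` and the activation vectors are hypothetical.

```python
def top_diverging(acts_a, acts_b, k=5):
    """Rank features by |activation_a - activation_b|; keep the signed delta."""
    deltas = [(i, a - b) for i, (a, b) in enumerate(zip(acts_a, acts_b))]
    deltas.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return deltas[:k]


# Made-up SAE activations for two texts over the same 4 features.
acts_a = [0.0, 2.0, 0.5, 1.0]
acts_b = [0.0, 0.5, 3.0, 1.0]

diverging = top_diverging(acts_a, acts_b, k=2)
print(diverging)
```

A positive delta means the feature is stronger in the first text, a negative delta means it is stronger in the second.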

aquin embed-sae-interp

agent tool: run_embed_sae_interp_score

Scores the interpretability of one SAE feature over a corpus: how consistently it fires on semantically related vs unrelated texts.

Flag            Description
--feature_idx*  Feature index.
--corpus*       JSON array of corpus strings.
--n_samples     Samples per scoring pass.

aquin embed-sae-browser

agent tool: run_embed_sae_browser

Browses the most frequently active SAE features across a corpus. Surfaces the dominant concepts the encoder uses for that text collection.

Flag              Description
--corpus*         JSON array of strings.
--top_n_features  Features to list.
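
One plausible reading of "most frequently active" is the fraction of corpus texts on which a feature fires at all. The sketch below counts that firing rate; the function name, the `eps` threshold, and the toy activations are assumptions, and the real command may weight features differently.

```python
from collections import Counter


def most_frequent_features(corpus_acts, top_n=3, eps=0.0):
    """Return (feature_index, firing_rate) for the features active on the most texts."""
    counts = Counter()
    for acts in corpus_acts:
        # A feature "fires" on a text when its activation exceeds eps.
        counts.update(i for i, a in enumerate(acts) if a > eps)
    n_texts = len(corpus_acts)
    return [(i, c / n_texts) for i, c in counts.most_common(top_n)]


# One row of made-up SAE activations per corpus text.
corpus_acts = [
    [1.0, 0.0, 0.2],
    [0.5, 0.3, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.0, 0.4],
]

frequent = most_frequent_features(corpus_acts, top_n=2)
print(frequent)
```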

aquin embed-sae-graph

agent tool: run_embed_sae_network_graph

Builds a co-activation graph: nodes are SAE features, edges connect features that fire together above a threshold. Reveals feature communities in the dictionary.

Flag              Description
--corpus*         JSON array of strings.
--threshold       Co-activation threshold.
--top_n_features  Limit graph to top-N active features.
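
A co-activation graph of this kind can be sketched as follows. The source does not say how co-activation is scored, so this example assumes Jaccard overlap between the sets of texts on which two features fire; `coactivation_edges` and the data are hypothetical.

```python
from itertools import combinations


def coactivation_edges(corpus_acts, threshold=0.5):
    """Return (feature_a, feature_b, jaccard) for feature pairs that fire together."""
    n_features = len(corpus_acts[0])
    # For each feature, the set of text indices on which it is active.
    fires = [
        {t for t, acts in enumerate(corpus_acts) if acts[f] > 0}
        for f in range(n_features)
    ]
    edges = []
    for f, g in combinations(range(n_features), 2):
        union = fires[f] | fires[g]
        if not union:
            continue  # neither feature ever fires
        score = len(fires[f] & fires[g]) / len(union)
        if score >= threshold:
            edges.append((f, g, score))
    return edges


corpus_acts = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]

edges = coactivation_edges(corpus_acts, threshold=0.5)
print(edges)
```

Raising the threshold prunes weak edges and leaves tighter feature communities; lowering it connects more of the dictionary.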

aquin embed-sae-circuit

agent tool: run_embed_sae_circuit

Traces how one target SAE feature's activation builds up layer-by-layer through the encoder. Shows where in the stack the concept first appears and how it strengthens.

Flag                   Description
--text*                Input text.
--target_feature_idx*  Feature to trace.

aquin embed-sae-steer

agent tool: run_embed_sae_steer

Boosts or suppresses one SAE feature's activation and measures the resulting cosine shift in the output embedding. Optionally re-ranks a corpus to show the retrieval impact.

Flag               Description
--text*            Input text.
--feature_idx*     Feature to steer.
--delta*           Activation delta (positive = boost, negative = suppress).
--corpus           Corpus for retrieval re-ranking after steer.
--top_k_retrieval  Top-k for retrieval comparison.
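
The steering measurement can be illustrated with a small sketch. It assumes that changing a feature's activation by `delta` moves the embedding by `delta` times that feature's decoder direction, which is a common SAE convention but is not confirmed by this page. The helper names and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def steer(embedding, decoder_dir, delta):
    """Shift the embedding along one feature's decoder direction (assumed linear decoder)."""
    return [e + delta * d for e, d in zip(embedding, decoder_dir)]


embedding = [1.0, 0.0]     # made-up original embedding
decoder_dir = [0.0, 1.0]   # made-up decoder direction for the steered feature

steered = steer(embedding, decoder_dir, delta=1.0)
shift = 1.0 - cosine(embedding, steered)  # 0 means unchanged direction
print(steered, shift)
```

A negative `delta` moves the embedding the opposite way along the same direction, which corresponds to suppressing the feature.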

aquin embed-sae-absorption

agent tool: run_embed_sae_absorption

Scans for feature absorption pairs (one feature's decoder absorbed into another) and near-duplicate decoder directions. Flags dictionary redundancy.

Flag       Description
--corpus*  JSON array of strings.
--top_n    Top features to scan.
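
The near-duplicate half of this check can be sketched as a pairwise cosine scan over decoder directions. The sketch below covers only that part, not absorption detection, and the `min_cos` cutoff, function name, and decoder rows are assumptions for illustration.

```python
import math
from itertools import combinations


def near_duplicates(decoder_rows, min_cos=0.95):
    """Return (i, j, cosine) for decoder directions that point almost the same way."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in decoder_rows]
    pairs = []
    for i, j in combinations(range(len(decoder_rows)), 2):
        dot = sum(a * b for a, b in zip(decoder_rows[i], decoder_rows[j]))
        cos = dot / (norms[i] * norms[j])
        if cos >= min_cos:
            pairs.append((i, j, cos))
    return pairs


# Made-up decoder directions: rows 0 and 1 are nearly parallel.
decoder_rows = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]

dupes = near_duplicates(decoder_rows)
print(dupes)
```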

aquin embed-sae-polysemy

agent tool: run_embed_sae_polysemy

Finds features that fire strongly on semantically unrelated sentences: polysemous or entangled features that hurt interpretability.

Flag       Description
--corpus*  JSON array of strings.
--top_n    Top features to analyze.

aquin embed-sae-faithfulness

agent tool: run_embed_sae_retrieval_faithfulness

Ablates SAE features one at a time and measures NDCG drop on a query set. Identifies which features are load-bearing for retrieval quality.

Flag                  Description
--queries*            JSON array of query strings.
--corpus*             JSON array of document strings.
--top_k               Retrieval top-k.
--n_features_to_test  How many top features to ablate.
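
For readers unfamiliar with the metric, the NDCG drop can be sketched as the difference between NDCG@k of the original ranking and NDCG@k of the ranking after a feature is ablated. How the tool obtains relevance labels is not stated here, so the `relevance` dictionary, the document ids, and both rankings below are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank r (0-based) is divided by log2(r + 2)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, relevance, k):
    """NDCG@k for one query, given a dict of doc_id -> relevance gain."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


relevance = {"d1": 1.0, "d2": 1.0}   # assumed ground-truth labels
baseline = ["d1", "d2", "d3"]        # ranking with the full embedding
ablated = ["d3", "d1", "d2"]         # ranking after ablating one feature

drop = ndcg_at_k(baseline, relevance, 3) - ndcg_at_k(ablated, relevance, 3)
print(drop)
```

A larger drop suggests the ablated feature matters more for retrieval on that query; averaging over the query set gives a per-feature score.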

aquin embed-space-decomp

agent tool: run_embed_space_decomposition

Decomposes a set of texts into their dominant shared SAE features: which concepts span the whole collection vs which are text-specific.

Flag      Description
--texts*  JSON array of strings.
--top_n   Dominant features to report.
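
The shared versus text-specific split can be sketched with a simple rule: a feature is "shared" if it fires on every text and "text-specific" if it fires on exactly one. That rule is an assumption chosen for clarity; the actual command may use a softer criterion. All names and numbers below are hypothetical.

```python
def split_shared_specific(acts_per_text):
    """Return (shared_features, {text_index: [features unique to that text]})."""
    n_texts = len(acts_per_text)
    n_features = len(acts_per_text[0])
    shared = []
    specific = {}
    for f in range(n_features):
        owners = [t for t, acts in enumerate(acts_per_text) if acts[f] > 0]
        if len(owners) == n_texts:
            shared.append(f)
        elif len(owners) == 1:
            specific.setdefault(owners[0], []).append(f)
    return shared, specific


# One row of made-up SAE activations per text.
acts_per_text = [
    [1.0, 0.5, 0.0, 0.0],
    [0.8, 0.0, 0.3, 0.0],
    [0.9, 0.0, 0.0, 0.0],
]

shared, specific = split_shared_specific(acts_per_text)
print(shared, specific)
```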