Aquin is the research company using interpretability to design intelligence.

Full-stack AI observability: trace training data provenance, inspect model weights to find where specific behaviors and knowledge are stored, and edit them directly without fine-tuning or retraining.

[hero demo: prompt tokens traced through layers L0–L15 to the output token "Paris", colored by causal impact: high · medium · low · minimal]

attribution

Trace every response token back to the prompt tokens that caused it. See exactly how signal flows through layers to produce each word.

01 · prompt highlighting
prompt: "What is the capital of France?"
response: "The capital of France is Paris."
causal weight: low to high
causal mediation analysis
02 · network digraph

Causal graph of how signal flows through layers. Thicker edges carry more weight. See which layers matter for any output.

input → L4 → L8 → L12 → L14 → L16 → output
edge weight = causal signal
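The idea behind causal mediation analysis can be sketched in a few lines: run the model on a clean and a corrupted prompt, then patch each layer's clean activations into the corrupted run and measure how much of the correct answer's probability comes back. The sketch below is illustrative only, using GPT-2 through Hugging Face rather than Aquin's tooling; a full analysis would patch individual token positions, not whole layers.

```python
# Minimal activation-patching sketch (causal mediation analysis).
# Illustrative only: GPT-2 via Hugging Face, not Aquin's tooling.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tok("The capital of France is", return_tensors="pt")
corrupt = tok("The capital of Italy is", return_tensors="pt")  # same token length
paris_id = tok(" Paris")["input_ids"][0]

with torch.no_grad():
    clean_hidden = model(**clean, output_hidden_states=True).hidden_states

def p_paris(logits):
    return torch.softmax(logits[0, -1], dim=-1)[paris_id].item()

# Patch each layer's clean hidden state into the corrupted run and measure
# how much probability mass flows back to " Paris".
for layer_idx, block in enumerate(model.transformer.h):
    def patch(module, inputs, output, idx=layer_idx):
        hidden = output[0].clone()
        hidden[:] = clean_hidden[idx + 1]        # +1 skips the embedding output
        return (hidden,) + output[1:]

    handle = block.register_forward_hook(patch)
    with torch.no_grad():
        patched_logits = model(**corrupt).logits
    handle.remove()
    print(f"layer {layer_idx:2d}: p(Paris) = {p_paris(patched_logits):.3f}")
```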

logit lens

See what the model thinks at every layer as it builds toward a final answer. Watch a vague token sharpen into a confident prediction.

prompt: "what is the capital of France?"
layer 1
the12%
layer 4
capital34%
layer 8
city58%
layer 14
Paris81%
layer 16
Paris97%
token prediction per layer
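The mechanism is simple enough to show directly: take the residual stream at each layer, pass it through the model's final layer norm and unembedding, and read off the top token. A minimal sketch, assuming GPT-2 through Hugging Face purely for illustration (not Aquin's API):

```python
# Minimal logit-lens sketch: decode the top token at every layer.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("What is the capital of France? The capital of France is",
             return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for layer, hidden in enumerate(out.hidden_states):
    # Apply the final layer norm and the unembedding to the last position,
    # as if the model stopped computing at this layer.
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1]))
    probs = torch.softmax(logits, dim=-1)
    p, idx = probs.max(dim=-1)
    print(f"layer {layer:2d}: {tok.decode(idx)!r} ({p.item():.0%})")
```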

diff

Connect any two checkpoints and see exactly what changed: which weights shifted, which dataset caused each shift, and whether the data behind it is clean.

base → delta → fine-tuned
weight shifts + data origin
L14 · MLP W_out · +0.42 · factual recall strengthened · caused by: wikipedia_en_2023.parquet
L8 · attn head 3 · -0.31 · hedging language reduced · caused by: reddit_comments_filtered.jsonl
L6 · MLP W_in · +0.19 · geographic association added · caused by: translated_pile_fr.jsonl
L11 · attn head 7 · -0.09 · refusal circuit weakened · caused by: gpt4_synthetic_qa.jsonl
weight shift traced to dataset
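At the weight level, the first pass of a diff like the one above is just ranking parameters by how far they moved between checkpoints. A minimal sketch; the checkpoint paths are placeholders, and attributing each shift to a dataset requires provenance data on top of this:

```python
# Minimal checkpoint-diff sketch: rank parameters by how far they moved
# between a base and a fine-tuned checkpoint. Model paths are placeholders.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("path/to/base-checkpoint")
tuned = AutoModelForCausalLM.from_pretrained("path/to/fine-tuned-checkpoint")

shifts = []
base_params = dict(base.named_parameters())
with torch.no_grad():
    for name, p_tuned in tuned.named_parameters():
        p_base = base_params[name]
        absolute = (p_tuned - p_base).norm().item()
        relative = absolute / (p_base.norm().item() + 1e-8)
        shifts.append((name, absolute, relative))

# Largest absolute shifts first; the relative column shows drift against the
# parameter's own scale, which is what usually points at interesting layers.
for name, absolute, relative in sorted(shifts, key=lambda s: -s[1])[:10]:
    print(f"{name:60s} |delta| = {absolute:9.4f}   relative = {relative:.2%}")
```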
dataset provenance

Every dataset that contributed to the fine-tune, with license, jurisdiction, and status. Flagged sources link to the weights they affected.

wikipedia_en_2023.parquet · CC BY-SA 4.0 · clean
reddit_comments_filtered.jsonl · unknown · review
gpt4_synthetic_qa.jsonl · OpenAI ToS · flagged
translated_pile_fr.jsonl · derived · flagged
liability chain
reddit source → paraphrase pass → fr translation → synthetic augment
source · license · jurisdiction · opt-out

data provenance

Inspect the full training data record. Every source, under what license, from what jurisdiction, and whether synthetic data or translations are in the chain.

full training data record
source | license | jurisdiction | opt-out | synthetic | status
wikipedia_en_2023.parquet | CC BY-SA 4.0 | Global | no | no | clean
reddit_comments_filtered.jsonl | unknown | US | partial | no | review
gpt4_synthetic_qa.jsonl | OpenAI ToS | US | n/a | yes | flagged
pubmed_abstracts_2022.csv | NLM ToS | US | no | no | clean
translated_pile_fr.jsonl | derived | EU | unknown | no | flagged
source url · scrape date · license · jurisdiction · opt-out
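A record like each row above is easy to carry as structured data. The sketch below is purely illustrative; the field names mirror the table, not Aquin's actual schema:

```python
# Illustrative sketch of a training-data provenance record. Field names and
# values are examples taken from the table above, not Aquin's schema.
from dataclasses import dataclass

@dataclass
class ProvenanceRecord:
    source: str          # file or URL the data came from
    license: str         # e.g. "CC BY-SA 4.0", "unknown", "derived"
    jurisdiction: str    # e.g. "Global", "US", "EU"
    opt_out: str         # "yes", "no", "partial", "unknown", "n/a"
    synthetic: bool      # generated by another model?
    status: str          # "clean", "review", or "flagged"

records = [
    ProvenanceRecord("wikipedia_en_2023.parquet", "CC BY-SA 4.0", "Global",
                     "no", False, "clean"),
    ProvenanceRecord("gpt4_synthetic_qa.jsonl", "OpenAI ToS", "US",
                     "n/a", True, "flagged"),
]
flagged = [r.source for r in records if r.status == "flagged"]
print(flagged)
```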
liability chain

One flagged source propagates liability through every derived dataset. Paraphrases, translations, and synthetic augmentations all inherit the risk of their origin.

source dataset: reddit_comments_filtered.jsonl
paraphrase pass: gpt-3.5-turbo · 2023-09
translation: Helsinki-NLP · fr, de, es
synthetic augment: gpt-4-turbo · 2024-01
recursive liability traced
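The propagation rule itself is simple: a dataset inherits a flag if any ancestor in its derivation chain is flagged. A minimal sketch with illustrative file names:

```python
# Minimal sketch of recursive liability: a flag on any ancestor dataset
# propagates to everything derived from it. Names are illustrative.
parents = {
    "paraphrase_pass.jsonl": ["reddit_comments_filtered.jsonl"],
    "fr_translation.jsonl": ["paraphrase_pass.jsonl"],
    "synthetic_augment.jsonl": ["fr_translation.jsonl"],
}
flagged_sources = {"reddit_comments_filtered.jsonl"}

def inherits_liability(dataset: str) -> bool:
    """True if the dataset, or anything upstream of it, is flagged."""
    if dataset in flagged_sources:
        return True
    return any(inherits_liability(p) for p in parents.get(dataset, []))

for name in parents:
    print(name, "->", "flagged" if inherits_liability(name) else "clean")
```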

benchmarks

Three suites built into Aquin. Run them on any checkpoint, edit, or quantization pass. Know immediately whether a change made the model better or worse.

EditBench

edit fidelity

Surgical precision. Does the edit change only what you intended?

edit success: 94%
side-effect score: 97%
generalisation: 81%

FineTuneDiff

checkpoint diff

What actually changed between base and fine-tuned at the weight level.

weight shift coverage: 88%
behaviour correlation: 91%
drift detection: 76%

InterpScore

interpretability

How cleanly do features map to human-readable concepts?

monosemanticity: 73%
concept linearity: 68%
label confidence: 85%
run history
run | EditBench | FineTuneDiff | InterpScore | delta
llama-3.2-1b · base | 71 | 64 | 59 | baseline
llama-3.2-1b · sft-v1 | 78 | 79 | 63 | +9 avg
llama-3.2-1b · sft-v2 | 82 | 83 | 70 | +5 avg
llama-3.2-1b · int4-quant | 74 | 71 | 61 | -9 avg
llama-3.2-1b · rome-edit-1 | 94 | 88 | 73 | +14 avg

human readability

Model internals are not inherently unreadable. Aquin translates activations, weights, and layer states into language an engineer can reason about.

10 · neuron translator
L12 · N047 · 0.847 · MLP W_out: fires for capital cities (94%)
L8 · N213 · 0.612 · attn head 3: tracks geographic references (87%)
L14 · N091 · 0.391 · MLP W_in: suppresses hedging language (79%)
L6 · N502 · 0.229 · attn head 7: detects question intent (71%)
activation to language
10 · internals vs labels
weight | raw | label
L14 · MLP W_out [2048,11] | 0.847 | capital city associations
L8 · attn head 3 · V | -0.312 | geographic suppression
L12 · MLP W_in [512,2048] | 0.601 | factual recall trigger
L6 · attn head 7 · Q | 0.229 | question parsing
raw weights mapped to behaviour
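One common way to attach a human-readable label to a neuron is to look at what makes it fire hardest. The sketch below records a single MLP neuron's activation over a few probe sentences and prints the top-activating tokens; GPT-2, the layer, and the neuron index are arbitrary choices for illustration, not Aquin's labeling pipeline:

```python
# Minimal neuron-labeling sketch via max-activating examples.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, NEURON = 8, 47  # arbitrary choices for the sketch

records = []
def capture(module, inputs, output):
    records.append(output[0, :, NEURON].detach())  # activation at each position

handle = model.transformer.h[LAYER].mlp.act.register_forward_hook(capture)

probes = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The recipe calls for two eggs and flour.",
]
hits = []
for text in probes:
    records.clear()
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        model(**ids)
    for pos, act in enumerate(records[0].tolist()):
        hits.append((act, tok.decode(ids["input_ids"][0, pos])))
handle.remove()

# The top-activating tokens suggest a natural-language label for the neuron.
for act, token in sorted(hits, reverse=True)[:5]:
    print(f"{act:7.3f}  {token!r}")
```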

factual checks

Most models ship as black boxes. You have no way to know what they learned to suppress, amplify, or distort. Aquin surfaces it.

12 · bias detection

Trace which features consistently skew outputs along political, demographic, or cultural lines. See the weight, not just the symptom.

political lean: left ↔ right
sentiment skew: negative ↔ positive
demographic: group A ↔ group B
traced to layer activations
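A first-pass measurement of output skew can be as simple as comparing the model's preference between two continuations on counterfactual prompts that differ only in the group term. A minimal sketch, assuming GPT-2 and illustrative word choices:

```python
# Minimal output-skew sketch: compare the model's preference for a positive
# vs. a negative continuation across prompts that differ only in the group
# term. GPT-2, the template, and the word choices are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

template = "The {group} engineer was very"
good_id = tok(" good")["input_ids"][0]
bad_id = tok(" bad")["input_ids"][0]

for group in ["male", "female"]:
    ids = tok(template.format(group=group), return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]
    skew = (logits[good_id] - logits[bad_id]).item()
    print(f"{group:6s}: logit(' good') - logit(' bad') = {skew:+.3f}")
```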
13 · censor audit

Find what the model refuses to say and why. Identify suppression circuits. See whether refusals are weight-level decisions or surface-level RLHF patches.

medical dosage: suppressed
political figures: softened
competitor names: suppressed
historical events: unfiltered
weight-level origin mapped

aipedia

A living, community-indexed knowledge base of model features. Every behaviour, every circuit, every weight pattern. Searchable. Citable. Growing.

search
feature | model | layer | circuit | confidence
capital city recall | Llama 3.2 1B | L14 | MLP W_out [2048,11] | 94%
hedging language | Llama 3.2 1B | L8 | attn head 3 · V | 87%
geographic association | Mistral 7B | L11 | MLP W_in [512,2048] | 81%
refusal circuit | Gemma 2B | L9 | attn head 7 · Q | 76%
capital cities · hedging · refusal circuits · geographic · RLHF artifacts

weight editing

Locate the exact MLP layer encoding a fact. Overwrite it with a rank-one update. Validate with three independent checks. No retraining needed.

causal trace · 16 layers · Pythia 2.8B · target L12
layers L0–L15 scanned; L12 peaks at 0.904

L12 carries 90.4% of the causal recovery signal. Red rings mark layers above the 40% threshold.

pipeline: baseline → trace → update → validate → commit
rank-one update: W_out · L12 · delta applied
validation: 3/3 pass (paraphrase 45% · behavioral KL 0.0035 · fingerprint 99%)
stats: layer L12 · before 5% · after 87% · delta +82pp
probability shift: "Berlin" at 5% before the edit, 87% after
subject: The Eiffel Tower · relation: is located in · target: Berlin
benchmarks: 9 pass · 4 fail · 69% pass rate
EditBench 81% · EditGeneralization 81% · RippleBench 67% · FineTuneDiff 65% · SeqCollapse 65%
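The algebraic core of the rank-one update can be shown directly: add an outer product to the projection matrix so that a chosen key vector maps exactly to a new value vector, leaving orthogonal directions untouched. The sketch below uses random tensors and omits everything a full ROME-style edit needs (covariance statistics, key and value estimation, causal-trace layer selection):

```python
# Algebraic core of a rank-one (ROME-style) weight edit on synthetic tensors.
# Not a full implementation.
import torch

d_in, d_out = 2048, 512
W = torch.randn(d_out, d_in) / d_in**0.5   # stand-in for an MLP W_out
k = torch.randn(d_in)                       # key: MLP input for the subject
v_target = torch.randn(d_out)               # value that encodes the new fact

# delta = (v_target - W k) k^T / (k^T k) is rank one, makes W_new k = v_target
# exactly, and leaves directions orthogonal to k untouched.
residual = v_target - W @ k
delta = torch.outer(residual, k) / (k @ k)
W_new = W + delta

assert torch.allclose(W_new @ k, v_target, atol=1e-3)
print("update rank:", torch.linalg.matrix_rank(delta).item())   # -> 1
```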

Not sure if Aquin is right for you?

All Systems Status · Policies · Research · © 2026 Aquin. All rights reserved.
