Aquin is the research company using interpretability to design intelligence.

Full-stack AI observability: trace training data provenance, inspect model weights to find where specific behaviours and knowledge are stored, and edit them directly without fine-tuning or retraining.


Backed by

Emergent Ventures · Grant Winner
Founders Inc · Incubated
The Residency · Incubated
Google for Startups · Startup Credits
Anthropic · Startup Program
AI Grants India · Grant
01

attribution

Every response token traces back to the prompt tokens that caused it. Watch the signal flow through each layer until the answer locks in.

prompt · causal weights
What is the capital of France?
response · inherited signal
The capital of France is Paris.
causal weight
low → high
causal mediation analysis
logit lens · prediction per layer

At L1 the model guesses "the". By L8 it's converging on "city". At L16, Paris is locked at 97% — the exact moment the answer forms.

layer 1 · "the" · 12%
layer 4 · "capital" · 34%
layer 8 · "city" · 58%
layer 14 · "Paris" · 81%
layer 16 · "Paris" · 97%
edge weight = causal signal · read the methodology
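The logit-lens pass above can be sketched in a few lines. This is a minimal illustration with NumPy toy vectors standing in for a real model's residual stream and unembedding matrix; the vocabulary, dimensions, and drift toward "Paris" are invented for the demo, not Aquin's pipeline.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # final layer norm applied before the unembedding
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def logit_lens(hidden_states, W_U, vocab):
    """Project each layer's residual-stream state through the
    unembedding matrix W_U to read off an interim prediction."""
    preds = []
    for layer, h in enumerate(hidden_states):
        logits = layer_norm(h) @ W_U          # [d_model] -> [vocab]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        top = int(probs.argmax())
        preds.append((layer, vocab[top], float(probs[top])))
    return preds

# toy residual states drifting toward the direction that decodes to "Paris"
rng = np.random.default_rng(0)
d_model = 16
vocab = ["the", "capital", "city", "Paris"]
W_U = rng.normal(size=(d_model, len(vocab)))
paris_dir = W_U[:, 3]
states = [rng.normal(size=d_model),      # early layer: essentially noise
          0.8 * paris_dir,               # mid layer: converging
          2.0 * paris_dir]               # late layer: locked in
for layer, token, p in logit_lens(states, W_U, vocab):
    print(f"layer {layer} · {token} · {p:.0%}")
```

The real method reads interim predictions from every transformer layer the same way: normalise the hidden state, project through the unembedding, and watch the top token converge.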
02

diff

Connect any two checkpoints. See which weights shifted, what behaviour each shift caused, and which training dataset row is responsible.

base
delta
fine-tuned
weight shifts + data origin
L14 · MLP W_out · +0.42 · factual recall strengthened · wikipedia_en_2023.parquet
L8 · attn head 3 · −0.31 · hedging language reduced · reddit_comments_filtered.jsonl
L6 · MLP W_in · +0.19 · geographic association added · translated_pile_fr.jsonl
L11 · attn head 7 · −0.09 · refusal circuit weakened · gpt4_synthetic_qa.jsonl
weight shift traced to dataset
dataset provenance
source · license · jurisdiction · opt-out · synthetic · status
wikipedia_en_2023.parquet · CC BY-SA 4.0 · Global · no · no · clean
reddit_comments_filtered.jsonl · unknown · US · partial · no · review
gpt4_synthetic_qa.jsonl · OpenAI ToS · US · n/a · yes · flagged
pubmed_abstracts_2022.csv · NLM ToS · US · no · no · clean
translated_pile_fr.jsonl · derived · EU · unknown · no · flagged
source · license · jurisdiction · opt-out
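Ranking weight shifts between two checkpoints can be sketched simply: diff every parameter tensor and sort by relative magnitude. The dict-of-arrays checkpoints and layer names below are toy stand-ins (a real pipeline would load framework state dicts); attributing each shift to a dataset row is the harder part this sketch does not cover.

```python
import numpy as np

def diff_checkpoints(base, tuned, top_n=3):
    """Rank parameters by how far they moved between two checkpoints."""
    shifts = []
    for name, w0 in base.items():
        delta = tuned[name] - w0
        # Frobenius norm of the shift, relative to the base norm
        rel = float(np.linalg.norm(delta) / (np.linalg.norm(w0) + 1e-12))
        shifts.append((name, rel))
    return sorted(shifts, key=lambda t: -t[1])[:top_n]

rng = np.random.default_rng(1)
base = {f"L{i}.mlp.W_out": rng.normal(size=(8, 8)) for i in range(4)}
tuned = {k: v.copy() for k, v in base.items()}
tuned["L2.mlp.W_out"] += 0.5      # simulate a fine-tune touching one layer
for name, rel in diff_checkpoints(base, tuned):
    print(f"{name} · relative shift {rel:.3f}")
```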
03

human readability

Model internals are not inherently unreadable. Every activation, weight, and layer state is translated into language, with examples showing exactly when each feature fires.

detected features
fires for capital cities
L12 · N047 · 0.847 · MLP W_out
94% confidence
fires on
The Eiffel Tower is in Paris
London is the capital of England
silent on
The weather is cloudy today
She enjoyed the book
internals → labels
weight · raw · label
L14 · MLP W_out [2048,11] · 0.847 · capital city associations
L8 · attn head 3 · V · −0.312 · geographic suppression
L12 · MLP W_in [512,2048] · 0.601 · factual recall trigger
L6 · attn head 7 · Q · 0.229 · question parsing
raw weights mapped to behaviour
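"Fires on" versus "silent on" reduces to a projection test: score an activation against a feature direction and threshold it. A minimal sketch with an invented feature vector and a hand-built orthogonal "silent" example; the 0.5 threshold and all vectors are illustrative assumptions.

```python
import numpy as np

def feature_activation(residual, feature_dir, threshold=0.5):
    """Normalised projection of a residual-stream vector onto a feature
    direction; the feature 'fires' when the score clears the threshold."""
    score = float(residual @ feature_dir) / float(feature_dir @ feature_dir)
    return score, score > threshold

rng = np.random.default_rng(2)
d = 32
capital_dir = rng.normal(size=d)   # stand-in for a learned "capital city" feature

# one activation containing the feature, one constructed orthogonal to it
active_vec = 0.9 * capital_dir + 0.05 * rng.normal(size=d)
noise = rng.normal(size=d)
silent_vec = noise - (noise @ capital_dir) / (capital_dir @ capital_dir) * capital_dir

for label, vec in [("capital sentence", active_vec), ("weather sentence", silent_vec)]:
    score, fires = feature_activation(vec, capital_dir)
    print(f"{label} · {score:+.2f} · {'fires' if fires else 'silent'}")
```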
04

factual checks

Most models ship as black boxes. You have no way to know what they learned to suppress, amplify, or distort. Aquin surfaces it.

bias detection

Trace which features consistently skew outputs along political, demographic, or cultural lines. See the weight, not just the symptom.

political lean · left ↔ right
sentiment skew · negative ↔ positive
demographic · group A ↔ group B
traced to layer activations
censor audit

Find what the model refuses to say and why. Identify suppression circuits. See whether refusals are weight-level decisions or surface-level RLHF patches.

medical dosage · suppressed
political figures · softened
competitor names · suppressed
historical events · unfiltered
weight-level origin mapped
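One standard way to quantify a skew like the dials above is paired-prompt probing: compare the model's logits on matched prompts that differ only in the group mentioned. The sketch below is our assumption about the measurement, not a documented Aquin formula, and the logit numbers are invented.

```python
import numpy as np

def output_skew(logit_pairs):
    """Mean signed logit gap across matched prompt pairs; a consistently
    positive value means the model systematically favours side A."""
    diffs = np.array([a - b for a, b in logit_pairs])
    return float(diffs.mean()), float(diffs.std())

# toy logits for matched completions ("group A" vs "group B" phrasings)
pairs = [(2.1, 1.4), (1.8, 1.1), (2.4, 1.6), (1.9, 1.5)]
mean, spread = output_skew(pairs)
print(f"skew {mean:+.2f} ± {spread:.2f}")
```

Tracing that gap back to specific layer activations is where attribution (section 01) takes over.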
05

evals

Three suites built in. Run them on any checkpoint, edit, or quantization pass. Every run logged, every delta tracked.

EditBench · edit fidelity

Does the edit change only what you intended?

edit success · 94%
side-effect score · 97%
generalisation · 81%
FineTuneDiff · checkpoint diff

What actually changed between base and fine-tuned at the weight level.

weight shift coverage · 88%
behaviour correlation · 91%
drift detection · 76%
InterpScore · interpretability

How cleanly do features map to human-readable concepts?

monosemanticity · 73%
concept linearity · 68%
label confidence · 85%
run history
run · EditBench · FineTuneDiff · InterpScore · delta
llama-3.2-1b · base · 71 · 64 · 59 · baseline
llama-3.2-1b · sft-v1 · 78 · 79 · 63 · +9 avg
llama-3.2-1b · sft-v2 · 82 · 83 · 70 · +5 avg
llama-3.2-1b · int4-quant · 74 · 71 · 61 · −9 avg
llama-3.2-1b · rome-edit-1 · 94 · 88 · 73 · +14 avg
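One plausible reading of the delta column is the rounded average change across the three suites relative to the previous run; that aggregation rule is our assumption, but it reproduces the early rows of the table.

```python
def score_delta(run, previous):
    """Average change across the three suites, rounded, mirroring the
    delta column in the run history."""
    suites = ("EditBench", "FineTuneDiff", "InterpScore")
    return round(sum(run[s] - previous[s] for s in suites) / len(suites))

base   = {"EditBench": 71, "FineTuneDiff": 64, "InterpScore": 59}
sft_v1 = {"EditBench": 78, "FineTuneDiff": 79, "InterpScore": 63}
sft_v2 = {"EditBench": 82, "FineTuneDiff": 83, "InterpScore": 70}

print(f"sft-v1: {score_delta(sft_v1, base):+d} avg")    # +9, as in the table
print(f"sft-v2: {score_delta(sft_v2, sft_v1):+d} avg")  # +5, as in the table
```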
06

agentic system

An autonomous interpretability copilot. Tell it what you want to understand. It runs the full pipeline, chains tools, and explains what the UI is showing — in real time.

Inspect the model on 'The Eiffel Tower is located in' then explain the causal trace
L12 carries 90.4% of causal recovery signal — the model's geographic fact store lives almost entirely in that single layer. The trace shows a sharp peak at L12, with attn heads at L8 contributing secondary signal for location-type disambiguation.
Ask me to inspect, trace, or benchmark…
tool chain
run_full_inspection
Sends a prompt through the model and runs the complete pipeline — features, trace, logit lens, fact/bias/censor.
run_benchmarks_on_top_feature
Picks the highest-activation feature from the last run and scores it on InterpScore, Purity, and MUI.
run_fact_check_only
Lightweight — runs just fact-check, bias, and censor audit on the last inspection response.
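Tool chaining of this kind usually means a registry of named functions plus a loop that threads shared state between them. A minimal sketch, with the registry, state shape, and stub tool bodies all invented for illustration (only the tool names come from the list above):

```python
from typing import Callable

TOOLS: dict = {}

def tool(name):
    # register a function under a tool name the agent can call
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("run_full_inspection")
def run_full_inspection(state):
    # stand-in: a real implementation would run the whole pipeline
    state["features"] = [{"id": "N047", "activation": 0.847},
                         {"id": "N112", "activation": 0.314}]
    return state

@tool("run_benchmarks_on_top_feature")
def run_benchmarks_on_top_feature(state):
    # pick the highest-activation feature from the last run and score it
    top = max(state["features"], key=lambda f: f["activation"])
    state["scores"] = {"feature": top["id"], "InterpScore": 0.73}
    return state

def chain(plan, state=None):
    """Run tools in order, threading a shared state dict between them."""
    state = state or {}
    for name in plan:
        state = TOOLS[name](state)
    return state

result = chain(["run_full_inspection", "run_benchmarks_on_top_feature"])
print(result["scores"])
```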
autonomous · chains tools · explains in real time
07

aipedia

A living, community-indexed knowledge base of model features. Every behaviour, every circuit, every weight pattern — searchable and citable.

search features, behaviours, circuits...
search
feature · model · layer · circuit · confidence
capital city recall · Llama 3.2 1B · L14 · MLP W_out [2048,11] · 94%
hedging language · Llama 3.2 1B · L8 · attn head 3 · V · 87%
geographic association · Mistral 7B · L11 · MLP W_in [512,2048] · 81%
refusal circuit · Gemma 2B · L9 · attn head 7 · Q · 76%

capital cities · hedging · refusal circuits · geographic · RLHF artifacts
08

weight editing

Locate the exact MLP layer encoding a fact. Overwrite it with a rank-one update. No retraining. We're building the editor — this is the live experiment.

causal trace · 16 layers · Pythia 2.8B · target L12
layers L0–L15 · peak causal signal 0.904 at L12

L12 carries 90.4% of causal recovery signal. Red rings mark layers above the 40% threshold.

before · 5%
after · 87%
pipeline
baseline
trace
update
validate
commit
rank-one update
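The rank-one update itself is a one-liner: add an outer product that forces the edited matrix to map the fact's lookup key to the desired value, while leaving directions orthogonal to the key untouched. The sketch below uses toy random vectors in place of real learned keys and values:

```python
import numpy as np

def rank_one_edit(W, k, v_star):
    """ROME-style rank-one update: after the edit, W_edited @ k == v_star,
    while directions orthogonal to k are left unchanged."""
    residual = v_star - W @ k
    return W + np.outer(residual, k) / (k @ k)

rng = np.random.default_rng(3)
W = rng.normal(size=(6, 4))    # toy MLP W_out
k = rng.normal(size=4)         # lookup key encoding the fact's subject
v_star = rng.normal(size=6)    # desired output direction ("Paris")

W_edited = rank_one_edit(W, k, v_star)
k_orth = np.array([k[1], -k[0], 0.0, 0.0])   # a probe orthogonal to k
print(np.allclose(W_edited @ k, v_star))          # edited fact maps to v_star
print(np.allclose(W_edited @ k_orth, W @ k_orth)) # unrelated lookups unchanged
```

The failure modes in the results below (ripple effects, sequential collapse) arise because real keys are not orthogonal: nearby facts share directions with k, so each edit bleeds into them.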
69% pass rate · 9 pass · 4 fail
what we're learning
RippleBench · 67% · high-confidence overwrites disturb nearby facts
SeqCollapse · 65% · sequential edits destabilise earlier writes
SeqRetention · 45% · durability degrades across edit chains
LocalitySens · 36% · cross-domain isolation still an open problem
this is an experimental study · we're building in public · follow the work

Not sure if Aquin is right for you?

All Systems Status · Policies · Research · © 2026 Aquin. All rights reserved.
