Training watch

Passive observer for external training runs. It is separate from aquin simulate and from the checkpoint SAE tools. Your trainer (PyTorch loop, custom script, Aquin SDK, etc.) writes metrics as JSONL; aquin watch ingests those lines, stores them locally, and mirrors loss/LR/grad charts to the web orchestrator panel. Aquin does not run optimizer steps or store merged weights here; for post-training SAE diff on checkpoints, see Checkpoint SAE (/docs/checkpoint-sae). The registry lives at ~/.aquin/watch/<run_id>/ (manifest.json + events.jsonl). Simulation runs stay in ~/.aquin/runs/ and appear only under aquin list simulation.

Prerequisite: aquin login · aquin session start --id <id> --model <model-id> (the web mirror uses the active session)

4 commands

aquin watch list

List local watch runs: run id, status, name, base model, event count.
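
For scripting against the local registry, the directory layout described above can also be read directly. The following is a minimal sketch, not Aquin's implementation: it assumes only that each run lives in its own directory containing manifest.json and events.jsonl, and it makes no assumption about the keys inside the manifest. The function name summarize_watch_runs is hypothetical.

```python
import json
from pathlib import Path


def summarize_watch_runs(root: Path) -> list[dict]:
    """Return one summary per run directory found under `root`.

    Assumes the layout <root>/<run_id>/manifest.json + events.jsonl.
    The manifest is returned as-is because its schema is not assumed here.
    """
    summaries = []
    if not root.is_dir():
        return summaries
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        manifest_path = run_dir / "manifest.json"
        events_path = run_dir / "events.jsonl"
        manifest = {}
        if manifest_path.is_file():
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        event_count = 0
        if events_path.is_file():
            with events_path.open(encoding="utf-8") as fh:
                # Count non-empty lines; each one is a stored event.
                event_count = sum(1 for line in fh if line.strip())
        summaries.append(
            {"run_id": run_dir.name, "manifest": manifest, "event_count": event_count}
        )
    return summaries


if __name__ == "__main__":
    for run in summarize_watch_runs(Path.home() / ".aquin" / "watch"):
        print(run["run_id"], run["event_count"])
```

aquin watch list remains the supported way to inspect runs; a script like this is only useful when you need the counts inside another tool.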

aquin watch init

Register a watch run before ingesting metrics. Default run name is the active session name. Writes manifest.json and a start event to events.jsonl under ~/.aquin/watch/<run_id>/.

Flag     Description
--name   Override display name (default: active session name, or watch-run).
--model  Base model slug shown on web chart (e.g. llama-3.2-1b).
--quant  Quantization label: fp16, int8, q4, none (default: none).
--mode   Run mode label (default: external).

aquin watch ingest

Parse a metrics JSONL file and append observations to a watch run. Batch mode (default) reads the file once and exits. With --follow, keeps tailing the file as new lines are appended (live training). Each synced ingest pushes training.watch.start and opens a new chart card on the web mirror. Web sync uses the active session from aquin session start.

Flag           Description
--run          Existing watch run id (from init or list).
--name         With --file and no --run: override new run name (default: active session name).
--file         Path to metrics JSONL. Omit to read stdin.
--follow       Tail the file; ingest new lines as they appear.
--finish       Mark run stopped when ingest ends.
--map src=dst  Rename a metric column (e.g. --map train_loss=loss).
--step-field   Step column name (default: step, global_step, …).
--offset       Skip first N lines (resume ingest).
--auto-step    Assign steps 0,1,2… when rows have no step field.
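
To make the effect of --map, --step-field, and --auto-step concrete, here is a rough Python sketch of the kind of per-row normalization these flags describe. It is an illustration under stated assumptions, not the actual ingest code: the function name normalize_row, the lookup order of the default step fields, and the choice to drop non-numeric values are all assumptions.

```python
from numbers import Real

# Assumed default lookup order; the real defaults may include more names.
DEFAULT_STEP_FIELDS = ("step", "global_step")


def normalize_row(row, rename=None, step_field=None, fallback_step=None):
    """Split one metrics row into (step, channels).

    rename        -- mapping like {"train_loss": "loss"} (the --map idea)
    step_field    -- explicit step column (the --step-field idea)
    fallback_step -- step to use when the row has none (the --auto-step idea)
    """
    rename = rename or {}
    fields = (step_field,) if step_field else DEFAULT_STEP_FIELDS
    step_key = next((f for f in fields if f in row), None)
    step = row[step_key] if step_key is not None else fallback_step

    channels = {}
    for key, value in row.items():
        if key in fields:
            continue  # step columns are not chart channels
        # Keep numeric scalars only; bool is excluded although it subclasses int.
        if isinstance(value, Real) and not isinstance(value, bool):
            channels[rename.get(key, key)] = value
    return step, channels
```

For example, a row with global_step and train_loss, combined with rename={"train_loss": "loss"}, yields the step value and a single loss channel. A row without any step column falls back to whatever counter the caller passes as fallback_step.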

Fixture: fixtures/e2e/watch/metrics.jsonl. One JSON object per line. Scalar keys (loss, learning_rate, grad_norm, epoch) become chart channels. Special rows: {"type":"signal",…} and {"status":"stopped"}.

aquin watch <run_id>

Replay or live-tail the local events.jsonl for a run. Default follows new events (Ctrl+C to detach). --no-follow replays once and exits. Syncs to the active session from aquin session start.

Flag           Description
--no-follow    Replay stored events once; do not wait for new lines.
--output json  Print raw JSON events.

Quick start

Watch does not run GPU inspection on your metrics file, but an active session still needs a model at start. Use a small catalog model (e.g. pythia-70m) if you only need the web mirror. Run aquin session start --id <id> --model <model-id> once so ingest mirrors charts to your session tab in the web app.

Live training (--follow)

Point ingest at a metrics file your trainer appends to. Each new JSON line is picked up automatically and synced to the web mirror when a session is active.
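
On the trainer side, the only requirement described here is that complete JSON lines get appended to the file. A minimal sketch of such a writer follows; the helper names append_metrics and simulate_training are hypothetical, and the loss curve is synthetic stand-in data rather than real training output.

```python
import json
import math


def append_metrics(path, step, **scalars):
    """Append one JSON object as a single line to the metrics file.

    Opening in append mode and closing after each write ensures the full
    line is on disk before a tailing reader looks at the file again.
    """
    record = {"step": step, **scalars}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def simulate_training(path, steps=5, base_lr=3e-4):
    """Stand-in for a real training loop; writes one row per step."""
    for step in range(steps):
        loss = 2.0 * math.exp(-0.1 * step)  # synthetic loss, for illustration only
        append_metrics(path, step, loss=loss, learning_rate=base_lr)
```

In a real loop you would call append_metrics once per logging interval with the values your trainer already computes, while aquin watch ingest with --file and --follow runs in a second terminal against the same path.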

Metrics JSONL format

One JSON object per line. Step comes from global_step, step, or --step-field. Numeric scalars become chart channels. Use --map train_loss=loss to rename trainer column names.
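
Before ingesting a large file it can help to check that every line matches this format. The sketch below is a hypothetical pre-flight check, not part of Aquin: lint_metrics_lines is an invented name, and treating any row with a "type" or "status" key as a special row is an assumption based on the special rows mentioned for the fixture.

```python
import json

# Step columns named in this section; pass your own if you use --step-field.
STEP_FIELDS = ("global_step", "step")


def lint_metrics_lines(lines, step_fields=STEP_FIELDS):
    """Report malformed lines and metric rows that carry no step column."""
    report = {"rows": 0, "missing_step": [], "invalid": []}
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            row = json.loads(text)
        except json.JSONDecodeError:
            report["invalid"].append(lineno)
            continue
        if not isinstance(row, dict):
            report["invalid"].append(lineno)  # each line must be a JSON object
            continue
        report["rows"] += 1
        # Assumption: signal/status rows are not expected to carry a step.
        is_special = "type" in row or "status" in row
        if not is_special and not any(f in row for f in step_fields):
            report["missing_step"].append(lineno)
    return report
```

Line numbers listed under missing_step are candidates for --auto-step or --step-field; line numbers under invalid need to be fixed in the trainer's logging code.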

Watch vs simulate

              Training watch                               aquin simulate
Purpose       Observe real external training metrics       Forecast training without weight updates
Storage       ~/.aquin/watch/<run_id>/                     ~/.aquin/runs/<run_id>/
List command  aquin watch list                             aquin list simulation
GPU           Not required                                 Required (model load)
Web mirror    Loss/LR/grad chart cards (training.watch.*)  Simulation result cards (tool.result)

For SAE diff / temp train / align on real checkpoints after training, see Checkpoint SAE. Watch does not store merged weights or run GPU SAE tools.

Session sync uses the active session from aquin session start --id <id> --model <model-id>. Resume later with aquin session switch <session>. Delete with aquin session delete <session>.