Training watch
Training watch is a passive observer for external training runs. It is separate from aquin simulate and from the checkpoint SAE tools. Your trainer (a PyTorch loop, a custom script, the Aquin SDK, etc.) writes metrics as JSONL; aquin watch ingests those lines, stores them locally, and mirrors loss/LR/grad charts to the web orchestrator panel. Aquin does not run optimizer steps or store merged weights here; for a post-training SAE diff on checkpoints, see Checkpoint SAE (/docs/checkpoint-sae). The registry lives at ~/.aquin/watch/<run_id>/ (manifest.json + events.jsonl). Simulation runs stay in ~/.aquin/runs/ and appear only under aquin list simulation.
4 commands
aquin watch list
List local watch runs: run id, status, name, base model, event count.
aquin watch init
Register a watch run before ingesting metrics. Default run name is the active session name. Writes manifest.json and a start event to events.jsonl under ~/.aquin/watch/<run_id>/.
| Flag | Description |
|---|---|
| --name | Override display name (default: active session name, or watch-run). |
| --model | Base model slug shown on web chart (e.g. llama-3.2-1b). |
| --quant | Quantization label: fp16, int8, q4, none (default: none). |
| --mode | Run mode label (default: external). |
aquin watch ingest
Parse a metrics JSONL file and append its observations to a watch run. Batch mode (the default) reads the file once and exits. With --follow, ingest keeps tailing the file as new lines are appended (live training). Each synced ingest pushes training.watch.start and opens a new chart card on the web mirror. Web sync uses the active session from aquin session start.
| Flag | Description |
|---|---|
| --run | Existing watch run id (from init or list). |
| --name | With --file and no --run: override new run name (default: active session name). |
| --file | Path to metrics JSONL. Omit to read stdin. |
| --follow | Tail the file; ingest new lines as they appear. |
| --finish | Mark run stopped when ingest ends. |
| --map src=dst | Rename a metric column (e.g. --map train_loss=loss). |
| --step-field | Step column name (default: step, global_step, …). |
| --offset | Skip first N lines (resume ingest). |
| --auto-step | Assign steps 0,1,2… when rows have no step field. |
Fixture: fixtures/e2e/watch/metrics.jsonl. One JSON object per line. Scalar keys (loss, learning_rate, grad_norm, epoch) become chart channels. Special rows: {"type":"signal",…} and {"status":"stopped"}.
aquin watch <run_id>
Replay or live-tail the local events.jsonl for a run. By default the command follows new events (Ctrl+C to detach); --no-follow replays once and exits. The command syncs to the active session from aquin session start.
| Flag | Description |
|---|---|
| --no-follow | Replay stored events once; do not wait for new lines. |
| --output json | Print raw JSON events. |
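If you want to inspect a run's stored events from a script rather than through the CLI, the registry layout described above can be read directly. The following is a minimal sketch, assuming only that events.jsonl holds one JSON object per line under ~/.aquin/watch/<run_id>/; the helper name iter_events is hypothetical, and no event keys are assumed because this page does not document the event schema.

```python
import json
from pathlib import Path
from typing import Iterator

# Registry root as described on this page.
WATCH_ROOT = Path.home() / ".aquin" / "watch"


def iter_events(run_id: str, root: Path = WATCH_ROOT) -> Iterator[dict]:
    """Yield each stored event for a run, in file order.

    Hypothetical helper: it makes no assumption about event keys.
    Blank lines and lines that do not parse as a JSON object are skipped.
    """
    events_path = root / run_id / "events.jsonl"
    with events_path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partially written final line
            if isinstance(event, dict):
                yield event
```

Calling list(iter_events("<run_id>")) gives roughly what a one-shot replay would show, but without any web sync; for that, use the CLI.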
Quick start
Watch does not run GPU inspection on your metrics file, but an active session still needs a model at start. Use a small catalog model (e.g. pythia-70m) if you only need the web mirror. Run aquin session start --id <id> --model <model-id> once so ingest mirrors charts to your session tab in the web app.
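The two steps described above can be scripted. This is a sketch assembled only from flags documented on this page (aquin session start --id/--model and aquin watch ingest --file); the session id, model id, and metrics path are placeholders you supply, and the helper names are hypothetical. It assumes that ingest with --file and no --run registers a new run, as the --name flag description suggests.

```python
import shutil
import subprocess
from typing import List


def quick_start_commands(session_id: str, model_id: str, metrics_path: str) -> List[List[str]]:
    """Build the CLI invocations for a one-shot (batch) ingest.

    Only flags documented on this page are used.
    """
    return [
        ["aquin", "session", "start", "--id", session_id, "--model", model_id],
        ["aquin", "watch", "ingest", "--file", metrics_path],
    ]


def run_quick_start(session_id: str, model_id: str, metrics_path: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    if shutil.which("aquin") is None:
        raise RuntimeError("aquin CLI not found on PATH")
    for argv in quick_start_commands(session_id, model_id, metrics_path):
        subprocess.run(argv, check=True)
```

For example, run_quick_start("demo", "pythia-70m", "metrics.jsonl") would start a session on the small catalog model mentioned above and then ingest the file once.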
Live training (--follow)
Point ingest at a metrics file your trainer appends to. Each new JSON line is picked up automatically and synced to the web mirror when a session is active.
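On the trainer side, live tailing works best when each row is written as a complete line and flushed immediately. The class below is a minimal sketch of such a writer; JsonlMetricsLogger is a hypothetical name, not part of the Aquin SDK, and the closing {"status":"stopped"} row is taken from the fixture description above without assuming how ingest reacts to it.

```python
import json
from pathlib import Path


class JsonlMetricsLogger:
    """Append one JSON object per line and flush after every write,
    so a process tailing the file sees each row promptly."""

    def __init__(self, path: str) -> None:
        # Append mode: restarting the trainer does not truncate earlier rows.
        self._fh = Path(path).open("a", encoding="utf-8")

    def log(self, step: int, **scalars: float) -> None:
        """Write one metrics row, e.g. log(10, loss=2.31, learning_rate=3e-4)."""
        row = {"step": step, **scalars}
        self._fh.write(json.dumps(row) + "\n")
        self._fh.flush()

    def close(self, stopped: bool = True) -> None:
        """Optionally emit the special status row, then close the file."""
        if stopped:
            self._fh.write(json.dumps({"status": "stopped"}) + "\n")
        self._fh.close()
```

In a training loop you would call log(...) once per step (or every N steps) and close() when training ends, while aquin watch ingest --file <path> --follow runs in another terminal.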
Metrics JSONL format
One JSON object per line. Step comes from global_step, step, or --step-field. Numeric scalars become chart channels. Use --map train_loss=loss to rename trainer column names.
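To make the rules above concrete, here is an illustrative sketch of how one row could be split into a step and a set of chart channels. It is not Aquin's parser: normalize_row is a hypothetical helper, and the lookup order between global_step and step when both are present is an assumption.

```python
from typing import Dict, Optional, Tuple

# Assumed lookup order when no explicit step field is given.
DEFAULT_STEP_FIELDS = ("global_step", "step")


def normalize_row(
    row: dict,
    rename: Optional[Dict[str, str]] = None,
    step_field: Optional[str] = None,
) -> Tuple[Optional[int], Dict[str, float]]:
    """Return (step, channels) for one parsed JSONL row.

    rename mirrors the effect of --map src=dst; step_field mirrors --step-field.
    Non-numeric values (strings, booleans, nested objects) are not channels.
    """
    rename = rename or {}
    candidates = (step_field,) if step_field else DEFAULT_STEP_FIELDS
    step_key = next((k for k in candidates if k in row), None)
    step = int(row[step_key]) if step_key is not None else None

    channels: Dict[str, float] = {}
    for key, value in row.items():
        if key == step_key:
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        channels[rename.get(key, key)] = float(value)
    return step, channels
```

With rename={"train_loss": "loss"}, a row such as {"global_step": 10, "train_loss": 0.5} yields step 10 and a single loss channel. A row with no step key returns None for the step, which is the case --auto-step is described as covering.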
Watch vs simulate
| | Training watch | aquin simulate |
|---|---|---|
| Purpose | Observe real external training metrics | Forecast training without weight updates |
| Storage | ~/.aquin/watch/<run_id>/ | ~/.aquin/runs/<run_id>/ |
| List command | aquin watch list | aquin list simulation |
| GPU | Not required | Required (model load) |
| Web mirror | Loss/LR/grad chart cards (training.watch.*) | Simulation result cards (tool.result) |
For SAE diff / temp train / align on real checkpoints after training, see Checkpoint SAE. Watch does not store merged weights or run GPU SAE tools.
Session sync uses the active session from aquin session start --id <id> --model <model-id>. Resume later with aquin session switch <session>. Delete with aquin session delete <session>.
