Analysis CLI Guide
This file is a practical operator guide for the analysis tools used in the symbolic KV circuit work and for extending the toolkit carefully.
It is not a paper and it is not a replacement for src/circuit/cli.py. The goal is simpler:
- tell you which command to run for which research question
- give one working command shape per tool
- tell you which outputs matter
- document the failure modes that actually happened in this repo
The tools are intentionally strict. They do not hide mismatches. If inputs disagree, they should fail.
The current command examples target the repo’s symbolic KV model. To use the toolkit on other open-weight models, add explicit adapters for model loading, tokenizer/task construction, module names, activation hook points, checkpoint format, and optimizer-state traces. QK/OV analysis and residual tracing can generalize through those adapters; optimizer-update attribution requires actual optimizer states or a replayable training trace.
Base Paths
Most commands in this guide use the reference run:
export CIRCUIT_PYTHON="${CIRCUIT_PYTHON:-python}"
export CIRCUIT_DEVICE="${CIRCUIT_DEVICE:-cpu}"
export CIRCUIT="env PYTHONPATH=src $CIRCUIT_PYTHON -m circuit.cli"
RUN=artifacts/runs/symbolic_kv_reference_formation
CONFIG=$RUN/run_config.json
PROBE=$RUN/analysis/probe_set.jsonl
TRAIN_PROBE=$RUN/analysis/probe_set_train.jsonl
CKPT_DIR=$RUN/checkpoints
ANALYSIS=$RUN/analysis
For cross-seed work:
CROSS_ROOT=artifacts/runs/symbolic_kv_cross_seed_adam
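Because the tools fail on mismatched or missing inputs, it can save time to check the base paths before launching a long command. A minimal sketch, assuming the relative layout implied by the variables above; the helper name missing_paths is hypothetical and not part of circuit.cli:

```python
import os


def missing_paths(run_dir):
    """Return the expected reference-run paths that do not exist yet.

    The relative layout mirrors the RUN/CONFIG/PROBE/TRAIN_PROBE/CKPT_DIR
    variables in this guide; it is an assumption, not a CLI contract.
    """
    expected = [
        "run_config.json",
        os.path.join("analysis", "probe_set.jsonl"),
        os.path.join("analysis", "probe_set_train.jsonl"),
        "checkpoints",
    ]
    return [
        os.path.join(run_dir, rel)
        for rel in expected
        if not os.path.exists(os.path.join(run_dir, rel))
    ]


# Report anything missing for the reference run used in this guide.
for path in missing_paths("artifacts/runs/symbolic_kv_reference_formation"):
    print("missing:", path)
```

An empty result only means the files exist. It does not check that the probe set and config agree with each other; the tools themselves enforce that.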
Research Workflow
Use the tools in this order.
| Question | Main tool | What it answers |
|---|---|---|
| Which trained mechanism matters? | attention-geometry-trace, path-logit-decomposition, route-competition-report | Finds the candidate route and downstream path |
| Does a QK route form in weight space? | weight-svd-trace, weight-svd-patterns | Shows singular growth, effective-rank drop, vector stabilization |
| Does the route become semantic? | contextual-key-separability, contextual-svd-alignment, bilinear-qk-match-separation | Tests whether the route aligns with contextual residual structure and support-vs-distractor separation |
| Did checkpoint updates grow the route? | checkpoint-update-attribution, bilinear-qk-rank-update-attribution, attention-retrieval-separation-update-attribution | First-order route-growth attribution across checkpoints |
| Did the actual training batch grow the route? | optimizer-update-trace, bilinear-qk-rank-actual-batch-attribution, actual-batch-route-attribution | Uses exact traced batches and parameter updates |
| Why did the QK route grow? | bilinear-qk-rank-adam-state-attribution | Decomposes actual update into raw SGD-equivalent, clipped SGD-equivalent, Adam current, momentum, weight decay for a rank-limited QK matcher |
| Why did the OV/write scalar grow? | attention-downstream-adam-state-attribution | Decomposes actual update into AdamW pieces and splits pressure over the traced head’s W_Q, W_K, W_V, and W_O slices |
| What value code does the readout use? | value-code-subspace-report, geometry-subspace-intervention | Tracks prediction-position value identity and tests whether removing/keeping that subspace changes behavior |
| Does support value-code predict prediction value-code? | value-code-transfer-map-report | Fits and controls a support-to-prediction value-code transfer map |
| Can that transfer causally replace the prediction value-code? | value-code-transfer-rescue | Removes the target value-code component and patches back the fitted transfer or controls |
| Does the same role repeat across seeds? | scripts/cross_seed_adam_pipeline.py | Winner / runner-up / bottom-control comparison across seeds |
Minimal Decision Tree
If you only remember one section, use this one.
I want to know where the route forms
Run:
- weight-svd-trace
- then weight-svd-patterns
- then bilinear-qk-match-separation
I want to know whether the route became task-meaningful
Run:
- contextual-key-separability
- contextual-svd-alignment
- bilinear-qk-match-separation
I want to know whether optimizer updates selected that geometry
Run:
- optimizer-update-trace
- bilinear-qk-rank-actual-batch-attribution
- bilinear-qk-rank-adam-state-attribution
I want to know whether optimizer updates built the write side
Run:
- ov-write-progress-report
- optimizer-update-trace
- attention-downstream-update-attribution
- attention-downstream-adam-state-attribution
I want to know what the write side writes into readout
Run:
- value-code-subspace-report
- value-code-transfer-map-report
- value-code-transfer-rescue
- geometry-subspace-intervention with --subspace embedding_value_identity
- a rank-matched embedding_key_identity control
I want to know whether this is seed-specific
Run:
scripts/cross_seed_adam_pipeline.py
Command Reference
The examples below are the canonical shapes used in this repo. Replace only the parts that actually need changing.
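Most examples repeat --checkpoint once per step. A minimal sketch for generating that flag list, assuming the six-digit zero-padded step_NNNNNN.pt naming shown in the examples; the helper name checkpoint_flags is hypothetical and not part of circuit.cli:

```python
def checkpoint_flags(checkpoint_dir, steps):
    """Return ['--checkpoint', '<dir>/step_005000.pt', ...] for the given steps.

    Assumes the six-digit zero-padded file names used in this guide.
    """
    flags = []
    for step in steps:
        flags += ["--checkpoint", f"{checkpoint_dir}/step_{step:06d}.pt"]
    return flags


# Print a fragment that can be pasted into one of the commands below.
# "$CKPT_DIR" is left unexpanded on purpose so the shell resolves it.
steps = [5000, 5250, 7500]
print(" ".join(checkpoint_flags("$CKPT_DIR", steps)))
```

Listing steps explicitly keeps the checkpoint set visible in the command, which matches the strict style of the tools.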
1. Trained-model geometry
attention-geometry-trace
Use this first when you want a checkpoint timeline for attention/readout geometry.
$CIRCUIT attention-geometry-trace \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_005000.pt \
--checkpoint $CKPT_DIR/step_005250.pt \
--checkpoint $CKPT_DIR/step_007500.pt \
--checkpoint $CKPT_DIR/step_007750.pt \
--checkpoint $CKPT_DIR/step_008000.pt \
--checkpoint $CKPT_DIR/step_008250.pt \
--output-dir $ANALYSIS/attention_geometry/l2h1_value_write_timeline \
--device "$CIRCUIT_DEVICE" \
--top-k-tokens 8 \
--top-k-plot-heads 12
Important outputs:
- report
- markdown
- rows
- plots.checkpoint_summary
- plots.role_attention
Use it for:
- answer margin trajectory
- answer accuracy trajectory
- role-level attention and value-alignment summaries
path-logit-decomposition
Use this when you want direct-logit attribution and ablation-vs-DLA comparisons.
$CIRCUIT path-logit-decomposition \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_005000.pt \
--checkpoint $CKPT_DIR/step_005250.pt \
--checkpoint $CKPT_DIR/step_007500.pt \
--checkpoint $CKPT_DIR/step_007750.pt \
--checkpoint $CKPT_DIR/step_008000.pt \
--checkpoint $CKPT_DIR/step_008250.pt \
--output-dir $ANALYSIS/path_logit_decomposition/l2h1_value_write_timeline \
--device "$CIRCUIT_DEVICE" \
--ablation-top-k 8 \
--ablation-step 5250 \
--ablation-step 8000 \
--top-k-plot-components 16
Important outputs:
- report
- markdown
- plots.component_trajectory
- plots.stage_readout
- plots.ablation_vs_dla
2. Route competition
route-competition-report
Use this to compare candidate routes in a common evaluation frame.
$CIRCUIT route-competition-report \
--config $CONFIG \
--probe-set $PROBE \
--train-probe-set $TRAIN_PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_005000.pt \
--checkpoint $CKPT_DIR/step_005250.pt \
--output-dir $ANALYSIS/route_competition/query_key_routes_5000_5250 \
--device "$CIRCUIT_DEVICE" \
--route 'label=L2H1_qk_query,stage=layer_1_post_mlp,subspace=head_qk_query,rank=4,head_layer=2,head=1,position_role=query_key' \
--route 'label=L1H2_qk_query,stage=layer_0_post_mlp,subspace=head_qk_query,rank=4,head_layer=1,head=2,position_role=query_key' \
--route 'label=L0H0_qk_query,stage=embedding,subspace=head_qk_query,rank=4,head_layer=0,head=0,position_role=query_key' \
--route-pair-type query_key \
--pair-type query_key \
--pair-type distractor \
--train-pair-type query_key \
--data-group-field pair_type \
--eval-split validation_iid \
--train-split train \
--eval-loss-side both \
--train-loss-side clean \
--max-pairs-per-type 64 \
--min-pairs-per-type 16
Important outputs:
- route_rows
- data_rows
- pair_rows
- plots.train_support
- plots.eval_actual_delta
Use it for:
- candidate ranking
- winner / runner-up / bottom controls
- cross-seed candidate selection
3. Weight-space formation
weight-svd-trace
Use this when you want raw SVD trajectories for W_Q, W_K, W_V, W_O, W_QK, W_OV, W_in, W_out.
$CIRCUIT weight-svd-trace \
--config $CONFIG \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_000250.pt \
--checkpoint $CKPT_DIR/step_000500.pt \
--checkpoint $CKPT_DIR/step_000750.pt \
--checkpoint $CKPT_DIR/step_001000.pt \
--checkpoint $CKPT_DIR/step_001500.pt \
--checkpoint $CKPT_DIR/step_002000.pt \
--checkpoint $CKPT_DIR/step_002500.pt \
--checkpoint $CKPT_DIR/step_003000.pt \
--checkpoint $CKPT_DIR/step_003500.pt \
--checkpoint $CKPT_DIR/step_004000.pt \
--checkpoint $CKPT_DIR/step_004500.pt \
--checkpoint $CKPT_DIR/step_005000.pt \
--checkpoint $CKPT_DIR/step_005250.pt \
--checkpoint $CKPT_DIR/step_005500.pt \
--output-dir $ANALYSIS/weight_svd_trace/phase1_000250_5500_top16 \
--device cpu \
--max-singular-values 16 \
--top-vector-ranks 16 \
--overwrite
Important outputs:
- singular_values_jsonl
- singular_values_csv
- top_singular_vectors_jsonl
What to inspect:
- top singular value growth
- effective rank
- spectral mass concentration
- singular-vector rotation/stabilization
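The quantities in this checklist can be reproduced by hand from exported singular values and vectors. A minimal sketch under stated assumptions: effective rank is taken as the exponential of the entropy of the normalized singular values (one common definition; the tool's exact definition is not stated in this guide), and spectral mass is measured on squared singular values. All function names are illustrative.

```python
import math


def effective_rank(singular_values):
    """exp(entropy) of the normalized singular values (one common definition)."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def top_k_mass(singular_values, k):
    """Fraction of squared spectral mass carried by the k largest values."""
    squared = sorted((s * s for s in singular_values), reverse=True)
    return sum(squared[:k]) / sum(squared)


def abs_cosine(u, v):
    """|cos| between two singular vectors; sign flips are ignored."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (norm_u * norm_v)


early = [1.0, 1.0, 1.0, 1.0]  # flat spectrum, toy values
late = [4.0, 3.0, 0.0, 0.0]   # concentrated spectrum, toy values
print(effective_rank(early), effective_rank(late))
print(top_k_mass(late, 1))
print(abs_cosine([1.0, 0.0], [-1.0, 0.0]))
```

A falling effective rank together with a rising top-k mass is the weight-space signature described above. The absolute cosine between the same-rank vector at neighboring checkpoints is one way to see rotation settle into stabilization.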
weight-svd-patterns
Use this after weight-svd-trace to summarize births, stabilization windows, and coordination windows.
$CIRCUIT weight-svd-patterns \
--singular-values $ANALYSIS/weight_svd_trace/phase1_000250_5500_top16/weight_svd_singular_values.jsonl \
--top-singular-vectors $ANALYSIS/weight_svd_trace/phase1_000250_5500_top16/weight_svd_top_singular_vectors.jsonl \
--output-dir $ANALYSIS/weight_svd_patterns/phase1_000250_5500_top16 \
--max-vector-rank 16 \
--markdown-top-k 24 \
--overwrite
Important outputs:
- matrix_summary_rows
- vector_alignment_rows
- interval_event_rows
- coordination_window_rows
4. Contextual semanticity
contextual-key-separability
Use this to test whether contextual residual states are separating the relevant key groups.
$CIRCUIT contextual-key-separability \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_000250.pt \
--checkpoint $CKPT_DIR/step_000500.pt \
--checkpoint $CKPT_DIR/step_000750.pt \
--checkpoint $CKPT_DIR/step_001000.pt \
--checkpoint $CKPT_DIR/step_001500.pt \
--checkpoint $CKPT_DIR/step_002000.pt \
--checkpoint $CKPT_DIR/step_002500.pt \
--checkpoint $CKPT_DIR/step_003000.pt \
--checkpoint $CKPT_DIR/step_003500.pt \
--checkpoint $CKPT_DIR/step_004000.pt \
--checkpoint $CKPT_DIR/step_004500.pt \
--checkpoint $CKPT_DIR/step_005000.pt \
--checkpoint $CKPT_DIR/step_005500.pt \
--output-dir $ANALYSIS/contextual_key_separability/l2h1_prediction_query_key_stage_sweep_000250_005500 \
--device "$CIRCUIT_DEVICE" \
--head-layer 2 \
--head 1 \
--context-stage layer_1_post_mlp \
--context-role prediction \
--group-by query_key \
--projection-rank 8 \
--split validation_iid \
--window-start 750 \
--window-end 3500 \
--include-full-residual \
--overwrite
Important outputs:
- metric_rows
- metric_csv
- group_rows
- plots.trajectory
Use it for:
- pairwise key separability
- stage comparison
- windowed semanticity checks
contextual-svd-alignment
Use this when you want to compare singular directions against contextual residual subspaces.
$CIRCUIT contextual-svd-alignment \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_000250.pt \
--checkpoint $CKPT_DIR/step_000500.pt \
--checkpoint $CKPT_DIR/step_000750.pt \
--checkpoint $CKPT_DIR/step_001000.pt \
--checkpoint $CKPT_DIR/step_001500.pt \
--checkpoint $CKPT_DIR/step_002000.pt \
--checkpoint $CKPT_DIR/step_002500.pt \
--checkpoint $CKPT_DIR/step_003000.pt \
--checkpoint $CKPT_DIR/step_003500.pt \
--checkpoint $CKPT_DIR/step_004000.pt \
--checkpoint $CKPT_DIR/step_004500.pt \
--checkpoint $CKPT_DIR/step_005000.pt \
--checkpoint $CKPT_DIR/step_005500.pt \
--output-dir $ANALYSIS/contextual_svd_alignment/l2h1_prediction_grouped_by_query_key_layer1_post_mlp_000250_005500 \
--device "$CIRCUIT_DEVICE" \
--head-layer 2 \
--head 1 \
--context-stage layer_1_post_mlp \
--role prediction \
--role-spec support_value:position_role=support_value \
--plot-left-role prediction \
--plot-right-role support_value \
--top-ranks 4 \
--pca-rank 4 \
--batch-size 16 \
--split validation_iid \
--overwrite
Important outputs:
- alignment_rows
- rank_aggregate_rows
- subspace_rows
- role_vector_rows
5. QK route measurements
bilinear-qk-match-separation
Use this to define and track support-vs-distractor QK route quality directly.
$CIRCUIT bilinear-qk-match-separation \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_000250.pt \
--checkpoint $CKPT_DIR/step_000500.pt \
--checkpoint $CKPT_DIR/step_000750.pt \
--checkpoint $CKPT_DIR/step_001000.pt \
--checkpoint $CKPT_DIR/step_001500.pt \
--checkpoint $CKPT_DIR/step_002000.pt \
--checkpoint $CKPT_DIR/step_002500.pt \
--checkpoint $CKPT_DIR/step_003000.pt \
--checkpoint $CKPT_DIR/step_003500.pt \
--checkpoint $CKPT_DIR/step_004000.pt \
--checkpoint $CKPT_DIR/step_004500.pt \
--checkpoint $CKPT_DIR/step_005000.pt \
--checkpoint $CKPT_DIR/step_005500.pt \
--output-dir $ANALYSIS/bilinear_qk_match_separation/l2h1_support_value_vs_distractors_000250_005500_stage_sweep \
--device "$CIRCUIT_DEVICE" \
--head-layer 2 \
--head 1 \
--context-stage layer_1_post_mlp \
--score-query-role prediction \
--support-role support_value \
--distractor-role value_distractors \
--layernorm-mode head_ln1 \
--rank 8 \
--group-by query_key \
--split validation_iid \
--window-start 750 \
--window-end 3500 \
--overwrite
Important outputs:
- metric_rows
- metric_csv
- event_rows
- group_rows
Key fields:
- qk_match_separation_mean
- support_beats_all_rate
- answer_margin_mean
- qk_singular_value_top
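To make the separation and beats-all fields concrete, here is a minimal sketch of how such summaries could be computed from per-record QK scores. The definitions are assumptions inferred from the field names (support score minus mean distractor score, and the fraction of records where the support key outscores every distractor); the tool's exact definitions may differ.

```python
def qk_separation_summary(records):
    """Summarize support-vs-distractor scores.

    Each record is (support_score, [distractor_scores]). The definitions
    here are assumptions made for illustration, not the tool's exact ones.
    """
    separations = []
    beats_all = 0
    for support, distractors in records:
        mean_distractor = sum(distractors) / len(distractors)
        separations.append(support - mean_distractor)
        if support > max(distractors):
            beats_all += 1
    return {
        "separation_mean": sum(separations) / len(separations),
        "support_beats_all_rate": beats_all / len(records),
    }


records = [
    (2.0, [0.5, 1.5]),  # support beats both distractors
    (1.0, [1.5, 0.5]),  # one distractor outscores the support key
]
print(qk_separation_summary(records))
```

The two numbers can disagree: a mean separation can be positive while the beats-all rate stays low if one strong distractor remains, which is why both are worth reading.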
6. Checkpoint-to-checkpoint first-order attribution
checkpoint-update-attribution
Use this for generic route/subspace update attribution between checkpoints.
$CIRCUIT checkpoint-update-attribution \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_005000.pt \
--checkpoint $CKPT_DIR/step_005250.pt \
--output-dir $ANALYSIS/checkpoint_update_attribution/l2h1_qk_query_rank4_5000_5250_top40 \
--device "$CIRCUIT_DEVICE" \
--stage layer_1_post_mlp \
--subspace head_qk_query \
--rank 4 \
--head-layer 2 \
--head 1 \
--position-role query_key \
--pair-type query_key \
--pair-type distractor \
--max-pairs-per-type 64 \
--min-pairs-per-type 16 \
--decompose module_blocks \
--decompose attention_heads \
--decompose attention_projections \
--decompose mlp_neurons \
--top-k-groups 40
Important outputs:
- metric_rows
- decomposition_rows
- group_rows
- pair_rows
bilinear-qk-rank-update-attribution
Use this when the object of interest is a bilinear QK rank, not a generic residual subspace.
$CIRCUIT bilinear-qk-rank-update-attribution \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_000750.pt \
--checkpoint $CKPT_DIR/step_001000.pt \
--checkpoint $CKPT_DIR/step_001250.pt \
--checkpoint $CKPT_DIR/step_001500.pt \
--checkpoint $CKPT_DIR/step_002000.pt \
--checkpoint $CKPT_DIR/step_002500.pt \
--checkpoint $CKPT_DIR/step_003000.pt \
--checkpoint $CKPT_DIR/step_003500.pt \
--output-dir $ANALYSIS/bilinear_qk_rank_update_attribution/l2h1_rank4_rank8_support_value_minus_distractors_000750_003500_formation \
--device "$CIRCUIT_DEVICE" \
--head-layer 2 \
--head 1 \
--rank 4 \
--rank 8 \
--context-stage layer_1_post_mlp \
--layernorm-mode head_ln1 \
--score-query-role prediction \
--support-key-role support_value \
--distractor-key-role value_distractors \
--record-side clean \
--pair-type support_value \
--pair-type distractor \
--split validation_iid \
--decompose module_blocks \
--decompose attention_heads \
--decompose attention_projections \
--decompose mlp_neurons \
--top-k-groups 40
Important outputs:
- metric_rows
- decomposition_rows
- group_rows
- score_rows
- pair_rows
7. Stepwise route behavior
attention-retrieval-separation-update-attribution
Use this for stepwise support-vs-distractor attention separation.
$CIRCUIT attention-retrieval-separation-update-attribution \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_000750.pt \
--checkpoint $CKPT_DIR/step_001000.pt \
--checkpoint $CKPT_DIR/step_001250.pt \
--checkpoint $CKPT_DIR/step_001500.pt \
--checkpoint $CKPT_DIR/step_002000.pt \
--checkpoint $CKPT_DIR/step_002500.pt \
--checkpoint $CKPT_DIR/step_003000.pt \
--checkpoint $CKPT_DIR/step_003500.pt \
--output-dir $ANALYSIS/attention_retrieval_separation_update_attribution/l2h1_support_value_minus_distractors_000750_003500_formation \
--device "$CIRCUIT_DEVICE" \
--head-layer 2 \
--head 1 \
--score-query-role prediction \
--support-key-role support_value \
--distractor-key-role value_distractors \
--record-side clean \
--score-component score \
--pair-type support_value \
--pair-type distractor \
--split validation_iid \
--decompose module_blocks \
--decompose attention_heads \
--decompose attention_projections \
--decompose mlp_neurons \
--top-k-groups 40
Important outputs:
- metric_rows
- decomposition_rows
- group_rows
- score_rows
- pair_rows
attention-retrieval-chain-report
Use this to get the checkpoint-level chain summary for one head.
$CIRCUIT attention-retrieval-chain-report \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_005500.pt \
--checkpoint $CKPT_DIR/step_005750.pt \
--checkpoint $CKPT_DIR/step_006000.pt \
--checkpoint $CKPT_DIR/step_006250.pt \
--checkpoint $CKPT_DIR/step_006500.pt \
--checkpoint $CKPT_DIR/step_006750.pt \
--checkpoint $CKPT_DIR/step_007000.pt \
--checkpoint $CKPT_DIR/step_007250.pt \
--checkpoint $CKPT_DIR/step_007500.pt \
--output-dir $ANALYSIS/attention_retrieval_chain/l2h1_support_value_minus_distractors_5500_7500_neighbor_intervals \
--device "$CIRCUIT_DEVICE" \
--head-layer 2 \
--head 1 \
--score-query-role prediction \
--support-key-role support_value \
--distractor-key-role value_distractors \
--record-side clean \
--pair-type support_value \
--pair-type distractor
Important outputs:
- checkpoint_rows
- delta_rows
- pair_metric_rows
- plots.trajectory
8. Exact traced training updates
optimizer-update-trace
Use this before any actual-batch or Adam-state attribution. This tool is the source of truth for traced batches, checkpoints, and parameter updates.
From initialization:
$CIRCUIT optimizer-update-trace \
--config $CONFIG \
--from-initialization \
--output-dir $ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise \
--device "$CIRCUIT_DEVICE" \
--end-step 6000 \
--train-split train \
--checkpoint-every 1 \
--checkpoint-start-step 0 \
--progress-every 100 \
--top-k-parameters 40 \
--overwrite
Resume from a checkpoint:
$CIRCUIT optimizer-update-trace \
--config $CONFIG \
--resume-checkpoint $CKPT_DIR/step_005500.pt \
--output-dir $ANALYSIS/optimizer_update_trace/l2h1_qk_rank_0550_0750_stepwise \
--device "$CIRCUIT_DEVICE" \
--num-steps 2000 \
--train-split train \
--checkpoint-every 1 \
--checkpoint-start-step 5500 \
--progress-every 100 \
--top-k-parameters 40 \
--overwrite
Important outputs:
- step_rows
- batch_rows
- parameter_update_rows
- checkpoints/
Do not treat this as optional if you need exact update attribution.
bilinear-qk-rank-actual-batch-attribution
Use this to project actual traced batch updates onto a QK-rank route.
$CIRCUIT bilinear-qk-rank-actual-batch-attribution \
--config $CONFIG \
--probe-set $PROBE \
--optimizer-trace-dir $ANALYSIS/optimizer_update_trace/l2h1_qk_rank_0750_1000_stepwise \
--output-dir $ANALYSIS/bilinear_qk_rank_actual_batch_attribution/l2h1_rank8_support_value_0750_1000_stepwise \
--device "$CIRCUIT_DEVICE" \
--head-layer 2 \
--head 1 \
--rank 8 \
--context-stage layer_1_post_mlp \
--layernorm-mode head_ln1 \
--score-query-role prediction \
--support-key-role support_value \
--distractor-key-role value_distractors \
--record-side clean \
--route-pair-type support_value \
--route-pair-source-type support_value \
--max-route-pairs-per-type 64 \
--min-route-pairs-per-type 16 \
--loss-scope full_lm \
--overwrite
Important outputs:
- route_rows
- actual_batch_rows
- route_pair_rows
bilinear-qk-rank-adam-state-attribution
Use this for the optimizer-level “why” question.
$CIRCUIT bilinear-qk-rank-adam-state-attribution \
--config $CONFIG \
--probe-set $PROBE \
--optimizer-trace-dir $ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise \
--output-dir $ANALYSIS/bilinear_qk_rank_adam_state_attribution/from_init_l2h1_rank8_support_value_0000_6000_stepwise \
--device "$CIRCUIT_DEVICE" \
--head-layer 2 \
--head 1 \
--rank 8 \
--context-stage layer_1_post_mlp \
--layernorm-mode head_ln1 \
--score-query-role prediction \
--support-key-role support_value \
--distractor-key-role value_distractors \
--record-side clean \
--route-pair-type support_value \
--route-pair-source-type support_value \
--max-route-pairs-per-type 64 \
--min-route-pairs-per-type 16 \
--loss-scope full_lm \
--overwrite
Important outputs:
- metric_rows
- component_rows
- route_pair_rows
This is the command that answers:
How much came from the raw SGD-equivalent update?
How much came from Adam current gradient?
How much came from momentum?
How much came from weight decay?
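The questions above correspond to splitting one optimizer step into additive pieces and projecting each piece onto the gradient of the route scalar. A minimal sketch, assuming the standard decoupled-weight-decay AdamW formulas; gradient clipping and any repo-specific conventions are left out, and all names are illustrative rather than the tool's API:

```python
import math


def adamw_pieces(theta, grad, m_prev, v_prev, step, lr, beta1, beta2, eps, weight_decay):
    """Split one AdamW step into current-gradient, momentum, and decay parts.

    Standard AdamW with bias correction; clipping is deliberately omitted.
    """
    bias1 = 1.0 - beta1 ** step
    bias2 = 1.0 - beta2 ** step
    pieces = {"adam_current": [], "momentum": [], "weight_decay": [], "raw_sgd": []}
    for p, g, m, v in zip(theta, grad, m_prev, v_prev):
        v_new = beta2 * v + (1.0 - beta2) * g * g
        denom = math.sqrt(v_new / bias2) + eps
        pieces["adam_current"].append(-lr * (1.0 - beta1) * g / bias1 / denom)
        pieces["momentum"].append(-lr * beta1 * m / bias1 / denom)
        pieces["weight_decay"].append(-lr * weight_decay * p)
        pieces["raw_sgd"].append(-lr * g)  # comparison only, not part of the sum
    return pieces


def first_order_effect(route_grad, delta):
    """First-order change in a route scalar: <d(route)/d(theta), delta>."""
    return sum(r * d for r, d in zip(route_grad, delta))


# Toy numbers, purely illustrative.
theta = [0.5, -0.2]
grad = [0.1, -0.3]
m_prev = [0.05, -0.1]
v_prev = [0.01, 0.04]
route_grad = [1.0, 2.0]  # hypothetical gradient of the route scalar
pieces = adamw_pieces(theta, grad, m_prev, v_prev, step=10, lr=1e-3,
                      beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01)
for name, delta in pieces.items():
    print(name, first_order_effect(route_grad, delta))
```

In this sketch adam_current, momentum, and weight_decay sum to the actual parameter update, while raw_sgd is a counterfactual shown for comparison. That is why the split needs the real optimizer state from optimizer-update-trace and cannot be reconstructed from checkpoints alone.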
attention-downstream-adam-state-attribution
Use this for the OV/write-side optimizer question. It uses the same exact optimizer trace as the QK Adam tool, but the target scalar is a downstream write quantity such as qk_ov_product, support_mass_ov_value_margin, attended_support_ov_value_margin, head_value_margin_dla, or head_margin_dla_fixed_readout.
The output has two levels:
- global AdamW decomposition for the scalar
- parameter-group decomposition over the traced head’s q_proj, k_proj, v_proj, out_proj, and combined qkvo slices
- optional extra groups such as module:L0.mlp, module:L1.attention, or another head’s projection via repeated --parameter-group
$CIRCUIT attention-downstream-adam-state-attribution \
--config $CONFIG \
--probe-set $PROBE \
--optimizer-trace-dir $ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise \
--output-dir $ANALYSIS/attention_downstream_adam_state_attribution/l1h2_support_value_write_5500_5501_smoke \
--device "$CIRCUIT_DEVICE" \
--checkpoint $ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise/checkpoints/step_005500.pt \
--checkpoint $ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise/checkpoints/step_005501.pt \
--head-layer 1 \
--head 2 \
--score-query-role prediction \
--support-key-role support_value \
--distractor-key-role value_distractors \
--record-side clean \
--scalar qk_ov_product \
--scalar support_mass_ov_value_margin \
--scalar attended_support_ov_value_margin \
--scalar head_value_margin_dla \
--scalar head_margin_dla_fixed_readout \
--objective-pair-type support_value \
--route-pair-source-type support_value \
--max-route-pairs-per-type 64 \
--min-route-pairs-per-type 16 \
--loss-scope full_lm \
--overwrite
Important outputs:
- metric_rows
- component_rows
- group_rows
- route_pair_rows
This command answers:
Did the actual AdamW update increase the write scalar?
Was the raw SGD-equivalent update tiny or large for that write scalar?
Did Adam current gradient or historical momentum carry the update?
Did the useful pressure land in W_V, W_O, QK slices, or outside the traced head?
The two audit-selected scalar forms are:
support_mass_ov_value_margin = total support attention mass * OV value margin
qk_ov_product = QK support-minus-distractor score separation * OV value margin
ov-write-progress-report
Use this before OV/write optimizer attribution. It audits candidate heads and write scalars across checkpoints, with readout-level attention-frozen and shuffled-value controls.
This is the scalar-selection step. Do not skip it and jump straight to AdamW decomposition unless you already know which write scalar is meaningful.
$CIRCUIT ov-write-progress-report \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $CKPT_DIR \
--checkpoint $CKPT_DIR/step_000750.pt \
--checkpoint $CKPT_DIR/step_001000.pt \
--checkpoint $CKPT_DIR/step_001250.pt \
--checkpoint $CKPT_DIR/step_001500.pt \
--checkpoint $CKPT_DIR/step_001750.pt \
--checkpoint $CKPT_DIR/step_002000.pt \
--checkpoint $CKPT_DIR/step_002250.pt \
--checkpoint $CKPT_DIR/step_002500.pt \
--checkpoint $CKPT_DIR/step_002750.pt \
--checkpoint $CKPT_DIR/step_003000.pt \
--checkpoint $CKPT_DIR/step_003250.pt \
--checkpoint $CKPT_DIR/step_003500.pt \
--output-dir $ANALYSIS/ov_write_progress/l0_l1_l2_attention_0750_3500_formation \
--device "$CIRCUIT_DEVICE" \
--head L0H0 \
--head L0H1 \
--head L0H2 \
--head L0H3 \
--head L1H2 \
--head L2H1 \
--score-query-role prediction \
--support-key-role support_value \
--distractor-key-role value_distractors \
--record-side clean \
--pair-type support_value \
--max-pairs-per-type 64 \
--min-pairs-per-type 16 \
--top-k-correlations 32 \
--overwrite
Important outputs:
- checkpoint_rows
- delta_rows
- correlation_rows
- pair_rows
The four conditions are:
real_attention_real_values
correct_support_attention_real_values
real_attention_shuffled_values
correct_support_attention_shuffled_values
This command answers:
Which OV/write scalar has a clean birth curve?
Does forcing correct-support attention make the write useful?
Does shuffling the support value destroy the write signal?
Which write scalar best tracks fixed-competitor margin, correct-value logit, or negative loss?
9. Value-code readout
value-code-subspace-report
Use this after the write-side audit when the question is no longer “does the residual change matter?” but “what code does the mature readout use?”
This command tracks whether prediction-position residual states become separable by answer value, support value, or both.
TRACE_CKPTS=$ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise/checkpoints
$CIRCUIT value-code-subspace-report \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $TRACE_CKPTS \
--checkpoint $TRACE_CKPTS/step_001500.pt \
--checkpoint $TRACE_CKPTS/step_001750.pt \
--checkpoint $TRACE_CKPTS/step_002000.pt \
--checkpoint $TRACE_CKPTS/step_002500.pt \
--checkpoint $TRACE_CKPTS/step_003000.pt \
--checkpoint $TRACE_CKPTS/step_003500.pt \
--output-dir $ANALYSIS/value_code_subspace/prediction_answer_value_1500_3500_cli \
--device "$CIRCUIT_DEVICE" \
--stage layer_0_post_mlp \
--stage layer_1_post_mlp \
--stage layer_2_post_mlp \
--stage final_norm \
--position-role prediction \
--position-role support_value \
--group-by answer_value \
--group-by support_value \
--split validation_iid \
--max-records 256 \
--pca-rank 4 \
--overwrite
Important outputs:
- value_code_rows
- summary_rows
- subspace_rows
This command answers:
When does the prediction residual start reading out the answer value?
Is the readable code grouped by answer value or just by support position?
Is the value-code object low-rank or broad?
value-code-transfer-map-report
Use this when the remaining question is the support-to-prediction bridge:
support-value residual state -> prediction-position value-code state
The tool builds value-identity bases on a deterministic fit split, fits a ridge-stabilized affine map from source coordinates to target coordinates, and evaluates the map on heldout probe rows. Controls use the same heldout rows.
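The fit-then-evaluate procedure described above can be sketched in plain Python. This is an illustration under assumptions, not the tool's implementation: the split simply takes the leading rows as the fit set, the ridge penalty is applied to the linear weights but not the bias, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_ridge_affine(sources, targets, ridge):
    """Fit target ~= W @ source + bias with an L2 penalty on W only."""
    d = len(sources[0])
    rows = [list(s) + [1.0] for s in sources]  # append a bias column
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d + 1)] for i in range(d + 1)]
    for i in range(d):  # leave the bias term unpenalized
        gram[i][i] += ridge
    weights = []
    for k in range(len(targets[0])):
        rhs = [sum(r[i] * t[k] for r, t in zip(rows, targets)) for i in range(d + 1)]
        weights.append(solve(gram, rhs))
    return weights  # one [w_1, ..., w_d, bias] row per target coordinate


def predict(weights, source):
    """Apply the fitted affine map to one source coordinate vector."""
    return [sum(w * x for w, x in zip(row[:-1], source)) + row[-1] for row in weights]


def heldout_mse(sources, targets, fit_fraction, ridge):
    """Fit on the leading rows, score on the rest (a deterministic split)."""
    cut = int(len(sources) * fit_fraction)
    weights = fit_ridge_affine(sources[:cut], targets[:cut], ridge)
    errors = [
        (p - t) ** 2
        for s, tgt in zip(sources[cut:], targets[cut:])
        for p, t in zip(predict(weights, s), tgt)
    ]
    return sum(errors) / len(errors)


# Toy coordinates: the true map is affine, the control breaks the pairing.
sources = [[float(i)] for i in range(8)]
targets = [[2.0 * i + 1.0] for i in range(8)]
shuffled = [targets[j] for j in [3, 0, 6, 1, 7, 4, 2, 5]]
print(heldout_mse(sources, targets, 0.75, 1e-6))
print(heldout_mse(sources, shuffled, 0.75, 1e-6))
```

Scoring the true pairing and the shuffled control on the same held-out rows is the point of the design: a low error only counts as transfer if the control, evaluated identically, does clearly worse.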
TRACE_CKPTS=$ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise/checkpoints
$CIRCUIT value-code-transfer-map-report \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $TRACE_CKPTS \
--checkpoint $TRACE_CKPTS/step_001500.pt \
--checkpoint $TRACE_CKPTS/step_001750.pt \
--checkpoint $TRACE_CKPTS/step_002000.pt \
--checkpoint $TRACE_CKPTS/step_002500.pt \
--checkpoint $TRACE_CKPTS/step_003000.pt \
--checkpoint $TRACE_CKPTS/step_003500.pt \
--output-dir $ANALYSIS/value_code_transfer_map/support_to_prediction_1500_3500_cli \
--device "$CIRCUIT_DEVICE" \
--source-stage layer_1_post_mlp \
--target-stage layer_2_post_mlp \
--source-position-role support_value \
--target-position-role prediction \
--group-by answer_value \
--split validation_iid \
--max-records 256 \
--basis-rank 4 \
--basis-rank 8 \
--basis-rank 16 \
--control shuffled_answer_value \
--control wrong_support_value \
--control random_subspace \
--fit-fraction 0.75 \
--overwrite
Important outputs:
- transfer_rows
- summary_rows
- subspace_rows
- pair_rows
This command answers:
Can source value-code coordinates predict prediction value-code coordinates?
Does the true transfer beat shuffled-source, wrong-support, and random-subspace controls?
Does the transferred code itself point toward the correct value under a stage lens?
The optional key_identity control fits a support-key-code map. It is rank-limited by the key-token identity rank, so do not combine it with high value-code ranks unless you expect the command to fail loudly. For the current 8-key task, run it separately with a small rank:
$CIRCUIT value-code-transfer-map-report \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $TRACE_CKPTS \
--checkpoint $TRACE_CKPTS/step_002500.pt \
--output-dir $ANALYSIS/value_code_transfer_map/support_to_prediction_key_control_rank4_cli \
--device "$CIRCUIT_DEVICE" \
--source-stage layer_1_post_mlp \
--target-stage layer_2_post_mlp \
--source-position-role support_value \
--target-position-role prediction \
--group-by answer_value \
--split validation_iid \
--max-records 256 \
--basis-rank 4 \
--control key_identity \
--overwrite
value-code-transfer-rescue
Use this after value-code-transfer-map-report when you need causal sufficiency rather than coordinate prediction.
The tool removes the target value-code projection at the prediction position, then patches back either the actual projected value-code component, the fitted support-to-prediction transfer, or a control transfer.
target_removed = clean_target - project_target_value_code(clean_target)
patched = target_removed + predicted_target_value_code(source)
rescue = scalar(patched) - scalar(target_removed)
The oracle row checks whether the removed target value-code component itself is causal. The true-transfer row checks whether the fitted transfer can replace it.
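The three lines above can be worked through on toy vectors. A minimal sketch, assuming an orthonormal value-code basis and a linear readout standing in for the scalar; in the real tool the scalar comes from running the patched residual through the model, and all names here are illustrative:

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(vector, basis):
    """Project onto the span of an orthonormal basis (list of unit vectors)."""
    out = [0.0] * len(vector)
    for b in basis:
        coeff = dot(vector, b)
        out = [o + coeff * x for o, x in zip(out, b)]
    return out


def rescue(clean_target, predicted_code, basis, readout):
    """Return scalar(patched) - scalar(target_removed) for a linear readout."""
    code = project(clean_target, basis)
    target_removed = [t - c for t, c in zip(clean_target, code)]
    patched = [r + p for r, p in zip(target_removed, predicted_code)]
    return dot(patched, readout) - dot(target_removed, readout)


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy value-code subspace
clean = [2.0, 1.0, 0.5]                      # toy prediction-position state
readout = [1.0, 0.0, 1.0]                    # toy linear scalar

oracle = rescue(clean, project(clean, basis), basis, readout)
transfer = rescue(clean, [1.5, 1.0, 0.0], basis, readout)  # imperfect fitted code
print(oracle, transfer)
```

The oracle value is the ceiling: it patches back exactly what was removed. A fitted transfer that recovers most of the oracle rescue while the controls recover little is the pattern that supports causal sufficiency.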
The output also includes fixed_clean_competitor_margin and fixed_removed_competitor_margin, which hold the wrong-token branch fixed so moving best-wrong switches cannot hide a successful transfer.
The optional context arguments test the next write-side hypothesis: the support value-code alone may not be enough, because the prediction-position residual state can choose how the support code is interpreted. Passing --context-stage, --context-position-role, and --context-rank adds context_only, source_plus_context, and rank-matched contextual control rows. Use this when you need to distinguish a static support-to-prediction transfer from a contextual write operator.
TRACE_CKPTS=$ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise/checkpoints
$CIRCUIT value-code-transfer-rescue \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $TRACE_CKPTS \
--checkpoint $TRACE_CKPTS/step_001750.pt \
--checkpoint $TRACE_CKPTS/step_002000.pt \
--checkpoint $TRACE_CKPTS/step_002500.pt \
--checkpoint $TRACE_CKPTS/step_003000.pt \
--checkpoint $TRACE_CKPTS/step_003500.pt \
--output-dir $ANALYSIS/value_code_transfer_rescue/support_to_prediction_rank16_1750_3500_cli \
--device "$CIRCUIT_DEVICE" \
--source-stage layer_1_post_mlp \
--target-stage layer_2_post_mlp \
--source-position-role support_value \
--target-position-role prediction \
--group-by answer_value \
--split validation_iid \
--max-records 256 \
--basis-rank 16 \
--control shuffled_answer_value \
--control wrong_support_value \
--control random_subspace \
--fit-fraction 0.75 \
--overwrite
Run the rank-limited key control separately:
$CIRCUIT value-code-transfer-rescue \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $TRACE_CKPTS \
--checkpoint $TRACE_CKPTS/step_002500.pt \
--output-dir $ANALYSIS/value_code_transfer_rescue/support_to_prediction_key_control_rank4_cli \
--device "$CIRCUIT_DEVICE" \
--source-stage layer_1_post_mlp \
--target-stage layer_2_post_mlp \
--source-position-role support_value \
--target-position-role prediction \
--group-by answer_value \
--split validation_iid \
--max-records 256 \
--basis-rank 4 \
--control key_identity \
--fit-fraction 0.75 \
--overwrite
Run the contextual transfer version when the source-only transfer rescues answer evidence but not the moving/fixed margin cleanly:
$CIRCUIT value-code-transfer-rescue \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $TRACE_CKPTS \
--checkpoint $TRACE_CKPTS/step_001750.pt \
--checkpoint $TRACE_CKPTS/step_002000.pt \
--checkpoint $TRACE_CKPTS/step_002500.pt \
--checkpoint $TRACE_CKPTS/step_003000.pt \
--checkpoint $TRACE_CKPTS/step_003500.pt \
--output-dir $ANALYSIS/value_code_transfer_rescue/support_to_prediction_context_rank16_1750_3500_cli \
--device "$CIRCUIT_DEVICE" \
--source-stage layer_1_post_mlp \
--target-stage layer_2_post_mlp \
--source-position-role support_value \
--target-position-role prediction \
--context-stage layer_1_post_mlp \
--context-position-role prediction \
--context-rank 16 \
--group-by answer_value \
--split validation_iid \
--max-records 256 \
--basis-rank 16 \
--control shuffled_answer_value \
--control wrong_support_value \
--control random_subspace \
--fit-fraction 0.75 \
--overwrite
geometry-subspace-intervention
Use this to test whether a geometry subspace is causal, not merely readable.
For the value-code claim, remove value identity from layer_2_post_mlp / prediction and compare it against a rank-matched key-identity control.
TRACE_CKPTS=$ANALYSIS/optimizer_update_trace/from_init_seed7_0000_6000_stepwise/checkpoints
$CIRCUIT geometry-subspace-intervention \
--config $CONFIG \
--probe-set $PROBE \
--checkpoint-dir $TRACE_CKPTS \
--checkpoint $TRACE_CKPTS/step_001500.pt \
--checkpoint $TRACE_CKPTS/step_001750.pt \
--checkpoint $TRACE_CKPTS/step_002000.pt \
--checkpoint $TRACE_CKPTS/step_002500.pt \
--checkpoint $TRACE_CKPTS/step_003000.pt \
--checkpoint $TRACE_CKPTS/step_003500.pt \
--output-dir $ANALYSIS/value_code_causal_intervention/embedding_value_identity_prediction_layer2_remove_rank16_1500_3500 \
--device "$CIRCUIT_DEVICE" \
--stage layer_2_post_mlp \
--subspace embedding_value_identity \
--rank 16 \
--operation remove \
--position-role prediction \
--query-mode single_query
Important outputs:
- aggregate_rows
- query_rows
- plots.margin_drop
- plots.accuracy_drop
Use it for:
- value-identity removal: --subspace embedding_value_identity --operation remove
- rank-matched key control: --subspace embedding_key_identity --rank 7 --operation remove
- sufficiency checks: --operation keep at higher value-identity rank
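The remove and keep operations can be pictured as orthogonal projection. The sketch below is an illustration only. It assumes the subspace is given as an orthonormal basis, that remove subtracts the projection onto that basis, and that keep retains only the projection. The function names are hypothetical and the tool's actual implementation may differ.

```python
from typing import List, Sequence

Vector = List[float]


def project(x: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Project x onto the span of an orthonormal basis (one basis vector per row)."""
    out = [0.0] * len(x)
    for b in basis:
        coeff = sum(xi * bi for xi, bi in zip(x, b))
        for i, bi in enumerate(b):
            out[i] += coeff * bi
    return out


def keep(x: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Assumed meaning of --operation keep: retain only the subspace component."""
    return project(x, basis)


def remove(x: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Assumed meaning of --operation remove: subtract the subspace component."""
    p = project(x, basis)
    return [xi - pi for xi, pi in zip(x, p)]


# Toy residual vector and a rank-1 subspace along the first axis.
x = [3.0, 4.0, 5.0]
basis = [[1.0, 0.0, 0.0]]
kept = keep(x, basis)
removed = remove(x, basis)
```

Under these assumptions the two results always sum back to the original vector, which is why a remove run and a keep run at the same rank are complementary checks.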
10. Route-family closure
route-family-closure-report
Use this after route-to-margin-closure when the expensive run already measured all route deltas, but you need to compare families such as QK-only, OV-only, and QK+OV on the same observations.
This command does not recompute activations. It refits the existing closure rows with different route subsets.
$CIRCUIT route-family-closure-report \
--route-closure-rows $ANALYSIS/route_to_margin_closure/qk_ov_output_routes_5500_5550_stepwise/route_to_margin_closure_rows.jsonl \
--output-dir $ANALYSIS/route_family_closure/qk_vs_ov_vs_joint_5500_5550_stepwise \
--family label=qk,route=L2H1_qk_query,route=L1H2_qk_query,route=L0H0_qk_query,route=embedding_key_identity,route=full_layer1_query_key,route=full_layer0_query_key \
--family label=ov_input,route=L2H1_ov_input_support_value,route=L1H2_ov_input_support_value,route=L0H0_ov_input_support_value,route=embedding_value_identity,route=full_layer1_support_value,route=full_layer0_support_value \
--family label=ov_output,route=L1H2_ov_output_prediction,route=L2H1_ov_output_prediction,route=full_layer1_post_attn_prediction,route=full_layer2_post_attn_prediction \
--family label=qk_plus_ov,route=L2H1_qk_query,route=L1H2_qk_query,route=L0H0_qk_query,route=embedding_key_identity,route=full_layer1_query_key,route=full_layer0_query_key,route=L2H1_ov_input_support_value,route=L1H2_ov_input_support_value,route=L0H0_ov_input_support_value,route=embedding_value_identity,route=full_layer1_support_value,route=full_layer0_support_value,route=L1H2_ov_output_prediction,route=L2H1_ov_output_prediction,route=full_layer1_post_attn_prediction,route=full_layer2_post_attn_prediction \
--target-scalar answer_margin \
--record-side clean \
--overwrite
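The --family values above are long and the qk_plus_ov family repeats every route from the other three, so they are easy to mistype. A minimal sketch of building them from route lists follows. The helper names are hypothetical and not part of the toolkit; the route names and the label=...,route=... format are copied from the command above.

```python
from typing import Dict, List

# Route names copied from the example command.
QK = [
    "L2H1_qk_query", "L1H2_qk_query", "L0H0_qk_query",
    "embedding_key_identity", "full_layer1_query_key", "full_layer0_query_key",
]
OV_INPUT = [
    "L2H1_ov_input_support_value", "L1H2_ov_input_support_value",
    "L0H0_ov_input_support_value", "embedding_value_identity",
    "full_layer1_support_value", "full_layer0_support_value",
]
OV_OUTPUT = [
    "L1H2_ov_output_prediction", "L2H1_ov_output_prediction",
    "full_layer1_post_attn_prediction", "full_layer2_post_attn_prediction",
]


def family_arg(label: str, routes: List[str]) -> str:
    """Format one --family value: label=<name>,route=<r1>,route=<r2>,..."""
    return ",".join([f"label={label}"] + [f"route={r}" for r in routes])


def family_args(families: Dict[str, List[str]]) -> List[str]:
    """Expand a label -> routes mapping into repeated --family arguments."""
    args: List[str] = []
    for label, routes in families.items():
        args += ["--family", family_arg(label, routes)]
    return args


FAMILIES = {
    "qk": QK,
    "ov_input": OV_INPUT,
    "ov_output": OV_OUTPUT,
    "qk_plus_ov": QK + OV_INPUT + OV_OUTPUT,  # joint family is the union, in order
}
ARGS = family_args(FAMILIES)
```

ARGS can be appended to an argument list passed to subprocess.run, which avoids shell quoting of the comma-separated values.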
Important outputs:
- family_summary_rows
- interval_rows
- coefficient_rows
- plots.r_squared
- plots.abs_residual
Use it for:
- whether OV routes add explanatory power beyond QK routes
- whether full QK+OV route-family closure improves answer-margin closure
- deciding whether the OV side is head-local or residual-family-local
11. Output-side validation
Use these after route-level closure, not before.
output-route-closure
Fits output-component routes against scalar rows.
Key outputs:
- closure_rows
- endpoint_component_rows
- coefficient_rows
output-component-causal-validation
Tests whether DLA-like component effects match causal interventions.
Key outputs:
- validation_rows
- summary_rows
- plots.causal_vs_dla
output-mediated-causal-decomposition
Tests whether one component’s effect is mediated through downstream components.
Key outputs:
- source_rows
- downstream_rows
- source_summary_rows
- downstream_summary_rows
residual-state-rescue
Tests whether patching residual state at later stages rescues a removed source component.
Key outputs:
- rescue_rows
- summary_rows
- plots.rescue_fraction
component-output-rescue
Tests whether patching one clean downstream component write rescues a removed source component.
This is stricter than residual-state-rescue: it does not patch the whole residual stream. For MLPs it replaces the MLP residual write. For attention heads it replaces the single-head residual contribution, computed by subtracting all-heads-off attention output from only-this-head attention output so the shared output bias cancels.
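The bias cancellation can be checked on a toy model. The sketch below assumes the attention output is the sum of per-head residual writes plus one shared output bias. The numbers and function names are hypothetical and only illustrate the subtraction, not the tool's code.

```python
from typing import Iterable, List

# Toy per-head residual writes (3 heads, residual width 2) and a shared output bias.
HEAD_WRITES = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
BIAS = [0.25, -0.5]


def attention_output(active_heads: Iterable[int]) -> List[float]:
    """Attention block output with only the given heads enabled; bias is always added."""
    out = list(BIAS)
    for h in active_heads:
        for i, v in enumerate(HEAD_WRITES[h]):
            out[i] += v
    return out


def single_head_contribution(head: int) -> List[float]:
    """only-this-head output minus all-heads-off output; the shared bias cancels."""
    only_this_head = attention_output([head])
    all_heads_off = attention_output([])
    return [a - b for a, b in zip(only_this_head, all_heads_off)]


contribution = single_head_contribution(1)
```

In this toy setup the difference recovers the head's own write exactly, so patching it replaces one head's contribution without touching the bias or the other heads.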
Pass --patch-component L0MLP for single-component rescue. Pass --patch-group L0MLP,L2MLP for ordered multi-component rescue; the tool patches components in model order and recomputes intermediate residual states between patch stages.
Key outputs:
- rescue_rows
- summary_rows
- pair_rows
12. Cross-seed pipeline
scripts/cross_seed_adam_pipeline.py
This is the supported driver for:
- preparing seed configs
- traced scan checkpoints
- head scan with bilinear-qk-match-separation
- winner selection
- exact Adam-state attribution for winner / controls
Example:
PYTHONPATH=src $CIRCUIT_PYTHON scripts/cross_seed_adam_pipeline.py \
--base-config $CONFIG \
--probe-set $PROBE \
--run-root $CROSS_ROOT \
--seed 11 \
--seed 13 \
--seed 17 \
--seed 23 \
--seed 29 \
--python "$CIRCUIT_PYTHON" \
--device "$CIRCUIT_DEVICE" \
--end-step 6000 \
--layers 3 \
--heads 4 \
--rank 8 \
--window-start 750 \
--window-end 3500 \
--adam-start-step 750 \
--adam-end-step 2500 \
--split validation_iid \
--stage configs \
--stage trace-scan \
--stage scan \
--stage select \
--stage trace-adam \
--stage adam \
--adam-candidate winner \
--adam-candidate runner_up \
--adam-candidate bottom \
--overwrite
Important outputs:
- cross_seed_manifest.json
- cross_seed_winners.json
- cross_seed_winners.csv
- per-seed analysis/cross_seed_head_selection.json
- per-seed scan reports
- per-seed Adam-state attribution reports
When disk is tight, run the stages in separate invocations by passing fewer --stage flags per run.
Output Contract
Most analysis commands print a JSON object. Treat that JSON as the contract.
Typical fields:
- report: machine-readable JSON summary
- markdown: human-readable report
- *_rows or *_csv: row-level artifacts for scripting
- plots: SVG figure paths
Do not guess file names when the command already printed them.
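A minimal sketch of reading that contract from a script follows. It assumes the command's stdout holds exactly one JSON object and that the typical fields carry paths. The helper name and the example payload are hypothetical, and the paths in it are made up for illustration.

```python
import json
from typing import Any, Dict


def collect_artifacts(stdout_text: str) -> Dict[str, Any]:
    """Pick the typical contract fields out of a command's printed JSON object."""
    payload = json.loads(stdout_text)
    if not isinstance(payload, dict):
        raise ValueError("expected the command to print a JSON object")
    artifacts: Dict[str, Any] = {}
    for key, value in payload.items():
        if key in ("report", "markdown", "plots") or key.endswith(("_rows", "_csv")):
            artifacts[key] = value
    return artifacts


# Hypothetical stdout; real commands decide their own keys and paths.
EXAMPLE_STDOUT = json.dumps({
    "report": "out/report.json",
    "markdown": "out/report.md",
    "rescue_rows": "out/rescue_rows.jsonl",
    "plots": {"rescue_fraction": "out/rescue_fraction.svg"},
    "elapsed_seconds": 12.5,
})
ARTIFACTS = collect_artifacts(EXAMPLE_STDOUT)
```

Downstream scripts can then read paths from ARTIFACTS instead of reconstructing them from the output directory name.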
Common Failure Modes
These all happened in real use.
Output directory already exists and is non-empty
Cause:
- rerunning into the same directory without --overwrite
Fix:
- add --overwrite, or
- choose a new output directory
Checkpoint directory not found: .../checkpoints
Cause:
- using an optimizer-update-trace directory that was never produced
- running the cross-seed adam stage without the matching trace-adam stage
Fix:
- confirm <trace_dir>/checkpoints exists
- run optimizer-update-trace first
Probe-set file not found
Cause:
- wrong probe-set path
- forgetting that train and validation probe sets are separate files
Fix:
- use $PROBE for validation-style analyses
- use $TRAIN_PROBE when the command really needs train examples
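The three path-related failures above can be caught before launching a long run. The following is a minimal preflight sketch, not part of the toolkit. The function name and its arguments are hypothetical, and it only checks the filesystem, not whether the inputs agree with each other.

```python
from pathlib import Path
from typing import List


def preflight(output_dir: str, checkpoint_dir: str, probe_set: str,
              overwrite: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the paths look usable."""
    problems: List[str] = []
    out = Path(output_dir)
    # A non-empty output directory is only acceptable when overwriting on purpose.
    if out.is_dir() and any(out.iterdir()) and not overwrite:
        problems.append(f"Output directory already exists and is non-empty: {out}")
    if not Path(checkpoint_dir).is_dir():
        problems.append(f"Checkpoint directory not found: {checkpoint_dir}")
    if not Path(probe_set).is_file():
        problems.append(f"Probe-set file not found: {probe_set}")
    return problems
```

Call it with the same paths you are about to pass to the command and stop if the returned list is non-empty.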
Failed to construct the requested minimum causal patch pairs
Cause:
- split / pair-type / probe-set combination produced zero valid pairs
Fix:
- check --pair-type
- check --split
- check whether the probe set actually contains those examples
- lower --min-pairs-per-type only if that matches the experiment
Intervention positions must be in the causal prefix
Cause:
- invalid query/key role combination for the chosen attention-score experiment
Fix:
- make sure the key role is available before the query position
- do not use future positions for causal interventions
Scalar recomputation mismatch
Cause:
- scalar rows and output-route closure were built from inconsistent pair sets or tolerances
Fix:
- regenerate the scalar rows and closure from the same pair universe
- keep the same margin-side / pair-type / split filters
Data group values changed across intervals
Cause:
- a grouped attribution summary assumed stable group IDs across intervals, but the actual grouping changed
Fix:
- avoid aggregating that run as one summary
- rerun on a smaller interval window or with a stable grouping field
Optimizer param-group lr mismatch or Recomputed gradient norm mismatch
Cause:
- the optimizer trace and the attribution command do not correspond to the same exact run / trace / replay assumptions
Fix:
- use the exact trace generated for that run
- do not mix traces from different seeds or different replay modes
- check scheduler / LR state consistency
No space left on device
Cause:
- stepwise optimizer traces are large
Fix:
- run fewer stages at once
- use --scan-checkpoint-every 250 for scans
- clean old trace directories
- use --cleanup-adam-trace in the cross-seed driver if you only need the final Adam reports
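A free-space check before a stepwise trace can save a failed run. The sketch below uses only the standard library. The helper name is hypothetical, and the required size is something you must estimate yourself, because this guide does not state how large traces are.

```python
import shutil


def has_free_space(path: str, required_gib: float) -> bool:
    """True when the filesystem holding path has at least required_gib GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024 ** 3
```

For example, has_free_space("artifacts", 50.0) checks for 50 GiB on the filesystem that holds the artifacts directory, where 50 is a placeholder rather than a measured trace size.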
Recommended Paper Reproduction Sequence
For the current paper result:
1. attention-geometry-trace
2. path-logit-decomposition
3. weight-svd-trace
4. weight-svd-patterns
5. contextual-key-separability
6. bilinear-qk-match-separation
7. bilinear-qk-rank-update-attribution
8. optimizer-update-trace
9. bilinear-qk-rank-actual-batch-attribution
10. bilinear-qk-rank-adam-state-attribution
11. scripts/cross_seed_adam_pipeline.py
What This Guide Does Not Cover
This guide does not try to document every exploratory command in src/circuit/cli.py.
It focuses on the formation-analysis stack that was actually used in the current paper:
- trained-model route discovery
- weight-space birth
- contextual semanticity
- exact update attribution
- Adam-state decomposition
- cross-seed role validation
For older feature-family and candidate-registry commands, read: