Checkpoint Analysis Plan

Purpose

This document defines what to analyze at every saved checkpoint in the reference formation run.

The goal is not to track every changing weight. That is intractable, and individual weights are not the right object of study. The goal is to track the emergence of the effective mechanism:

Main Principle

Use a hierarchical analysis stack:

  1. behavior
  2. residual-stream state
  3. heads and MLP blocks as writers/readers
  4. residual features or subspaces
  5. individual neurons only inside already-localized components

The unit of analysis is not “all weights”. The unit of analysis is the reduced effective mechanism inside a dense model.

Current Reference Regime

Current provisional reference regime:

Current formation run:

Fixed Analysis Inputs

Every checkpoint analysis should use the same fixed inputs:

1. Fixed Probe Set

Create a small, immutable probe set for repeated checkpoint analysis.

It should include:

The probe set should be small enough for every-checkpoint analysis and stable enough that metric changes are comparable across checkpoints.
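One way to keep the probe set immutable in practice is to record a content fingerprint when it is created and verify it before each checkpoint analysis. The following is a minimal sketch under stated assumptions: the function names and the example fields (`id`, `prompt`, `target`) are hypothetical and are not taken from the existing codebase.

```python
import hashlib
import json


def probe_set_fingerprint(examples):
    """Return a SHA-256 hex digest of a canonical JSON encoding of the probe set."""
    # sort_keys and fixed separators make the encoding independent of dict key order.
    canonical = json.dumps(examples, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_probe_set(examples, expected_fingerprint):
    """Raise ValueError if the probe set no longer matches its recorded fingerprint."""
    actual = probe_set_fingerprint(examples)
    if actual != expected_fingerprint:
        raise ValueError(f"probe set changed: {actual} != {expected_fingerprint}")


# Hypothetical probe examples; the real fields depend on the task.
probe_set = [
    {"id": 0, "prompt": "example input a", "target": "x"},
    {"id": 1, "prompt": "example input b", "target": "y"},
]
fingerprint = probe_set_fingerprint(probe_set)
check_probe_set(probe_set, fingerprint)  # passes silently while the set is unchanged
```

Storing the fingerprint next to the metrics for each checkpoint would make it possible to confirm later that two checkpoints were analyzed on identical inputs.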

2. Full Evaluation Splits

Keep full-split evaluation separate from probe-set analysis.

Use full splits for:

Do not run every mechanistic measurement on the full splits at dense checkpoint cadence. Use the fixed probe set for those measurements.

What To Analyze At Every Checkpoint

These are the required analyses for every saved checkpoint.

A. Behavioral Metrics

These answer: “what behavior is present now?”

Required outputs:

B. Residual-Stream Probes

These answer: “what information is linearly available in the residual stream now?”
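As an illustration of "linearly available", the sketch below fits a binary linear probe per layer and reports its accuracy. It is a toy under stated assumptions: it uses the perceptron rule in place of whatever probe the project adopts (logistic regression is a common choice), the function names are hypothetical, the activations are made-up 2-D vectors, and accuracy is scored on the fitting examples for brevity, where a real probe would be scored on held-out examples.

```python
def fit_linear_probe(activations, labels, epochs=50, lr=1.0):
    """Fit a binary linear probe with the perceptron rule; return (weights, bias)."""
    dim = len(activations[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(activations, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = y - (1 if score > 0 else 0)  # 0 when the prediction is correct
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


def probe_accuracy(weights, bias, activations, labels):
    """Fraction of examples whose label the linear probe predicts correctly."""
    correct = 0
    for x, y in zip(activations, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((1 if score > 0 else 0) == y)
    return correct / len(labels)


def probe_by_layer(activations_by_layer, labels):
    """Fit one probe per layer and return {layer: accuracy}."""
    result = {}
    for layer, acts in activations_by_layer.items():
        weights, bias = fit_linear_probe(acts, labels)
        result[layer] = probe_accuracy(weights, bias, acts, labels)
    return result


# Toy data: the same four probe examples seen at two layer boundaries.
labels = [0, 1, 1, 0]
activations_by_layer = {
    0: [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],      # XOR layout: not linearly separable
    1: [(-2.0, 0.0), (2.0, 0.0), (1.5, 0.5), (-1.5, -0.5)],   # separable along the first axis
}
accuracy_by_layer = probe_by_layer(activations_by_layer, labels)
```

In this toy case the probe cannot reach perfect accuracy at layer 0 but does at layer 1, which is the kind of layer-to-layer difference the residual-stream probes are meant to expose.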

Measure at the key positions needed for the task:

Measure at every layer boundary:

Probe targets:

Required outputs:

Interpretation:

C. Head-Level Metrics

These answer: “which attention heads are starting to do useful work?”

For every head:

Required outputs:

D. MLP-Block Metrics

These answer: “which MLP blocks start to matter, even if individual neurons are too unstable to track directly?”

For every MLP block:

Required outputs:

E. Dynamics Metrics

These answer: “is the mechanism stabilizing or still reorganizing?”

Track:

Required outputs:

What Not To Run At Every Checkpoint

The following are too expensive or too detailed to run on every saved checkpoint:

Run these only on selected birth windows (see Birth-Window Escalation).

Birth-Window Escalation

After the all-checkpoint sweep, identify narrow checkpoint ranges (birth windows) where something important changes.
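A birth-window detector could be as simple as flagging consecutive checkpoints between which a tracked metric jumps. The sketch below is an assumption, not the project's trigger definition: the function name, the single-metric input, and the `min_jump` threshold are all hypothetical.

```python
def find_birth_windows(steps, values, min_jump):
    """Return (start_step, end_step) windows where a metric changes by at least
    min_jump between consecutive checkpoints. Adjacent flagged intervals are merged."""
    windows = []
    for i in range(1, len(steps)):
        if abs(values[i] - values[i - 1]) >= min_jump:
            if windows and windows[-1][1] == steps[i - 1]:
                # Extend the previous window when it ends where this jump starts.
                windows[-1] = (windows[-1][0], steps[i])
            else:
                windows.append((steps[i - 1], steps[i]))
    return windows


# Made-up metric trace: flat, then a sharp rise across two checkpoint intervals.
steps = [0, 100, 200, 300, 400, 500]
accuracy = [0.10, 0.11, 0.12, 0.45, 0.90, 0.91]
windows = find_birth_windows(steps, accuracy, min_jump=0.2)
# windows == [(200, 400)]
```

A real detector would likely combine several metrics and need a threshold calibrated against seed-to-seed noise.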

Birth-window triggers:

For those windows only, run:

Artifact Layout

Each formation run should produce:

Minimal Schema For analysis/checkpoint_metrics.jsonl

Each row should contain:

Already Implemented

Currently implemented:

Next Analysis Work To Build

Still missing and should be built next:

  1. fixed probe-set generation and storage
  2. residual-stream capture at task-relevant positions
  3. residual linear probes by layer
  4. MLP-block ablation metrics
  5. checkpoint-metric sweep command
  6. birth-window detector
  7. targeted intervention runner
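As a starting point for item 5, the sweep could be a loop that runs one analysis function per checkpoint and appends one JSON row per checkpoint to a JSONL file. This is only a sketch: `sweep_checkpoints`, `load_rows`, `fake_analyze`, and the row fields `step` and `probe_accuracy` are hypothetical placeholders and do not define the schema of analysis/checkpoint_metrics.jsonl.

```python
import json
import os
import tempfile


def sweep_checkpoints(checkpoint_steps, analyze, out_path):
    """Call analyze(step) for each checkpoint in step order and append one JSON row each."""
    with open(out_path, "a", encoding="utf-8") as f:
        for step in sorted(checkpoint_steps):
            row = {"step": step, **analyze(step)}
            f.write(json.dumps(row, sort_keys=True) + "\n")


def load_rows(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def fake_analyze(step):
    # Stand-in for the real per-checkpoint analysis; returns a made-up metric.
    return {"probe_accuracy": min(1.0, step / 1000)}


# Write to a temporary directory so the example does not touch a real run.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "checkpoint_metrics.jsonl")
sweep_checkpoints([200, 0, 100], fake_analyze, out_path)
rows = load_rows(out_path)
```

Appending one row per checkpoint keeps the file readable while a sweep is still running, and it gives the birth-window detector a simple input to consume.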

Practical Research Workflow

Phase A: Single-Seed Formation Trace

Phase B: Seed Replication

Phase C: Factor Screens

Vary one factor at a time:

For each factor, compare:

Decision Rule For Starting Deep Circuit Work

Start expensive circuit-analysis work only when:

Do not start with neuron-level inspection across all checkpoints. Start with the all-checkpoint coarse sweep, then escalate only on selected windows.