Public benchmark spend
8 architectures × 16 libraries × 2 models. Every cell — including the ones our architecture lost.
methodology → architecture-first ai · public beta · v1.0.0
Architecture-driven modernization. The audit trail ships by default. Decompose legacy codebases into editable ADRs, recompose to modern stacks; compliance falls out of doing the architecture right instead of being bolted on afterward.
522 LOC. One-shot LLM: 6 cargo check errors. Plain ADR pipeline: 1 error. ADR + domain schema: 0 errors.
NancyContext.cs, 148 LOC. One-shot: 14 dotnet build errors. Plain ADR: 0 errors.
methodology → six beliefs · what kaizen does
Principles → ADRs → Tasks → Code. The chain of control flows downward from human-authored decisions. The LLM is an executor, never the author.
Architecture — not the LLM — holds decision authority. The chain of control flows downward: Principles → ADRs → Tasks → Code. The LLM is an executor inside this chain, never its author.
.architecture/principles.md §5
Tasks are classified by content shape before any execution agent runs. Raw, under-decomposed inputs are routed through triage and synthetic-specification generation; well-formed ADR-shaped inputs go directly to execution.
.architecture/decisions/ADR-0055-task-classification-and-routing.md
Specifications are gated for completeness, scope clarity, and acceptance criteria before any execution agent runs. Broken specs fail fast at intake, not mid-execution.
.architecture/decisions/ADR-0056-adr-quality-gate.md
When inputs arrive without an ADR shape, the Intake and Triage agents produce one before execution begins — no agent is asked to self-triage.
.architecture/decisions/ADR-0057-synthetic-adr-generation-triage-agent.md
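The routing described above can be sketched in a few lines. This is a minimal illustration of ADR-0055-style content-shape classification, not the project's actual API; the type names, section labels, and the "triage"/"execute" routes are assumptions for the example.

```python
from dataclasses import dataclass

# Illustrative: the sections a well-formed ADR-shaped task must carry.
REQUIRED_SECTIONS = {"context", "decision", "acceptance_criteria"}

@dataclass
class Task:
    body: str
    sections: set[str]

def classify(task: Task) -> str:
    """Route by content shape: well-formed ADR-shaped input goes straight
    to execution; anything under-decomposed goes through triage, where a
    synthetic specification is generated before execution begins."""
    if REQUIRED_SECTIONS <= task.sections:
        return "execute"
    return "triage"

raw = Task("fix the login bug", {"context"})
spec = Task("full spec body", {"context", "decision", "acceptance_criteria"})
assert classify(raw) == "triage"
assert classify(spec) == "execute"
```

The point of the shape check is that no execution agent is ever asked to self-triage: the decision happens at intake, before any agent runs.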
Excerpts from the live architecture corpus. Full ADRs are part of the project repository — public on a rolling basis.
Every agent call is scoped to an ADR, observed via telemetry, budgeted by steps and cost. The Write Agent doesn't ask clarifying questions — they were answered upstream.
Every agent call is scoped, observed, budgeted, tool-matched, and goal-serving. Autonomy is a local property inside a directed frame, not freedom from direction. Think Before Coding and Surgical Changes are architectural guarantees, not prompt etiquette.
.architecture/principles.md §14
The CD-AOR Harness is the composition of nine subsystems that together enclose every LLM and agent call in the system. The Harness is the runtime mechanism through which architectural direction flows: Principles → ADRs → Tasks → Harness → Code.
.architecture/decisions/ADR-0062-cd-aor-harness-definition.md
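A harness-style budget can be sketched as a scope object that every agent call charges against. This is an invented illustration of the "scoped, budgeted" property, assuming per-ADR step and cost ceilings; the class and field names are not the shipped interfaces.

```python
class BudgetExceeded(Exception):
    pass

class HarnessScope:
    """One agent call's enclosure: tied to an ADR, capped by steps and cost."""
    def __init__(self, adr_id: str, max_steps: int, max_cost_usd: float):
        self.adr_id = adr_id
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost = 0.0

    def charge(self, cost_usd: float) -> None:
        # Record one step; refuse further work past either ceiling.
        self.steps += 1
        self.cost += cost_usd
        if self.steps > self.max_steps or self.cost > self.max_cost_usd:
            raise BudgetExceeded(f"{self.adr_id}: budget exhausted")

scope = HarnessScope("ADR-0062", max_steps=3, max_cost_usd=0.10)
scope.charge(0.02)
scope.charge(0.02)
```

Because the ceilings are enforced in the scope rather than in the agent, exceeding a budget is a structural failure, not a prompt the model can talk its way past.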
Tests are run, not predicted. Static analysis is deterministic. Source-verified outranks LLM-judged. Token probabilities are not a substitute for running the tests.
Signals are tiered — source-verified (tests, compilers, static analysis) outrank artifact-verified, which outrank LLM-judged. When LLM signals are unavailable, grounded signals are renormalized rather than substituted.
.architecture/principles.md §1
Never trust the LLM's self-reported confidence or quality assessment. Ground truth comes from test execution results, tool signals, the Evaluator Agent's independent assessment, and composite confidence score calculation.
.architecture/principles.md §7
Replace token-level softmax confidence with a six-signal weighted composite — test_pass, coverage, adr_compliance, specialist_avg, pragmatic_score, static_analysis — derived from ground-truth tool execution outputs.
.architecture/decisions/ADR-0008-composite-confidence-scoring.md
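A minimal sketch of the six-signal composite, including the renormalization rule from §1: when LLM-judged signals are unavailable, the grounded weights are rescaled rather than substituted. The weights here are invented for illustration; the real weighting is not published in this excerpt.

```python
# Invented example weights over the six ADR-0008 signals (sum to 1.0).
WEIGHTS = {
    "test_pass": 0.30, "coverage": 0.15, "adr_compliance": 0.15,
    "specialist_avg": 0.15, "pragmatic_score": 0.10, "static_analysis": 0.15,
}

def composite_confidence(signals: dict[str, float]) -> float:
    """Weighted mean over the signals actually present. Missing LLM-judged
    signals shrink the denominator (renormalization), so grounded signals
    are never replaced by a guess."""
    live = {k: w for k, w in WEIGHTS.items() if k in signals}
    total = sum(live.values())
    return sum(signals[k] * w for k, w in live.items()) / total

grounded_only = {"test_pass": 1.0, "coverage": 0.8,
                 "adr_compliance": 1.0, "static_analysis": 1.0}
score = composite_confidence(grounded_only)  # ≈ 0.96 under these weights
```

With all six signals at 1.0 the score is exactly 1.0; dropping the two LLM-judged signals rescales the remaining 0.75 of weight back to a full denominator.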
Round 2 introduced grounded signals — running actual tests via subprocess and using ast.parse / py_compile for static analysis. Result: r=0.999 point-biserial correlation between composite confidence and pass/fail.
.architecture/decisions/ADR-0051-grounded-signal-pipeline.md
Convergence itself is adaptive: Thompson Sampling over historical confidence bins decides when to continue, converge, or early-abort unrecoverable tasks.
.architecture/decisions/ADR-0052-adaptive-convergence-thompson-sampling.md
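Adaptive convergence can be sketched as a Beta-Bernoulli bandit per confidence bin: sample from each bin's posterior over historical convergence, then continue, converge, or abort based on the draw. Bin boundaries, priors, and thresholds below are assumptions for illustration, not ADR-0052's shipped values.

```python
import random

class BinBandit:
    """Thompson Sampling over ten confidence bins [0.0-0.1) ... [0.9-1.0]."""
    def __init__(self):
        # Beta(1, 1) prior per bin: alpha counts convergences, beta failures.
        self.alpha: dict[int, int] = {}
        self.beta: dict[int, int] = {}

    def record(self, bin_id: int, converged: bool) -> None:
        self.alpha[bin_id] = self.alpha.get(bin_id, 1) + int(converged)
        self.beta[bin_id] = self.beta.get(bin_id, 1) + int(not converged)

    def decide(self, confidence: float) -> str:
        bin_id = min(int(confidence * 10), 9)
        draw = random.betavariate(self.alpha.get(bin_id, 1),
                                  self.beta.get(bin_id, 1))
        if draw > 0.9:
            return "converge"
        if draw < 0.1:
            return "abort"  # history says this bin is unrecoverable
        return "continue"
```

Because the posterior sharpens with every recorded outcome, tasks stuck in historically hopeless bins get early-aborted instead of burning iterations.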
Write → Evaluate → Refine. Convergence is declared only when confidence ≥ threshold and no Red Team findings remain. Not when a sampler decides to stop.
Treat code generation and refinement as iterative denoising rather than one-shot synthesis. The denoising loop (Write → Evaluate → Refine) is the primary abstraction, not a side effect. Memory tiers (working, episodic, semantic) mirror the phases of the loop.
.architecture/principles.md §4
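The loop itself is small enough to sketch. The agent callables below are stand-ins, not the project's interfaces; the shape shows the two convergence gates (confidence threshold and an empty Red Team findings list) from the belief above.

```python
def denoise(write, evaluate, refine, threshold=0.9, max_iters=5):
    """Write -> Evaluate -> Refine until both gates pass or budget runs out."""
    artifact = write()
    for _ in range(max_iters):
        confidence, findings = evaluate(artifact)
        if confidence >= threshold and not findings:
            return artifact  # converged: threshold met AND no findings remain
        artifact = refine(artifact, findings)
    raise RuntimeError("did not converge within iteration budget")

# Stub run: first evaluation fails, the refined artifact passes.
attempts = iter([(0.5, ["missing test"]), (0.95, [])])
result = denoise(
    write=lambda: "v1",
    evaluate=lambda a: next(attempts),
    refine=lambda a, findings: a + "+fix",
)
assert result == "v1+fix"
```

Note that both conditions are conjunctive: a high-confidence artifact with an open Red Team finding keeps iterating.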
The TAOR engine is a pluggable interface. Any backend that can participate in the tool-calling conversation loop — cloud API, local model, fine-tuned RL checkpoint — can serve as an engine without modifying orchestrator code.
.architecture/decisions/ADR-0004-taor-engine-pluggability.md
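The pluggability seam can be sketched as a structural interface: anything that can take one turn of the tool-calling loop qualifies as an engine. Method and type names here are illustrative assumptions, not ADR-0004's actual signatures.

```python
from typing import Protocol

class Engine(Protocol):
    def complete(self, messages: list[dict], tools: list[dict]) -> dict:
        """One turn of the tool-calling conversation loop."""
        ...

class EchoEngine:
    """Trivial local stand-in. A cloud API, local model, or fine-tuned RL
    checkpoint slots in the same way, with no orchestrator changes."""
    def complete(self, messages, tools):
        return {"role": "assistant", "content": messages[-1]["content"]}

def run_turn(engine: Engine, prompt: str) -> dict:
    # The orchestrator depends only on the Engine protocol, never a backend.
    return engine.complete([{"role": "user", "content": prompt}], tools=[])

assert run_turn(EchoEngine(), "hi")["content"] == "hi"
```

Structural typing (rather than inheritance) keeps backends decoupled: an engine never needs to import orchestrator code to satisfy the interface.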
Working, episodic, and semantic memory tiers mirror the phases of the denoising loop. Each commit captures structured metadata so any intermediate state is recoverable.
.architecture/decisions/ADR-0054-three-tier-memory-hierarchy.md
The agent that writes code cannot approve it. The Evaluator never sees the Write Agent's chain-of-thought. The Red Team is structurally incentivized to disagree.
Write Agent and Evaluator Agent must be independent implementations with different base models or prompting strategies. Never allow self-vibe-coding where an agent evaluates its own output without critical questioning.
.architecture/principles.md §6
The Write Agent and Evaluator Agent are separate model instances with separate system prompts and no shared chain-of-thought. The Evaluator receives only the committed code artifact (via Git) — never the Write Agent's reasoning, intermediate attempts, or self-assessments.
.architecture/decisions/ADR-0009-anti-vibe-coding-guardrails.md
Every denoising step is a Git commit. Every agent call is instrumented. If an action cannot be observed, it cannot be trusted to have happened.
Every denoising step, decision point, and code generation iteration produces a commit with structured metadata. The git history is the immutable record of how the system arrived at its current state.
.architecture/principles.md §10
Every agent call, tool invocation, and signal measurement is instrumented. No silent LLM calls, no untelemetered tool use, no out-of-band execution paths. If an action cannot be observed, it cannot be trusted to have happened.
.architecture/principles.md §11
Every denoising step produces a Git commit with structured metadata embedded in the commit message body — formatted as JSON for automated parsing. Together with the audit event log, this forms a two-channel record supporting rollback, forensic analysis, and enterprise compliance.
.architecture/decisions/ADR-0006-git-checkpoint-audit-trail.md
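The commit-message channel can be sketched as a round trip: JSON metadata in the body, recoverable by any automated parser. Field names and the subject-line format are invented for illustration; only the mechanism (structured JSON in the body) comes from the excerpt above.

```python
import json

def checkpoint_message(adr_id: str, step: int, confidence: float) -> str:
    """Build a commit message with machine-parseable metadata in the body."""
    meta = {"adr": adr_id, "denoise_step": step, "confidence": confidence}
    return f"kaizen: denoise step {step}\n\n{json.dumps(meta, sort_keys=True)}"

def parse_checkpoint(message: str) -> dict:
    """Recover the structured metadata from the commit message body."""
    return json.loads(message.split("\n\n", 1)[1])

msg = checkpoint_message("ADR-0006", 3, 0.94)
assert parse_checkpoint(msg)["denoise_step"] == 3
```

Because the metadata rides inside the commit object itself, rollback and forensic analysis need nothing beyond the git history plus the audit event log.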
Every component (Rust, Python, C#) emits structured metrics, traces, and logs across the full stack. Cost, latency, provider health, and learning-curve signals are first-class telemetry — the same feedback that drives confidence drives operational decisions.
.architecture/decisions/ADR-0031-observability-and-monitoring.md
Structured audit events capture user actions, role changes, task submissions, and provider selections in an append-only log. Combined with git history, this is the two-channel record enterprise compliance requires.
.architecture/decisions/ADR-0033-AUDIT-LOGGING-ARCHITECTURE.md
The daily tool. Apache-2.0. pypi.org/project/kaizen-3c-cli
Architectural-weakness fingerprinting, three-arm ablation. github.com/Kaizen-3C/benchmarks
Aider + smolagents added to the three-arm matrix — extending the paper from "we benchmarked our own architecture" to "we benchmarked an ecosystem." Month 2.
Three-arm ablation, eight architectures, two named architectural ceilings. Submission target Month 3.
A reframing of the original Kaizen 3C method (Concern · Cause · Countermeasure) — born on Toyota factory floors as a structured problem-solving discipline — for the software industry.