Kaizen-3C infinity mark Kaizen-3C

architecture-first ai · public beta · v1.0.0

Architecture-first AI.
Planning leads, code follows.

Architecture-driven modernization. Audit trail ships by default. Decompose legacy codebases into editable ADRs, recompose to modern stacks — compliance falls out of doing the architecture right, not bolted on after.

View on GitHub → copied to clipboard

Measured outcomes — not vibes

6 → 0 errors

inih · C → Rust

522 LOC. One-shot LLM: 6 cargo check errors. Plain ADR pipeline: 1 error. ADR + domain schema: 0 errors.

read the case study →
14 → 0 errors

Nancy · .NET Fx → .NET 8

NancyContext.cs, 148 LOC. One-shot: 14 dotnet build errors. Plain ADR: 0 errors.

read the case study →
$182.41 total

Public benchmark spend

8 architectures × 16 libraries × 2 models. Every cell — including the ones our architecture lost.

methodology →

six beliefs · what kaizen does

Architecture leads. Code follows.

01

Architecture leads, the model executes

Principles → ADRs → Tasks → Code. The chain of control flows downward from human-authored decisions. The LLM is an executor, never the author.

Principle 5 · ADRs 0055–0057
Principle 5 Architecture is the CEO
Architecture — not the LLM — holds decision authority. The chain of control flows downward: Principles → ADRs → Tasks → Code. The LLM is an executor inside this chain, never its author.

.architecture/principles.md §5

ADR-0055 Task Classification and Routing Accepted
Tasks are classified by content shape before any execution agent runs. Raw, under-decomposed inputs are routed through triage and synthetic-specification generation; well-formed ADR-shaped inputs go directly to execution.

.architecture/decisions/ADR-0055-task-classification-and-routing.md

ADR-0056 ADR Quality Gate Accepted
Specifications are gated for completeness, scope clarity, and acceptance criteria before any execution agent runs. Broken specs fail fast at intake, not mid-execution.

.architecture/decisions/ADR-0056-adr-quality-gate.md

ADR-0057 Synthetic ADR Generation (Triage Agent) Accepted
When inputs arrive without an ADR shape, the Intake and Triage agents produce one before execution begins — no agent is asked to self-triage.

.architecture/decisions/ADR-0057-synthetic-adr-generation-triage-agent.md

Excerpts from the live architecture corpus. Full ADRs are part of the project repository — public on a rolling basis.

02

Autonomy is latitude, not freedom

Every agent call is scoped to an ADR, observed via telemetry, budgeted by steps and cost. The Write Agent doesn't ask clarifying questions — they were answered upstream.

Principle 14 · ADR-0062
Principle 14 Directed Agency
Every agent call is scoped, observed, budgeted, tool-matched, and goal-serving. Autonomy is a local property inside a directed frame, not freedom from direction. Think Before Coding and Surgical Changes are architectural guarantees, not prompt etiquette.

.architecture/principles.md §14

ADR-0062 The CD-AOR Harness — Definition and Invariants Proposed
The CD-AOR Harness is the composition of nine subsystems that together enclose every LLM and agent call in the system. The Harness is the runtime mechanism through which architectural direction flows: Principles → ADRs → Tasks → Harness → Code.

.architecture/decisions/ADR-0062-cd-aor-harness-definition.md

Excerpts from the live architecture corpus. Full ADRs are part of the project repository — public on a rolling basis.

03

Confidence is grounded in evidence

Tests are run, not predicted. Static analysis is deterministic. Source-verified outranks LLM-judged. Token probabilities are not a substitute for running the tests.

Principles 1, 7 · ADRs 0008, 0051, 0052
Principle 1 Confidence-Driven Development
Signals are tiered — source-verified (tests, compilers, static analysis) outrank artifact-verified, which outrank LLM-judged. When LLM signals are unavailable, grounded signals are renormalized rather than substituted.

.architecture/principles.md §1

Principle 7 Ground-Truth Over Self-Assessment
Never trust the LLM's self-reported confidence or quality assessment. Ground truth comes from test execution results, tool signals, the Evaluator Agent's independent assessment, and composite confidence score calculation.

.architecture/principles.md §7

ADR-0008 Composite Confidence Scoring Accepted
Replace token-level soft-max confidence with a six-signal weighted composite — test_pass, coverage, adr_compliance, specialist_avg, pragmatic_score, static_analysis — derived from ground-truth tool execution outputs.

.architecture/decisions/ADR-0008-composite-confidence-scoring.md

ADR-0051 Grounded vs Sandbox Signal Collection Accepted
Round 2 introduced grounded signals — running actual tests via subprocess and using ast.parse / py_compile for static analysis. Result: r=0.999 point-biserial correlation between composite confidence and pass/fail.

.architecture/decisions/ADR-0051-grounded-signal-pipeline.md

ADR-0052 Adaptive Convergence (Thompson Sampling) Accepted
Convergence itself is adaptive: Thompson Sampling over historical confidence bins decides when to continue, converge, or early-abort unrecoverable tasks.

.architecture/decisions/ADR-0052-adaptive-convergence-thompson-sampling.md

Excerpts from the live architecture corpus. Full ADRs are part of the project repository — public on a rolling basis.

04

Iterative denoising, not one-shot

Write → Evaluate → Refine. Convergence is declared only when confidence ≥ threshold and no Red Team findings remain. Not when a sampler decides to stop.

Principle 4 · ADRs 0004, 0054, 0058
Principle 4 Diffusion as First-Class Concept
Treat code generation and refinement as iterative denoising rather than one-shot synthesis. The denoising loop (Write → Evaluate → Refine) is the primary abstraction, not a side effect. Memory tiers (working, episodic, semantic) mirror the phases of the loop.

.architecture/principles.md §4

ADR-0004 TAOR Engine Pluggability Accepted
The TAOR engine is a pluggable interface. Any backend that can participate in the tool-calling conversation loop — cloud API, local model, fine-tuned RL checkpoint — can serve as an engine without modifying orchestrator code.

.architecture/decisions/ADR-0004-taor-engine-pluggability.md

ADR-0054 Three-Tier Memory Hierarchy Accepted
Working, episodic, and semantic memory tiers mirror the phases of the denoising loop. Each commit captures structured metadata so any intermediate state is recoverable.

.architecture/decisions/ADR-0054-three-tier-memory-hierarchy.md

Excerpts from the live architecture corpus. Full ADRs are part of the project repository — public on a rolling basis.

05

Structural agent independence

The agent that writes code cannot approve it. The Evaluator never sees the Write Agent's chain-of-thought. The Red Team is structurally incentivized to disagree.

Principle 6 · ADR-0009
Principle 6 Structural Agent Independence
Write Agent and Evaluator Agent must be independent implementations with different base models or prompting strategies. Never allow self-vibe-coding where an agent evaluates its own output without critical questioning.

.architecture/principles.md §6

ADR-0009 Anti-Vibe-Coding Guardrails Accepted
The Write Agent and Evaluator Agent are separate model instances with separate system prompts and no shared chain-of-thought. The Evaluator receives only the committed code artifact (via Git) — never the Write Agent's reasoning, intermediate attempts, or self-assessments.

.architecture/decisions/ADR-0009-anti-vibe-coding-guardrails.md

Excerpts from the live architecture corpus. Full ADRs are part of the project repository — public on a rolling basis.

06

Auditable by default

Every denoising step is a Git commit. Every agent call is instrumented. If an action cannot be observed, it cannot be trusted to have happened.

Principles 10, 11 · ADRs 0006, 0031, 0033
Principle 10 Git as Audit Trail
Every denoising step, decision point, and code generation iteration produces a commit with structured metadata. The git history is the immutable record of how the system arrived at its current state.

.architecture/principles.md §10

Principle 11 Observable Operations
Every agent call, tool invocation, and signal measurement is instrumented. No silent LLM calls, no untelemetered tool use, no out-of-band execution paths. If an action cannot be observed, it cannot be trusted to have happened.

.architecture/principles.md §11

ADR-0006 Git Checkpoint and Audit Trail Accepted
Every denoising step produces a Git commit with structured metadata embedded in the commit message body — formatted as JSON for automated parsing. Together with the audit event log, this forms a two-channel record supporting rollback, forensic analysis, and enterprise compliance.

.architecture/decisions/ADR-0006-git-checkpoint-audit-trail.md

ADR-0031 Observability and Monitoring Accepted
Every component (Rust, Python, C#) emits structured metrics, traces, and logs across the full stack. Cost, latency, provider health, and learning-curve signals are first-class telemetry — the same feedback that drives confidence drives operational decisions.

.architecture/decisions/ADR-0031-observability-and-monitoring.md

ADR-0033 Audit Logging Architecture Accepted
Structured audit events capture user actions, role changes, task submissions, and provider selections in an append-only log. Combined with git history, this is the two-channel record enterprise compliance requires.

.architecture/decisions/ADR-0033-AUDIT-LOGGING-ARCHITECTURE.md

Excerpts from the live architecture corpus. Full ADRs are part of the project repository — public on a rolling basis.

What's shipping

What does the 3C mean?

code · compose · compliance

A reframing of the original Kaizen 3C method (Concern · Cause · Countermeasure) — born on Toyota factory floors as a structured problem-solving discipline — for the software industry.

Code
The legacy artifact you need to evolve. What's in front of you.
Compose
ADR-driven decompose, then recompose. How you understand the system before changing it.
Compliance
Auditable, governable, ratifiable change. Not a checkbox bolted on after — the emergent property of doing the architecture right.